A Chi-square test for independence is used to determine if there is a significant relationship between two ________ variables.
- categorical
- continuous
- nominal
- ordinal
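A Chi-square test for independence is used to determine whether there is a significant relationship between two categorical variables. Nominal and ordinal variables are both types of categorical variable, so the test can be applied to them, although it ignores any ordering of the categories. It is not suitable for continuous variables unless they are first grouped into categories.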
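As a rough illustration, the test statistic can be computed by hand from a contingency table. The sketch below uses made-up counts and the plain Pearson statistic without any continuity correction. The function name `chi_square_independence` is hypothetical, and the closed-form p-value shown holds only for 1 degree of freedom.

```python
import math

def chi_square_independence(table):
    """Return (statistic, degrees_of_freedom, expected) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    statistic = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected

# Hypothetical counts: rows are two groups, columns are two preferences.
observed = [[30, 10], [20, 40]]
statistic, dof, expected = chi_square_independence(observed)

# Valid for 1 degree of freedom only: P(chi-square > s) = erfc(sqrt(s / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))
print(statistic, dof, p_value)
```

A small p-value here would suggest that the row and column variables are not independent in these invented data.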
A probability must be a number between ________ and ________.
- #NAME?
- -1, 1
- 0, 1
- 1, 100
By definition, the probability of an event is a number between 0 and 1, inclusive. A probability of 0 means the event will never occur, and a probability of 1 means the event is certain to occur.
What is the effect of having small expected frequencies in a Chi-square test?
- It does not affect the test
- It increases the power of the test
- It invalidates the test
- It reduces the power of the test
In a Chi-square test, small expected frequencies can reduce the power of the test and potentially lead to erroneous conclusions. This is because the test statistic only approximately follows a Chi-square distribution, and the approximation becomes unreliable when expected counts are small. A common rule of thumb is that the expected frequency of each category should be at least 5.
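One way to check the rule of thumb before running the test is to compute the expected counts and flag any that fall below 5. This is a minimal sketch with invented counts. The helper names and the default threshold of 5 are assumptions taken from the rule of thumb, not a fixed standard.

```python
def expected_counts(table):
    """Expected cell counts under independence for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def small_expected_cells(table, threshold=5):
    """List (row, column, expected) for cells whose expected count is below threshold."""
    return [
        (i, j, exp)
        for i, row in enumerate(expected_counts(table))
        for j, exp in enumerate(row)
        if exp < threshold
    ]

# Hypothetical sparse table: three of the four expected counts are below 5.
sparse_table = [[3, 1], [2, 14]]
flagged = small_expected_cells(sparse_table)
print(flagged)
```

When cells are flagged, common responses are to merge sparse categories or to use an exact test such as Fisher's exact test instead.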
What type of correlation does the Spearman's Rank Correlation test measure?
- Correlation of variances
- Linear correlation
- Monotonic correlation
- Polynomial correlation
Spearman's Rank Correlation test measures monotonic correlation, which indicates whether one variable tends to consistently increase or consistently decrease as the other variable increases. Because it is computed on the ranks of the data, it does not require the relationship between the variables to be linear.
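The difference from linear correlation can be seen in a small sketch that computes Spearman's coefficient as Pearson's correlation of the ranks. The data are invented (a cubic relationship, which is monotonic but not linear), and the function names are illustrative only.

```python
import math

def pearson_r(x, y):
    """Pearson's linear correlation coefficient."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]   # monotonic, but clearly not linear
rho = spearman_rho(x, y)
r = pearson_r(x, y)
print(rho, r)
```

Here the rank correlation is perfect because y always increases with x, while the linear correlation falls short of 1.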
The _______ of a confidence interval determines the total area under the curve that is excluded on both sides of the curve.
- Confidence level
- Margin of error
- Population parameter
- Standard error
The confidence level of a confidence interval determines the total area under the curve that is excluded on both sides of the curve: for a confidence level of 1 − α, the excluded area is α, split as α/2 in each tail. The margin of error is a different quantity. It is the half-width of the interval (the critical value multiplied by the standard error) and therefore determines how wide the interval is.
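A short sketch can help keep the quantities apart: the confidence level, the tail area it excludes, and the margin of error. It assumes a normal-based interval with a known population standard deviation, and the sample figures are hypothetical.

```python
from statistics import NormalDist

confidence_level = 0.95
alpha = 1 - confidence_level          # total area excluded in the two tails
# Critical value that leaves alpha / 2 in each tail of the standard normal.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Hypothetical sample summary (population standard deviation assumed known).
sample_mean = 50.0
sigma = 10.0
n = 100

standard_error = sigma / n ** 0.5
margin_of_error = z_critical * standard_error   # half-width of the interval
interval = (sample_mean - margin_of_error, sample_mean + margin_of_error)
print(alpha, z_critical, margin_of_error, interval)
```

Raising the confidence level shrinks the excluded area and increases the critical value, which widens the interval.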
What happens if the assumption of homoscedasticity is violated in simple linear regression?
- It has no effect on the regression model
- It makes the regression model more accurate
- It makes the regression model perfectly fit the data
- It makes the standard errors and confidence intervals invalid
Homoscedasticity is the assumption that the variance of the residuals is constant across all levels of the independent variable. If this assumption is violated (a condition known as heteroscedasticity), the least-squares coefficient estimates remain unbiased but are no longer efficient, and the usual estimates of their standard errors become unreliable. This, in turn, can make the confidence intervals and hypothesis tests invalid.
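A rough way to see heteroscedasticity is to fit a line and compare the spread of the residuals at low and high values of the independent variable. The sketch below uses simulated data in which the noise is deliberately made to grow with x. All numbers are assumptions for illustration, and the comparison is an informal check, not a formal test.

```python
import random
from statistics import mean, pvariance

rng = random.Random(0)   # fixed seed so the run is repeatable
x = [float(i) for i in range(1, 201)]
# Simulated response: the noise standard deviation is proportional to x.
y = [2.0 * xi + rng.gauss(0, 0.5 * xi) for xi in x]

# Ordinary least squares for a single predictor.
x_bar, y_bar = mean(x), mean(y)
slope = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
intercept = y_bar - slope * x_bar
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Compare residual variance in the lower and upper halves of x.
half = len(x) // 2
low_spread = pvariance(residuals[:half])
high_spread = pvariance(residuals[half:])
print(low_spread, high_spread)
```

Under homoscedasticity the two variances would be of similar size. A large gap like the one built into this simulation is a sign that the usual standard errors should not be trusted.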
In the context of a continuous random variable, the ________ function gives the probability that the variable takes a value less than or equal to a certain value.
- Cumulative Distribution Function
- Probability Density Function
- Probability Mass Function
- Random Function
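The Cumulative Distribution Function (CDF) of a random variable is defined as the probability that the variable takes a value less than or equal to a certain value. For a continuous random variable, the CDF is the area under the Probability Density Function up to that value. The density itself is not a probability, and the probability of any single exact value is 0. A Probability Mass Function, by contrast, assigns probabilities to the individual values of a discrete random variable.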
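The relationship between the CDF and the density can be sketched with the standard normal distribution from Python's `statistics` module. The choice of distribution, the lower cut-off of -8, and the helper `cdf_by_integration` are assumptions made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# P(X <= 0) for a standard normal variable.
p_below_zero = standard_normal.cdf(0)

def cdf_by_integration(dist, x, lower=-8.0, steps=20000):
    """Approximate the CDF as the area under the PDF using a midpoint sum."""
    width = (x - lower) / steps
    return sum(dist.pdf(lower + (k + 0.5) * width) for k in range(steps)) * width

approx = cdf_by_integration(standard_normal, 1.0)
exact = standard_normal.cdf(1.0)

# The probability of an interval is a difference of two CDF values.
p_within_one_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
print(p_below_zero, approx, exact, p_within_one_sd)
```

The numerical area under the density agrees closely with the value returned by `cdf`, which is the sense in which the CDF accumulates the density.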
The ________ is the most frequent value in a data set.
- Mean
- Median
- Mode
- nan
The mode is the value that appears most frequently in a data set. A set of data may have one mode, more than one mode, or no mode at all.
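The three cases can be illustrated with `statistics.multimode`. The wrapper `modes` below is hypothetical and adopts one textbook convention as an assumption: when every distinct value occurs equally often, the data set is reported as having no mode.

```python
from statistics import multimode

def modes(data):
    """Return the list of modes, or an empty list if every value ties."""
    found = multimode(data)
    # Assumed convention: if all distinct values are equally frequent, there is no mode.
    if len(found) > 1 and len(found) == len(set(data)):
        return []
    return found

one_mode = modes([1, 2, 2, 3, 4])
two_modes = modes([1, 1, 2, 2, 3])
no_mode = modes([1, 2, 3])
print(one_mode, two_modes, no_mode)
```

Note that `multimode` on its own returns every value when all of them tie, so the "no mode" outcome depends on the convention chosen.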
What is a uniform distribution?
- A distribution in which all outcomes are equally likely
- A distribution in which all outcomes follow a linear pattern
- A distribution where outcomes are not related
- A distribution where outcomes have a bell-shaped pattern
A uniform distribution, sometimes also known as a rectangular distribution, is a distribution in which the probability (or, for a continuous variable, the probability density) is constant across all possible outcomes. It is used in cases where all outcomes are equally likely.
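Both the discrete and the continuous case can be sketched in a few lines. The fair die and the interval [2, 6] are examples chosen for illustration, and the function names are not from any particular library.

```python
from fractions import Fraction

# Discrete uniform: a fair six-sided die gives each face the same probability.
faces = range(1, 7)
die_pmf = {face: Fraction(1, len(faces)) for face in faces}

def uniform_pdf(x, a, b):
    """Density of the continuous uniform distribution on [a, b]."""
    return 1 / (b - a) if a <= x <= b else 0.0

def uniform_cdf(x, a, b):
    """P(X <= x) for the continuous uniform distribution on [a, b]."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

print(die_pmf[3], uniform_pdf(3, 2, 6), uniform_cdf(3, 2, 6))
```

The density is flat inside the interval, which is why the distribution is called rectangular, and the CDF rises in a straight line from 0 to 1.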
How is the 'mean' calculated for a data set?
- By arranging the values in ascending order
- By finding the middle value
- By finding the most frequent value
- By summing all values and dividing by the number of values
The mean of a data set is calculated by summing all the values and then dividing by the number of values. It gives the 'average' of the data and can be used for both discrete and continuous data sets. However, it can be heavily influenced by outliers or extreme values.
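The calculation, and the sensitivity to extreme values, can be shown with a small made-up data set. The median is included only as a point of comparison.

```python
from statistics import median

def arithmetic_mean(values):
    """Sum all values and divide by the number of values."""
    return sum(values) / len(values)

scores = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 100]   # same data with one extreme value

mean_scores = arithmetic_mean(scores)
mean_outlier = arithmetic_mean(with_outlier)
median_outlier = median(with_outlier)
print(mean_scores, mean_outlier, median_outlier)
```

Replacing a single value with an extreme one moves the mean a long way, while the middle value of the sorted data stays where it was.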