The _______ test compares the means of two independent groups.
- Chi-square
- Independent t
- Paired t
- Z
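An independent t-test (also called a two-sample t-test) compares the means of two independent groups. A paired t-test, by contrast, applies when the two sets of measurements come from the same or matched subjects.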
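As a rough illustration, the t statistic for two independent groups can be computed by hand. This is a minimal sketch that assumes equal variances in both groups (the pooled, Student's form); the function name `independent_t` and the sample values are made up for the example.

```python
import math
import statistics

def independent_t(a, b):
    """Return (t statistic, degrees of freedom) for two independent samples.

    Assumption: both groups share the same variance, so a pooled
    variance estimate is used.
    """
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    standard_error = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / standard_error
    return t, na + nb - 2

# Hypothetical measurements from two unrelated groups
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.2, 4.0, 4.8, 4.5, 4.1]
t_stat, dof = independent_t(group_a, group_b)
print(t_stat, dof)
```

A large absolute t value relative to the t distribution with `dof` degrees of freedom suggests that the group means differ.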
The optimal number of clusters in K-means clustering is often determined using the ________ method.
- elbow
- foot
- hand
- knee
The optimal number of clusters in K-means clustering is often determined using the elbow method. This involves plotting the explained variation as a function of the number of clusters and picking the elbow of the curve as the number of clusters to use.
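The idea can be sketched without any libraries on one-dimensional data. The example below is an illustrative assumption, not a production K-means: it uses a deterministic initialization, and it tracks the within-cluster sum of squares (often called inertia), which is the unexplained counterpart of the explained variation mentioned above. The function name and data are hypothetical.

```python
def kmeans_1d_inertia(points, k, iterations=20):
    """Run a simple 1-D K-means and return the within-cluster sum of squares."""
    data = sorted(points)
    n = len(data)
    # Deterministic start: take the middle point of k equal-sized chunks
    centers = [data[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: abs(x - centers[j]))
            clusters[nearest].append(x)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return sum(min((x - c) ** 2 for c in centers) for x in data)

# Hypothetical data with three well-separated groups
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
inertias = {k: kmeans_1d_inertia(points, k) for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

For this data the inertia falls steeply up to k = 3 and then flattens, so the bend in the curve sits at three clusters.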
What is the probability of an event that is certain to happen?
- 0
- 0.5
- 1
- The probability is undefined for certain events
The probability of an event that is certain to happen is 1. This is based on the definition of probability as a measure that takes values between 0 and 1, inclusive. An event with a probability of 1 is a sure event.
What techniques can be used to detect multicollinearity in a multiple regression model?
- Analysis of Variance (ANOVA)
- Chi-square test
- T-test
- Variance Inflation Factor (VIF)
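The Variance Inflation Factor (VIF) is commonly used to detect multicollinearity in regression analysis. It measures how much the variance of a coefficient estimate is inflated because that predictor is linearly related to the other predictors. ANOVA, the chi-square test, and the t-test answer different questions and do not measure collinearity among predictors.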
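For predictor j, the VIF equals 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on all the others. The sketch below covers only the simplest case of two predictors, where R_j^2 reduces to the squared Pearson correlation between them; the function names and data are assumptions made for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

x1 = [1, 2, 3, 4, 5, 6]
x2_related = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]  # roughly 2 * x1
x2_unrelated = [3, 1, 4, 1, 5, 2]

vif_high = vif_two_predictors(x1, x2_related)
vif_low = vif_two_predictors(x1, x2_unrelated)
print(vif_high, vif_low)
```

A VIF near 1 indicates little collinearity. Values above roughly 5 or 10 are common rule-of-thumb warning levels, though the cutoff is a convention rather than a strict rule.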
How does the variability of the population affect the width of a confidence interval?
- Higher variability decreases the width of the confidence interval
- Higher variability increases the width of the confidence interval
- The relationship between variability and the width of the confidence interval is unpredictable
- Variability has no effect on the width of the confidence interval
Higher variability in the population increases the width of the confidence interval. When data points are spread out more (higher variability), there is more uncertainty about where the true population parameter lies, leading to a larger standard error and thus a wider confidence interval.
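The effect is easy to see in the formula for a z-based interval for a mean, whose width is 2 * z * sigma / sqrt(n). The sketch below assumes a known population standard deviation and a normal sampling distribution; `ci_width` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def ci_width(sigma, n, confidence=0.95):
    """Width of a z-based confidence interval for a mean (sigma assumed known)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return 2 * z * sigma / math.sqrt(n)

narrow = ci_width(sigma=5, n=100)
wide = ci_width(sigma=10, n=100)
print(narrow, wide)
```

With the sample size held fixed, doubling the standard deviation doubles the width of the interval.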
___________ occurs when changes in one variable are associated with changes in another variable, but one does not necessarily cause the other.
- Causation
- Correlation
- Covariation
- Regression
Correlation occurs when changes in one variable are associated with changes in another variable. It's important to remember that correlation does not imply causation. Just because two variables move together, it does not mean that one variable's movement is causing the other's.
How does the Breusch-Pagan test check for heteroscedasticity in residuals?
- By comparing the variance of residuals
- By examining the correlation of residuals
- By plotting residuals against fitted values
- By regressing the squared residuals on the predictors
The Breusch-Pagan test checks for heteroscedasticity by regressing the squared residuals on the predictors. If the predictors explain a significant amount of the variation in the squared residuals, the null hypothesis of constant variance is rejected and heteroscedasticity is indicated.
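The auxiliary regression can be sketched for a model with a single predictor. This example assumes the studentized (Koenker) form of the statistic, n * R^2 from the regression of squared residuals on the predictor, compared against a chi-square distribution with one degree of freedom (5% critical value about 3.841). The helper names and the deterministic "noise" pattern are made up for illustration.

```python
def ols_fit(x, y):
    """Intercept and slope of a simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope

def r_squared(x, y):
    """Coefficient of determination for the simple regression of y on x."""
    intercept, slope = ols_fit(x, y)
    my = sum(y) / len(y)
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

def breusch_pagan_lm(x, y):
    """n * R^2 from regressing squared residuals on x (studentized form)."""
    intercept, slope = ols_fit(x, y)
    squared_residuals = [(b - (intercept + slope * a)) ** 2 for a, b in zip(x, y)]
    return len(x) * r_squared(x, squared_residuals)

x = list(range(1, 21))
signs = [1 if i % 2 == 0 else -1 for i in x]
# Error magnitude grows with x (heteroscedastic) versus constant magnitude
y_hetero = [2 + 3 * a + s * 0.5 * a for a, s in zip(x, signs)]
y_homo = [2 + 3 * a + s * 1.0 for a, s in zip(x, signs)]

lm_hetero = breusch_pagan_lm(x, y_hetero)
lm_homo = breusch_pagan_lm(x, y_homo)
print(lm_hetero, lm_homo)
```

The statistic exceeds the critical value only for the series whose error spread grows with the predictor.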
What is the purpose of the 'whiskers' in a box plot?
- To represent the outliers
- To represent the range of the data
- To show the interquartile range
- To show the mean and median
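The 'whiskers' in a box plot represent the range of the data outside the box. Under the common convention, the upper whisker extends to the largest data value that lies within 1.5 times the interquartile range (IQR) above the third quartile, and the lower whisker extends to the smallest data value that lies within 1.5 times the IQR below the first quartile. When no values fall outside those limits, the whiskers reach the minimum and maximum. Any data points beyond the whiskers are plotted individually and can be considered outliers.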
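The whisker positions can be worked out directly from the quartiles. The sketch below assumes the 1.5 * IQR convention and the "inclusive" quartile method from Python's `statistics` module; other quartile definitions give slightly different fences. The function name and data are hypothetical.

```python
import statistics

def tukey_whiskers(data):
    """Return (lower whisker, upper whisker, outliers) using 1.5 * IQR fences."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    # Whiskers stop at the most extreme values still inside the fences
    return min(inside), max(inside), outliers

values = [2, 4, 5, 6, 7, 8, 9, 30]
lower, upper, outliers = tukey_whiskers(values)
print(lower, upper, outliers)
```

Here the upper whisker stops at 9 rather than at the maximum, and 30 is flagged as an outlier.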
Why is the assumption of independently and identically distributed (IID) residuals important in linear regression?
- It ensures that the model is not overfitting
- It ensures that the model is not underfitting
- It ensures that the parameter estimates are unbiased
- It ensures the correctness of standard errors and hypothesis tests
The assumption of IID residuals is important because it ensures that standard errors, confidence intervals, and hypothesis tests are valid. If this assumption is violated, these statistics may be incorrect, leading to misleading results.
When would you prefer to use the median instead of the mean as a measure of central tendency?
- When the data has outliers
- When the data is in large quantity
- When the data is normally distributed
- When the data is uniformly distributed
The median is preferred over the mean when the data is skewed or has outliers. Outliers can pull the mean strongly toward themselves and give a distorted view of the typical value, whereas the median is far less sensitive to extreme values. The median is the middle value of a data set that has been arranged in order of magnitude, which makes it a better measure of central tendency for skewed distributions.
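A small example with made-up numbers shows the difference in sensitivity: adding a single extreme value moves the mean a long way while the median barely changes.

```python
import statistics

# Hypothetical salaries in thousands
salaries = [40, 42, 45, 47, 50]
with_outlier = salaries + [500]

mean_before = statistics.mean(salaries)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(salaries)
median_after = statistics.median(with_outlier)

print(mean_before, mean_after)      # the mean jumps
print(median_before, median_after)  # the median shifts only slightly
```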
What is the key difference between a discrete and a continuous random variable?
- Discrete variables are predictable, continuous variables are not
- Discrete variables can only take on a countable number of values, continuous variables can take on any value within a certain range
- Discrete variables can take on any value, continuous variables can take on only integer values
- There's no difference between discrete and continuous random variables
Discrete random variables are variables that can only take on a countable number of values, such as integers, while continuous random variables can take on any value within a certain range or interval.
A distribution with a positive ________ has a long tail in the positive direction.
- Kurtosis
- Mean
- Median
- Skewness
A distribution with positive skewness is said to be positively skewed or right-skewed, which means it has a long tail in the positive direction on the number line.
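The sign of the skewness can be checked numerically. The sketch below assumes the simple moment-based definition (third central moment divided by the second central moment raised to the power 1.5); statistical libraries often apply a small-sample correction, so their values may differ slightly. The function name and data are hypothetical.

```python
def sample_skewness(data):
    """Moment-based skewness: m3 / m2 ** 1.5 (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]   # long tail toward large values
left_tailed = [-x for x in right_tailed]   # mirror image
symmetric = [1, 2, 3, 4, 5]

print(sample_skewness(right_tailed))
print(sample_skewness(left_tailed))
print(sample_skewness(symmetric))
```

The right-tailed sample gives a positive value, its mirror image a negative one, and the symmetric sample a value of zero.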