How can you detect multicollinearity in multiple linear regression?
- By checking the correlation among predictors
- By checking the normality of residuals
- By looking at the scatter plot of residuals
- By using the F-test
Multicollinearity can be detected by examining the pairwise correlations among the predictors: a high correlation between two predictors signals multicollinearity. Pairwise correlations can miss collinearity that involves three or more predictors jointly, so more formal diagnostics such as the Variance Inflation Factor (VIF) are also used.
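As a minimal sketch, assuming only two predictors: in that special case the VIF reduces to 1 / (1 - r^2), where r is the Pearson correlation between them. The data and function names below are hypothetical. With more predictors, each VIF would instead come from regressing one predictor on all the others.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictor columns (illustrative values only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]  # roughly 2 * x1, so strongly collinear
x3 = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]   # nearly unrelated to x1

print("VIF for x1 with x2:", vif_two_predictors(x1, x2))
print("VIF for x1 with x3:", vif_two_predictors(x1, x3))
```

A common rule of thumb treats a VIF above roughly 5 or 10 as a sign of problematic multicollinearity. The exact cutoff is a convention, not a formal test.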
In what situations is Spearman's rank correlation preferred over Pearson's correlation?
- All of the above
- When the data contains outliers
- When the relationship between variables is nonlinear
- When the variables are not normally distributed
Spearman's rank correlation coefficient is a nonparametric measure of rank correlation. It is preferred over Pearson's correlation when the variables are not normally distributed, when the relationship is monotonic but nonlinear, or when the data contains outliers. It assesses how well a monotonic function describes the relationship between two variables, without assuming any particular distribution for them.
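The difference can be illustrated with a small sketch. It computes Spearman's coefficient as the Pearson correlation of the ranks, giving tied values their average rank. The exponential data is a hypothetical example of a monotonic but nonlinear relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y grows exponentially with x (monotonic, not linear).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [math.exp(v) for v in x]

print("Pearson :", pearson_r(x, y))
print("Spearman:", spearman_rho(x, y))
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's coefficient is noticeably lower because the relationship is not a straight line.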
What is the significance of 'distance measures' in cluster analysis?
- Distance measures determine the similarities or differences between data points
- Distance measures help in determining the shape of the clusters
- Distance measures help in visualizing the clusters
- Distance measures indicate the number of clusters
Distance measures, like Euclidean distance or Manhattan distance, play a crucial role in cluster analysis. They determine the similarities or differences between data points. They influence how the clusters will be formed, as the most similar or closest data points get clustered together.
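A short sketch of how the choice of distance can change which points are grouped together. The point, the two cluster centers, and the helper `nearest` are hypothetical and chosen only to make the two measures disagree.

```python
import math

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest(point, centers, distance):
    """Name of the center closest to `point` under the given distance."""
    return min(centers, key=lambda name: distance(point, centers[name]))

# Hypothetical cluster centers and a point to assign.
centers = {"A": (3.0, 3.0), "B": (0.0, 5.0)}
point = (0.0, 0.0)

print("Euclidean assigns the point to", nearest(point, centers, euclidean))
print("Manhattan assigns the point to", nearest(point, centers, manhattan))
```

Under Euclidean distance the point is closer to A, while under Manhattan distance it is closer to B, so the same data can produce different clusters depending on the measure.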
A 95% confidence interval means that if the same sampling method is repeated many times, then ________% of the confidence intervals will contain the true population parameter.
- 50
- 75
- 90
- 95
A 95% confidence interval means that if we were to take a large number of samples and calculate a confidence interval from each one, we would expect about 95% of those intervals to contain the true population parameter. The parameter is fixed; it is the intervals that vary from sample to sample.
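The repeated-sampling interpretation can be checked with a small simulation. This is a sketch under stated assumptions: normally distributed data, a known population standard deviation (so a z-interval applies), and arbitrary hypothetical values for the mean, sample size, and seed.

```python
import math
import random
from statistics import NormalDist

def coverage(n_trials=2000, n=30, mu=10.0, sigma=2.0, level=0.95, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * sigma / math.sqrt(n)      # sigma is assumed known
    hits = 0
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        if mean - half_width <= mu <= mean + half_width:
            hits += 1
    return hits / n_trials

print("Observed coverage:", coverage())
```

The observed fraction should land close to 0.95, with some run-to-run variation that shrinks as the number of trials grows.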
The ________ distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space.
- Exponential
- Gaussian
- Poisson
- Uniform
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
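For reference, the Poisson probability of observing exactly k events when the mean rate is lambda is lambda^k * e^(-lambda) / k!. A minimal sketch of that formula, with a hypothetical rate of 3 events per interval:

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean rate lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # hypothetical mean number of events per interval
for k in range(6):
    print(k, poisson_pmf(k, lam))
```

Summing the probabilities over all non-negative k gives 1, which is a quick sanity check on any implementation.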
A __________ is the difference between the observed value and the predicted value of the response variable in regression analysis.
- Coefficient
- Error term
- Outlier
- Residual
In the context of regression analysis, the difference between the observed value and the predicted value of the response variable is called a "residual".
How do changes in the scale of measurement affect the correlation coefficient?
- They decrease the correlation coefficient
- They do not affect the correlation coefficient
- They increase the correlation coefficient
- They reverse the sign of the correlation coefficient
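The correlation coefficient is not affected by changes in the center (adding a constant) or scale (multiplying by a positive constant) of the variables. This is because correlation is computed from standardized deviations, so it measures the strength of a linear relationship relative to the variables' variability. It is a dimensionless quantity, so changing the units of measurement does not change it. Multiplying one variable by a negative constant reverses the sign of the coefficient but leaves its magnitude unchanged.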
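A quick numerical check of this property, using hypothetical temperature and sales figures. Converting Celsius to Fahrenheit changes both the center and the scale of the temperature variable.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: daily temperature and units sold.
celsius = [10.0, 15.0, 20.0, 25.0, 30.0]
sales = [12.0, 18.0, 21.0, 30.0, 33.0]

fahrenheit = [c * 9 / 5 + 32 for c in celsius]  # shift and positive rescale
negated = [-c for c in celsius]                 # negative rescale

print("r with Celsius   :", pearson_r(celsius, sales))
print("r with Fahrenheit:", pearson_r(fahrenheit, sales))
print("r with negated   :", pearson_r(negated, sales))
```

The first two values agree up to floating-point rounding, and the third has the same magnitude with the opposite sign.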
What is a residual in the context of simple linear regression?
- The difference between the observed and predicted values
- The difference between the predicted and observed values of the independent variable
- The error in the slope of the regression line
- The observed value of the dependent variable
A residual is the difference between the observed value of the dependent variable (y) and the value predicted by the regression model (ŷ), that is, e = y - ŷ. It is the model's estimation error for that observation.
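To make the definition concrete, here is a minimal sketch that fits a least-squares line to hypothetical data and computes each residual as observed minus predicted. The function name `fit_line` and the data are illustrative only.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.2, 4.1, 6.1, 7.9, 10.2]

intercept, slope = fit_line(x, y)
predicted = [intercept + slope * a for a in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]

for a, obs, pred, res in zip(x, y, predicted, residuals):
    print(a, obs, pred, res)
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), so positive and negative errors balance out.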
What is the null hypothesis in a Chi-square test for independence?
- The population means are equal
- The population variances are equal
- There is an association between the variables
- There is no association between the variables
The null hypothesis in a Chi-square test for independence states that there is no association between the variables; that is, they are independent.
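A sketch of how the test statistic is built under that null hypothesis: the expected count in each cell is (row total * column total) / grand total, which is what independence would predict. The 2x2 table is hypothetical, no continuity correction is applied, and the p-value helper is valid only for 1 degree of freedom.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_df1(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical 2x2 table: rows are groups, columns are outcomes.
table = [[30, 10],
         [20, 40]]

stat = chi_square_independence(table)
print("chi-square statistic:", stat)
print("p-value (df = 1)    :", p_value_df1(stat))
```

A small p-value is evidence against the null hypothesis of no association. For larger tables the degrees of freedom are (rows - 1) * (columns - 1), and the p-value would need a general chi-square distribution function.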
What assumption does the Chi-square test for goodness of fit make about the observations?
- The observations are correlated
- The observations are independent
- The observations are normally distributed
- The observations are paired
The Chi-square test for goodness of fit assumes that the observations are independent, which means that the outcome of one observation does not affect the outcome of another.
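As an illustration, consider hypothetical counts from 60 rolls of a die, where each roll is assumed to be independent of the others. The sketch computes the goodness-of-fit statistic against a fair-die expectation and compares it with 11.070, the standard 5% critical value for 5 degrees of freedom.

```python
def chi_square_gof(observed, expected):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts for faces 1..6 from 60 independent rolls.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / len(observed)] * len(observed)  # fair die

stat = chi_square_gof(observed, expected)

# 5% critical value of the chi-square distribution with 6 - 1 = 5 df.
CRITICAL_DF5_ALPHA05 = 11.070
reject_fair_die = stat > CRITICAL_DF5_ALPHA05

print("statistic:", stat)
print("reject fairness at 5%:", reject_fair_die)
```

If the rolls were not independent, for example if the same outcome were recorded several times, the statistic would no longer follow the chi-square distribution and the comparison would not be valid.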
A histogram with two peaks is known as a ________ distribution.
- Bimodal
- Multimodal
- Normal
- Uniform
A histogram with two distinct peaks is referred to as a bimodal distribution. This may suggest that the data contains two different groups, each with its own mode, or most common value.
In Bayes' theorem, what does the prior probability represent?
- The likelihood of the evidence
- The probability of an event before evidence is observed
- The probability of the evidence given the event
- The updated probability after evidence is observed
The prior probability in Bayes' Theorem is the initial or original probability of an event before new evidence is taken into account. It represents our initial belief about the event.
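A worked sketch of how the prior is updated, using a hypothetical diagnostic-test scenario. The prevalence, sensitivity, and false-positive rate are made-up numbers, and `posterior` is an illustrative helper for a two-hypothesis case.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(event | evidence) via Bayes' theorem for a binary event.

    prior               : P(event) before seeing the evidence
    likelihood          : P(evidence | event)
    false_positive_rate : P(evidence | not event)
    """
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 95% sensitivity, 5% false positives.
prior = 0.01
updated = posterior(prior, likelihood=0.95, false_positive_rate=0.05)

print("prior    :", prior)
print("posterior:", updated)
```

With these numbers the posterior is about 0.16: a positive result raises the probability well above the 1% prior, but the low prior keeps it far from certainty.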