How can transformations help in reducing skewness in a dataset?

  • They can make the distribution more symmetric
  • They can shift the mean towards the skew
  • They can shift the mode towards the skew
  • Transformations cannot reduce skewness
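Transformations such as the logarithm or square root can reduce skewness by making the distribution more symmetric. Both compress large values more than small ones, so they suit right-skewed (positively skewed) data; left-skewed data is usually reflected first or transformed with a power such as squaring. The choice of transformation therefore depends on the degree and direction of skewness.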
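As a rough illustration of the effect, the sketch below computes a moment-based skewness before and after a log transform. The data set (powers of two) and the helper name `skewness` are assumptions chosen for the example, not part of the question.

```python
import math

def skewness(values):
    """Moment-based sample skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed data: most values are small, a few are very large.
data = [1, 2, 4, 8, 16, 32, 64, 128, 256]
logged = [math.log(v) for v in data]

raw_skew = skewness(data)    # clearly positive (long right tail)
log_skew = skewness(logged)  # essentially zero: the logs are evenly spaced

print(f"skewness before: {raw_skew:.3f}, after log: {log_skew:.3f}")
```

Real data will rarely become perfectly symmetric; the point is only that the log pulls the long right tail in.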

How do you diagnose multicollinearity in a multiple linear regression model?

  • By calculating the R-squared value
  • By checking the correlation matrix and Variance Inflation Factor (VIF)
  • By looking at the residual plot
  • By looking at the scatter plot
Multicollinearity in a multiple linear regression model is diagnosed by checking the correlation matrix of the predictors and the Variance Inflation Factor (VIF). High pairwise correlations between independent variables, or a VIF above a common rule-of-thumb threshold of 5 or 10, suggest the presence of multicollinearity.
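A minimal sketch of the VIF idea, assuming the special case of exactly two predictors, where the VIF of either one reduces to 1 / (1 - r^2) with r the correlation between them. The data and the helper `pearson` are made up for illustration; with more predictors, each VIF comes from regressing one predictor on all the others.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictors: x2 is almost exactly twice x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r = pearson(x1, x2)
vif = 1 / (1 - r ** 2)  # two-predictor special case only

print(f"r = {r:.4f}, VIF = {vif:.1f}")
```

Here the VIF lands far above the 5 or 10 rule of thumb, which matches the near-duplicate predictors.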

In a normal distribution, about 95% of the data lies within _______ standard deviations of the mean.

  • Four
  • One
  • Three
  • Two
According to the empirical rule (also known as the 68-95-99.7 rule), in a normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations, and about 99.7% lies within three standard deviations.
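The three percentages can be checked directly from the normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library; the helper name `share_within` is an assumption for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```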

What does a Pearson Correlation Coefficient of 0 indicate?

  • No correlation
  • Perfect negative correlation
  • Perfect positive correlation
  • Weak positive correlation
A Pearson correlation coefficient of 0 indicates no linear correlation between the variables. It does not by itself mean the variables are independent: they can still be related in a non-linear way.
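One caveat is easy to see numerically: a coefficient of 0 rules out only a linear relationship. In the sketch below, y is completely determined by x (y = x squared), yet the Pearson coefficient is 0. The data and the helper `pearson` are assumptions for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x is symmetric around 0 and y depends on x exactly.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = pearson(x, y)
print(f"Pearson r = {r:.3f}")  # no linear trend despite the exact dependence
```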

If the null hypothesis is true in ANOVA, the F-statistic follows a ________ distribution.

  • Binomial
  • Chi-Square
  • F
  • Normal
In ANOVA, if the null hypothesis is true, the F-statistic follows an F-distribution. In a one-way ANOVA with k groups and N observations in total, its degrees of freedom are k - 1 and N - k. The F-distribution is most commonly encountered in Analysis of Variance and in regression.

Spearman's Rank Correlation is especially useful when the relationship between variables is ________, but not necessarily linear.

  • Bimodal
  • Monotonic
  • Negative
  • Positive
Spearman's Rank Correlation is especially useful when the relationship between variables is monotonic, but not necessarily linear. In a monotonic relationship, as one variable increases the other consistently increases (or consistently decreases), though not necessarily at a constant rate.
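A small sketch of the contrast, assuming data with no tied values so that Spearman's coefficient is simply the Pearson coefficient of the ranks. The cubic data and the helpers `pearson` and `ranks` are assumptions for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values (assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Hypothetical monotonic but non-linear relationship: y = x cubed.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

pearson_r = pearson(x, y)                 # below 1: the trend is not a straight line
spearman_r = pearson(ranks(x), ranks(y))  # 1: the ordering is preserved perfectly

print(f"Pearson = {pearson_r:.3f}, Spearman = {spearman_r:.3f}")
```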

Why is interval estimation generally preferred over point estimation?

  • Because it gives more accurate results
  • Because it is easier to calculate
  • Because it is less affected by outliers
  • Because it provides a range of possible values rather than a single point
Interval estimation is generally preferred over point estimation because it provides a range of plausible values rather than a single value. The width of that range conveys the uncertainty around the estimate, so an interval carries more information than a point estimate alone.
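To make the contrast concrete, the sketch below reports both a point estimate and an approximate 95% confidence interval for a mean. It assumes a normal (z) approximation for simplicity; for a sample this small a t-based interval would normally be wider and more appropriate. The sample values are made up.

```python
import math
from statistics import NormalDist, mean, stdev

# Hypothetical sample of measurements.
sample = [9.8, 10.2, 10.1, 9.9, 10.4, 10.0, 9.7, 10.3, 10.1, 9.9]

n = len(sample)
point_estimate = mean(sample)
standard_error = stdev(sample) / math.sqrt(n)

# Two-sided 95% critical value from the standard normal (an approximation).
z = NormalDist().inv_cdf(0.975)
lower = point_estimate - z * standard_error
upper = point_estimate + z * standard_error

print(f"point estimate: {point_estimate:.3f}")
print(f"approximate 95% interval: ({lower:.3f}, {upper:.3f})")
```

The point estimate is a single number; the interval shows how far the true mean could plausibly be from it.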

The _________ test is a non-parametric test that compares the medians of two paired groups.

  • Chi-square
  • Mann-Whitney U
  • Sign
  • Wilcoxon Signed Rank
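The Wilcoxon Signed Rank test is a non-parametric test for two paired groups. It ranks the absolute values of the paired differences and tests whether the median difference is zero (assuming the differences are symmetrically distributed), which makes it a common alternative to the paired t-test. The Sign test also applies to paired data, but it uses only the signs of the differences and ignores their magnitudes.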
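The core of the test statistic can be sketched by hand. The example below assumes no zero differences and no tied absolute differences, and it stops at the rank sums without computing a p-value. The paired measurements are hypothetical.

```python
# Hypothetical paired measurements (e.g. before and after a treatment).
before = [125, 130, 118, 140, 136, 128]
after = [120, 131, 110, 133, 133, 118]

differences = [b - a for b, a in zip(before, after)]

# Rank the differences by absolute value (no zeros or ties in this example).
ordered = sorted(differences, key=abs)

# Sum the ranks separately for positive and negative differences.
w_plus = sum(rank for rank, d in enumerate(ordered, start=1) if d > 0)
w_minus = sum(rank for rank, d in enumerate(ordered, start=1) if d < 0)

# The smaller rank sum is commonly reported as the test statistic.
w_statistic = min(w_plus, w_minus)

print(f"W+ = {w_plus}, W- = {w_minus}, W = {w_statistic}")
```

A very small W relative to n(n + 1)/2 indicates that the differences lean strongly in one direction.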

What is the support of a continuous random variable?

  • The highest and lowest value of the variable
  • The mean value of the distribution
  • The set of values that have non-zero probability
  • The variance of the distribution
The support of a random variable is the set of values it can actually take. For a discrete random variable, that is the set of values with non-zero probability. For a continuous random variable, any single point has probability zero, so the support is instead the set of values over which the probability density function is non-zero.

What is the role of the 'R-squared' value in a multiple linear regression model?

  • It represents the correlation between the dependent and independent variables
  • It represents the error term in the regression model
  • It represents the proportion of variance in the dependent variable that is predictable from the independent variables
  • It represents the total variance in the dependent variable
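The 'R-squared' value, also known as the coefficient of determination, represents the proportion of variance in the dependent variable that can be predicted from the independent variables. For an ordinary least squares fit with an intercept, it ranges from 0 to 1, where a higher value indicates that the model explains more of the variance. Because R-squared never decreases when predictors are added, a high value alone does not show that every predictor is useful.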
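The definition can be written as R-squared = 1 - SS_res / SS_tot. The sketch below applies it to observed values and fitted values; both lists are hypothetical and stand in for the output of an already fitted regression model.

```python
# Hypothetical observed responses and the fitted values from some regression model.
observed = [3.0, 5.0, 7.0, 9.0]
fitted = [2.8, 5.3, 6.9, 9.0]

mean_observed = sum(observed) / len(observed)

# Residual sum of squares: variation the model fails to explain.
ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, fitted))

# Total sum of squares: variation of the response around its mean.
ss_tot = sum((y - mean_observed) ** 2 for y in observed)

r_squared = 1 - ss_res / ss_tot
print(f"R-squared = {r_squared:.3f}")
```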

What is the difference between excess kurtosis and kurtosis?

  • Excess kurtosis is always greater than kurtosis
  • Excess kurtosis is always less than kurtosis
  • Excess kurtosis is kurtosis minus 3
  • There is no difference between excess kurtosis and kurtosis
The difference between kurtosis and excess kurtosis comes down to a constant: excess kurtosis is simply kurtosis minus 3. The "3" is the kurtosis of a normal distribution, so excess kurtosis measures kurtosis relative to a normal distribution, which has an excess kurtosis of 0.
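A short sketch of the two quantities, using the moment-based definition kurtosis = m4 / m2^2. The data and the helper name `kurtosis` are assumptions for the example.

```python
def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2

# Hypothetical evenly spread data, flatter than a normal distribution.
data = [1, 2, 3, 4, 5]

kurt = kurtosis(data)
excess_kurt = kurt - 3  # negative here: lighter tails than a normal distribution

print(f"kurtosis = {kurt:.2f}, excess kurtosis = {excess_kurt:.2f}")
```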

How is the F-statistic calculated in an ANOVA test?

  • It is the difference between between-group variance and within-group variance
  • It is the ratio of between-group variance to within-group variance
  • It is the ratio of within-group variance to between-group variance
  • It is the sum of between-group variance and within-group variance
In an ANOVA test, the F-statistic is calculated as the ratio of the between-group variance (mean square between groups) to the within-group variance (mean square within groups). A larger F-statistic means the group means differ more than the variation within groups would explain, which is stronger evidence against the null hypothesis of equal means.
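The ratio can be worked through by hand for a one-way ANOVA. The three groups below are hypothetical, and the sketch stops at the F-statistic without looking up a p-value.

```python
# Hypothetical observations for three groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]

k = len(groups)                        # number of groups
n_total = sum(len(g) for g in groups)  # total number of observations
grand_mean = sum(sum(g) for g in groups) / n_total
group_means = [sum(g) / len(g) for g in groups]

# Between-group sum of squares: spread of the group means around the grand mean.
ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

# Within-group sum of squares: spread of observations around their own group mean.
ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

ms_between = ss_between / (k - 1)       # mean square between groups
ms_within = ss_within / (n_total - k)   # mean square within groups
f_statistic = ms_between / ms_within

print(f"F = {f_statistic:.2f} with {k - 1} and {n_total - k} degrees of freedom")
```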