In a normal distribution, about 95% of the data lies within _______ standard deviations of the mean.

  • Four
  • One
  • Three
  • Two
According to the empirical rule (also known as the 68-95-99.7 rule), in a normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations, and about 99.7% lies within three standard deviations.
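
The percentages in the empirical rule can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `coverage_within` is introduced here purely for illustration.

```python
from statistics import NormalDist


def coverage_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {coverage_within(k):.4f}")
```

The three printed values come out at roughly 0.683, 0.954, and 0.997, which is where the name "68-95-99.7" comes from.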

How do you diagnose multicollinearity in a multiple linear regression model?

  • By calculating the R-squared value
  • By checking the correlation matrix and Variance Inflation Factor (VIF)
  • By looking at the residual plot
  • By looking at the scatter plot
Multicollinearity in a multiple linear regression model is diagnosed by inspecting the correlation matrix of the predictors and the Variance Inflation Factor (VIF) of each predictor. High pairwise correlations between independent variables, or a VIF above a commonly used cutoff of 5 (or, more leniently, 10), suggest the presence of multicollinearity.
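
As a rough illustration of how the VIF relates to correlation, the sketch below handles only the simplest case of two predictors. It assumes the standard definition VIF = 1 / (1 - R²), where R² comes from regressing one predictor on the others; with exactly two predictors that R² equals the squared Pearson correlation. The function names and the data are made up for the example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical predictors: x2 is nearly a copy of x1, x3 is largely unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
x3 = [2.0, -1.0, 0.5, 3.0, -2.0, 1.0]

print("VIF for x1 with x2:", round(vif_two_predictors(x1, x2), 1))
print("VIF for x1 with x3:", round(vif_two_predictors(x1, x3), 2))
```

The nearly duplicated pair produces a VIF far above 10, while the unrelated pair stays close to 1. With more than two predictors, each VIF needs a full regression of that predictor on all the others, which is better left to a statistics library.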

How can transformations help in reducing skewness in a dataset?

  • They can make the distribution more symmetric
  • They can shift the mean towards the skew
  • They can shift the mode towards the skew
  • Transformations cannot reduce skewness
Transformations, such as logarithmic or square root transformations, can help in reducing skewness by making the distribution more symmetric. Both of these compress a long right tail, so they suit positively skewed data with positive values. The choice of transformation depends on the degree and direction of skewness.
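
To make the effect concrete, here is a small sketch that compares the skewness of a right-skewed sample before and after a logarithmic transformation. It uses the population moment formula for skewness (third central moment divided by the cubed standard deviation); the data values are invented for illustration.

```python
import math


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed sample: most values are small, a few are very large.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log(v) for v in raw]  # the log requires strictly positive values

print("skewness before:", round(skewness(raw), 2))
print("skewness after log:", round(skewness(logged), 2))
```

The skewness stays positive after the transformation but is noticeably smaller, meaning the distribution is closer to symmetric.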

How does the standard deviation affect the shape of a normal distribution?

  • Changes the kurtosis
  • Changes the skewness
  • Changes the spread or dispersion
  • Does not affect the shape
The standard deviation, a measure of dispersion or spread, determines the width of a normal distribution. A larger standard deviation results in a wider, flatter distribution, while a smaller standard deviation results in a narrower, taller distribution. It does not change the skewness or kurtosis, which are the same for every normal distribution.
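
The "wider and flatter" behaviour can be seen by evaluating the density of two normal distributions with the same mean but different standard deviations. This sketch uses `statistics.NormalDist` from the standard library; the specific values 1 and 2 are arbitrary choices for the example.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# Density at the mean: the peak is lower when the spread is larger.
print("peak height, sigma=1:", round(narrow.pdf(0), 4))
print("peak height, sigma=2:", round(wide.pdf(0), 4))

# Density far from the mean: the wider distribution puts more weight in the tails.
print("density at x=3, sigma=1:", round(narrow.pdf(3), 4))
print("density at x=3, sigma=2:", round(wide.pdf(3), 4))
```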

A _______ t-test is used to compare two related samples or repeated measurements on a single sample.

  • Independent
  • One-sample
  • Paired
  • Two-sample
A paired t-test is used to compare two related samples or repeated measurements on a single sample. It is often used in before-and-after scenarios where the same individuals are measured twice.
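
A paired t-test works on the per-individual differences rather than on the two samples separately. The sketch below computes only the t statistic, t = mean(d) / (sd(d) / sqrt(n)), with the standard library; the before and after scores are hypothetical, and obtaining a p-value would additionally require the t distribution with n - 1 degrees of freedom, for example from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before, after):
    """t statistic for paired samples, based on the per-pair differences."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)  # stdev uses the n - 1 denominator
    return mean(diffs) / standard_error


# Hypothetical scores for the same five individuals, measured twice.
before = [80, 75, 90, 68, 72]
after = [84, 78, 91, 74, 75]

t = paired_t_statistic(before, after)
print("t statistic:", round(t, 3), "with", len(before) - 1, "degrees of freedom")
```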

What is a random variable in probability theory?

  • A factor that doesn't change
  • A variable that can take on different values, each with an associated probability
  • An unknown variable
  • An unpredictable factor
A random variable in probability theory is a variable that can take on different values, each with an associated probability. It's not "random" in the everyday sense of the word, but its exact value is uncertain until it's observed.
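
A simple discrete example is the outcome of a fair six-sided die. The sketch below represents the random variable as a mapping from each possible value to its probability; the die is an illustrative choice, not something taken from the question.

```python
from fractions import Fraction

# Each possible value of the random variable, paired with its probability.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# The probabilities of all possible values must sum to 1.
total_probability = sum(die_pmf.values())

# The expected value weights each outcome by its probability.
expected_value = sum(face * p for face, p in die_pmf.items())

print("total probability:", total_probability)
print("expected value:", expected_value)
```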

The _________ test is a non-parametric test that compares the medians of two paired groups.

  • Chi-square
  • Mann-Whitney U
  • Sign
  • Wilcoxon Signed Rank
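The Wilcoxon Signed Rank test is a non-parametric test for two paired groups. It ranks the absolute values of the paired differences and compares the rank sums of the positive and negative differences, testing whether the median difference is zero. It is the non-parametric counterpart of the paired t-test; the Mann-Whitney U test, by contrast, is for independent groups.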
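
The ranking step behind the test can be sketched in a few lines. The function below returns the two rank sums (for positive and negative differences), dropping zero differences and giving tied absolute differences their average rank, which is a common convention assumed here. It stops short of a p-value, which needs the statistic's null distribution; the data are made up.

```python
def signed_rank_sums(before, after):
    """Return (W_plus, W_minus): rank sums of positive and negative paired differences."""
    diffs = [a - b for a, b in zip(after, before) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical paired measurements on six subjects.
before = [10, 12, 9, 14, 11, 13]
after = [12.5, 11, 13, 14.5, 17, 16]

w_plus, w_minus = signed_rank_sums(before, after)
print("W+ =", w_plus, "W- =", w_minus)
```

The two sums always add up to n(n + 1) / 2 for n non-zero differences. A very lopsided split, as in this example, is evidence that the differences are not centred on zero.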

How does the effect size relate to the power of a t-test?

  • Effect size has no relation to the power of a test
  • Larger effect sizes are associated with higher power
  • Larger effect sizes are associated with lower power
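The effect size is the magnitude of the difference between groups, often standardized by the variability of the data. With sample size and significance level held fixed, larger effect sizes are easier to detect and are associated with higher power in a t-test.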
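
The relationship can be sketched with a normal approximation to the power of a two-sided, one-sample (or paired) test. This is an approximation chosen to keep the example within the standard library: an exact t-test power calculation would use the noncentral t distribution. The function name, the sample size of 30, and the effect sizes are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided test for a standardized effect size.

    Uses the normal approximation: the test statistic is treated as
    normal with mean effect_size * sqrt(n) and standard deviation 1.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * sqrt(n)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: approximate power {approx_power(d, n=30):.3f}")
```

With the sample size fixed, the printed power rises as the effect size grows. At an effect size of zero the function returns the significance level, as expected.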

What is the value of the probability of an impossible event?

  • 0
  • 0.5
  • 1
  • The probability is undefined
By definition, the probability of an impossible event is 0. This is because the measure of probability assigns 0 to events that cannot occur and 1 to events that are certain to occur.

_________ is a condition in which the error term in a regression model is correlated with itself.

  • Autocorrelation
  • Homoscedasticity
  • Multicollinearity
  • Underfitting
Autocorrelation, also known as serial correlation, is the correlation of a series with a lagged copy of itself, measured as a function of the lag. In regression analysis, it refers to the error terms of different observations being correlated with one another. It typically shows up in the residuals of time-series data and violates the assumption of independent errors.
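
One common way to check residuals for first-order autocorrelation is the Durbin-Watson statistic, which the explanation above does not mention and which is shown here only as an illustration. It is the sum of squared successive differences of the residuals divided by their sum of squares. Values near 2 suggest no first-order autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The residual series below are invented.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (curr - prev) ** 2 for prev, curr in zip(residuals, residuals[1:])
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (neighbouring errors are similar).
drifting = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]
# Hypothetical residuals that flip sign at every step.
alternating = [1, -1, 1, -1, 1, -1]

print("drifting residuals:", round(durbin_watson(drifting), 3))
print("alternating residuals:", round(durbin_watson(alternating), 3))
```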