In a normal distribution, about 95% of the data lies within _______ standard deviations of the mean.

Four
One
Three
Two

According to the empirical rule (also known as the 68-95-99.7 rule), in a normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations, and about 99.7% lies within three standard deviations.
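
A quick numerical check of the rule, using only Python's standard library. It relies on the standard fact that for a standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2)). The function name `prob_within_k_sd` is just an illustrative choice.

```python
import math


def prob_within_k_sd(k: float) -> float:
    """Probability that a normal random variable falls within k standard
    deviations of its mean.

    Standardizing gives P(-k < Z < k) for Z ~ N(0, 1), which equals
    erf(k / sqrt(2)); the result is the same for every mean and standard deviation.
    """
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within_k_sd(k):.4f}")
```

This prints roughly 0.6827, 0.9545 and 0.9973, so "95%" is a rounded figure. An interval that covers exactly 95% spans about 1.96 standard deviations on either side of the mean.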

How do you diagnose multicollinearity in a multiple linear regression model?

By calculating the R-squared value
By checking the correlation matrix and Variance Inflation Factor (VIF)
By looking at the residual plot
By looking at the scatter plot

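Multicollinearity in a multiple linear regression model is diagnosed by checking the correlation matrix of the independent variables and the Variance Inflation Factor (VIF) of each one. High pairwise correlations between independent variables, or a VIF above a common rule-of-thumb threshold of 5 or 10, suggest multicollinearity. The VIF also catches cases where one variable is close to a linear combination of several others, which a pairwise correlation matrix can miss.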
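
A minimal, standard-library-only sketch of both diagnostics. It uses the identity that VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the others, equals the j-th diagonal entry of the inverse of the predictors' correlation matrix. The data and helper names (`pearson`, `invert`, `vifs`) are made up for illustration; in practice a library routine such as statsmodels' `variance_inflation_factor` is commonly used instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns):
    """VIF_j = 1 / (1 - R_j^2); equivalently, the j-th diagonal entry of the
    inverse of the predictors' correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(len(columns))]


# Made-up predictors: x2 is roughly 2 * x1, x3 is only weakly related to either.
x1 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.2, 18.1, 19.9]
x3 = [3, 7, 1, 9, 4, 8, 2, 10, 5, 6]

predictors = {"x1": x1, "x2": x2, "x3": x3}
# Correlation matrix, one row per predictor.
for name_a, col_a in predictors.items():
    row = " ".join(f"{pearson(col_a, col_b):6.3f}" for col_b in predictors.values())
    print(f"{name_a}: {row}")
# VIFs: x1 and x2 come out far above 10, x3 stays close to 1.
for name, vif in zip(predictors, vifs(list(predictors.values()))):
    print(f"VIF({name}) = {vif:.1f}")
```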

How can transformations help in reducing skewness in a dataset?

They can make the distribution more symmetric
They can shift the mean towards the skew
They can shift the mode towards the skew
Transformations cannot reduce skewness

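Transformations such as the logarithm or the square root can reduce skewness by making the distribution more symmetric. Both compress large values more than small ones, so they suit positive-valued, right-skewed (positively skewed) data; left-skewed data usually needs a different approach, such as reflecting the data before transforming it or using a power greater than 1. The choice of transformation depends on the degree and direction of skewness.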
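
A small sketch comparing skewness before and after a square root and a log transform. It assumes strictly positive, right-skewed data, and the sample below is made up: each value doubles the previous one, so the log transform turns it into evenly spaced, perfectly symmetric values. Real data will rarely be this tidy.

```python
import math


def skewness(data):
    """Moment-based sample skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n
    m3 = sum((v - mean) ** 3 for v in data) / n
    return m3 / m2 ** 1.5


# Made-up, strictly positive, right-skewed data.
data = [1, 2, 4, 8, 16, 32, 64]
rooted = [math.sqrt(v) for v in data]   # square root: a milder transformation
logged = [math.log(v) for v in data]    # natural log: a stronger transformation

print(f"raw:  {skewness(data):.3f}")
print(f"sqrt: {skewness(rooted):.3f}")
print(f"log:  {skewness(logged):.3f}")
```

The square root reduces the positive skew and the log removes it entirely for this particular sample, which matches the usual ordering of these transformations by strength.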

How does the standard deviation affect the shape of a normal distribution?

Changes the kurtosis
Changes the skewness
Changes the spread or dispersion
Does not affect the shape

The standard deviation, a measure of dispersion or spread, determines the width of a normal distribution. A larger standard deviation results in a wider, flatter distribution, while a smaller standard deviation results in a narrower, steeper distribution.
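
A short illustration with Python's `statistics.NormalDist`: the peak height of a normal curve is 1 / (sigma * sqrt(2 * pi)), so doubling the standard deviation halves the peak while the curve gets wider. The proportion of data within one standard deviation stays the same, which is why skewness and kurtosis are unaffected. The helper name `describe` is illustrative.

```python
from statistics import NormalDist


def describe(sigma, mu=0.0):
    """Return (peak height, probability within one SD) for N(mu, sigma^2)."""
    dist = NormalDist(mu=mu, sigma=sigma)
    peak = dist.pdf(mu)  # equals 1 / (sigma * sqrt(2 * pi))
    within_one_sd = dist.cdf(mu + sigma) - dist.cdf(mu - sigma)
    return peak, within_one_sd


for sigma in (0.5, 1.0, 2.0):
    peak, within = describe(sigma)
    print(f"sigma={sigma}: peak height={peak:.4f}, P(within 1 SD)={within:.4f}")
```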

A _______ t-test is used to compare two related samples or repeated measurements on a single sample.

Independent
One-sample
Paired
Two-sample

A paired t-test is used to compare two related samples or repeated measurements on a single sample. It works on the difference within each pair and is often used in before-and-after studies where the same individuals are measured twice.
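
A minimal sketch of the paired t statistic: compute each subject's difference, then run a one-sample t-test of those differences against zero, t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom. The measurements are made up, and the p-value step is left out because the standard library has no t distribution; SciPy's `scipy.stats.ttest_rel` computes the same statistic along with a p-value.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic and degrees of freedom, computed from per-pair differences."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # stdev uses the n - 1 denominator
    return mean_diff / se, n - 1


# Made-up before/after measurements on the same six subjects.
before = [72, 75, 68, 80, 77, 74]
after = [70, 73, 67, 76, 75, 71]
t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.3f} with {df} degrees of freedom")
```

Here the differences are [2, 2, 1, 4, 2, 3], giving t of about 5.53 with 5 degrees of freedom.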

What is a random variable in probability theory?

A factor that doesn't change
A variable that can take on different values, each with an associated probability
An unknown variable
An unpredictable factor

A random variable in probability theory is a variable that can take on different values, each with an associated probability. It's not "random" in the everyday sense of the word, but its exact value is uncertain until it's observed.
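
As a concrete illustration, the number shown by a fair six-sided die is a random variable. The sketch below (the die is an assumed example) stores its probability mass function as a mapping from each possible value to its probability, using exact fractions.

```python
from fractions import Fraction

# X = number shown by a fair six-sided die; each value has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

total = sum(pmf.values())                               # probabilities must sum to 1
expected = sum(x * p for x, p in pmf.items())           # E[X] = 7/2
p_even = sum(p for x, p in pmf.items() if x % 2 == 0)   # P(X is even) = 1/2

print(total, expected, p_even)
```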

The _________ test is a non-parametric test that compares the medians of two paired groups.

Chi-square
Mann-Whitney U
Sign
Wilcoxon Signed Rank

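The Wilcoxon signed-rank test is a non-parametric test that compares the medians of two paired groups; more precisely, it tests whether the median of the paired differences is zero, assuming the differences are roughly symmetric. Unlike the sign test, which uses only the sign of each difference, it also uses the ranks of the differences' magnitudes, which generally gives it more power.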
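
A sketch of how the signed-rank statistics are computed: take the paired differences, drop zero differences (the classic convention), rank the absolute differences with tied values sharing their average rank, and sum the ranks of the positive and negative differences separately. The data are made up, and the conversion to a p-value is left to a library such as SciPy (`scipy.stats.wilcoxon`).

```python
def signed_rank_statistics(x, y):
    """Return (W+, W-): rank sums of the positive and negative paired differences."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over a run of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Made-up paired measurements on eight subjects.
before = [125, 130, 118, 140, 135, 128, 122, 137]
after = [120, 131, 110, 132, 129, 128, 119, 130]
print(signed_rank_statistics(before, after))
```

One pair has a zero difference and is dropped, leaving seven ranks that sum to 7 * 8 / 2 = 28. Only the difference of -1 is negative, so W- is 1 and W+ is 27; such a lopsided split is evidence that the median difference is not zero.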

Why is interval estimation generally preferred over point estimation?

Because it gives more accurate results
Because it is easier to calculate
Because it is less affected by outliers
Because it provides a range of possible values rather than a single point

Interval estimation is generally preferred over point estimation because it provides a range of plausible values rather than a single value. The width of that range reflects the uncertainty around the estimate, so an interval conveys more information than a point estimate alone.
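
A minimal sketch contrasting the two: a point estimate (the sample mean) and a 95% confidence interval for the population mean built around it. The sample is made up, and the critical value 2.262 is the two-sided 95% value of the t distribution with 9 degrees of freedom, taken from a t table because the standard library has no t quantile function.

```python
import math
import statistics

# Made-up sample of n = 10 measurements.
sample = [12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9, 12.5, 12.0]
n = len(sample)

point_estimate = statistics.mean(sample)             # single best guess
std_error = statistics.stdev(sample) / math.sqrt(n)  # estimated SD of the sample mean
t_crit = 2.262  # two-sided 95% t critical value for n - 1 = 9 degrees of freedom

lower = point_estimate - t_crit * std_error
upper = point_estimate + t_crit * std_error
print(f"point estimate: {point_estimate:.2f}")
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

A larger sample shrinks the standard error and therefore the interval, which makes the reduced uncertainty directly visible in a way the point estimate alone cannot.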

Spearman's Rank Correlation is especially useful when the relationship between variables is ________, but not necessarily linear.

Bimodal
Monotonic
Negative
Positive

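Spearman's Rank Correlation is especially useful when the relationship between variables is monotonic, but not necessarily linear. In a monotonic relationship, as one variable increases the other tends to move in a single direction (always increasing or always decreasing), though not necessarily at a constant rate. Because Spearman's coefficient is computed on ranks, it measures how consistently monotonic the relationship is rather than how linear it is.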
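
A small sketch comparing Pearson and Spearman correlation on y = x^3, a relationship that is perfectly monotonic but clearly non-linear. Spearman's coefficient is computed here as the Pearson correlation of the ranks. The `ranks` helper assumes there are no tied values, which holds for this made-up example; with ties, tied values normally receive the average of their ranks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotonic but non-linear
print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Spearman's coefficient is exactly 1 because the ranks agree perfectly, while Pearson's is noticeably below 1 because the points do not lie on a straight line.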

If the null hypothesis is true in ANOVA, the F-statistic follows a ________ distribution.

Binomial
Chi-Square
F
Normal

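In ANOVA, if the null hypothesis is true (and the usual assumptions of independent, normally distributed observations with equal group variances hold), the F-statistic follows an F-distribution. For a one-way ANOVA with k groups and N total observations, its degrees of freedom are k - 1 (numerator) and N - k (denominator); an F value that is large relative to this distribution is evidence against the null hypothesis.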
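
A minimal sketch of the one-way ANOVA F-statistic: the between-group mean square divided by the within-group mean square, with k - 1 and N - k degrees of freedom. The three groups are made up; turning F into a p-value requires the F distribution, which SciPy provides (for example through `scipy.stats.f_oneway`).

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum(sum((v - m) ** 2 for v in g) for g, m in zip(groups, group_means))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up groups of three observations each.
f_stat, df1, df2 = one_way_anova_f([4, 5, 6], [6, 7, 8], [9, 10, 11])
print(f"F = {f_stat:.2f} with ({df1}, {df2}) degrees of freedom")
```

Here the between-group sum of squares is 38 and the within-group sum of squares is 6, so F = (38 / 2) / (6 / 6) = 19 on (2, 6) degrees of freedom.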
