The Mann-Whitney U test is used when data is ________, which means it can't be reasonably fit to a normal distribution.

  • non-parametric
  • normally distributed
  • parametric
  • skewed
The Mann-Whitney U test is a non-parametric test: it compares two independent samples using ranks, so it can be used when the data can't reasonably be fit to a normal distribution.
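As a rough illustration of why the test does not depend on normality, the U statistic can be computed from pairwise comparisons alone. The sketch below is a minimal hand-rolled version; the function name `mann_whitney_u` and the sample values are hypothetical, and it computes only the statistic, not a p-value.

```python
import math

def mann_whitney_u(x, y):
    """U statistic for sample x: count pairs (xi, yj) with xi > yj; ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Hypothetical samples; group_a contains one extreme value.
group_a = [1.2, 3.4, 2.2, 150.0]
group_b = [0.8, 1.1, 2.0, 1.9]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

# A monotonic transformation (here, the log) preserves the ordering,
# so the statistic is unchanged.
u_a_log = mann_whitney_u([math.log(v) for v in group_a],
                         [math.log(v) for v in group_b])

print(u_a, u_b, u_a_log)
```

Because only the ordering of the values matters, the extreme value 150.0 has no more influence than any other value that exceeds everything in the second group.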

What is the implication of multicollinearity in polynomial regression?

  • It increases the fit of the model to the training data
  • It increases the interpretability of the model
  • It reduces the complexity of the model
  • It reduces the precision of coefficient estimates
Multicollinearity in polynomial regression can reduce the precision of the coefficient estimates and make them highly sensitive to minor changes in the model or data. This leads to unstable and unreliable estimates, making it difficult to interpret the model and to draw inferences about the relationships between variables.
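The source of the problem can be seen by checking how strongly a predictor correlates with its own square. The sketch below uses a hypothetical predictor `x` and a hand-written `pearson_r` helper. It also shows that centering `x` before squaring removes the correlation for this symmetric example; centering is a commonly used mitigation that is not mentioned in the question itself.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

x = [float(v) for v in range(1, 11)]   # hypothetical predictor values 1..10
x_squared = [v ** 2 for v in x]

# Raw polynomial terms are strongly correlated with each other.
r_raw = pearson_r(x, x_squared)

# Centering x before squaring removes the correlation in this symmetric case.
mean_x = sum(x) / len(x)
x_centered = [v - mean_x for v in x]
r_centered = pearson_r(x_centered, [v ** 2 for v in x_centered])

print(round(r_raw, 3), round(r_centered, 3))
```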

How does the presence of outliers affect measures of dispersion like range, variance, and standard deviation?

  • Decreases them
  • Depends on the values of the outliers
  • Increases them
  • No effect
Outliers can greatly inflate measures of dispersion such as the range, variance, and standard deviation. The range depends only on the minimum and maximum, so an extreme value widens it directly. The variance and standard deviation are built from squared distances from the mean, so an outlier (a value that is significantly higher or lower than the other values) contributes disproportionately and can produce a much larger measure of dispersion.
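A small numeric check makes the effect concrete. The data below are hypothetical, and the `dispersion` helper is only an illustrative wrapper around Python's `statistics` module (population variance and standard deviation are used here; the sample versions behave the same way).

```python
import statistics

def dispersion(values):
    """Range, population variance, and population standard deviation."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.pvariance(values),
        "std_dev": statistics.pstdev(values),
    }

base = [10, 12, 11, 13, 12]      # hypothetical data without outliers
with_outlier = base + [50]       # the same data plus one extreme value

before = dispersion(base)
after = dispersion(with_outlier)

for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```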

The normal distribution is also known as the ________ distribution.

  • Exponential
  • Gaussian
  • Poisson
  • Uniform
The normal distribution is also known as the Gaussian distribution. It is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is bell-shaped.

How do you calculate the probability of the intersection of two independent events?

  • P(A ∩ B) = P(A) * P(B)
  • P(A ∩ B) = P(A) + P(B)
  • P(A ∩ B) = P(A) - P(B)
  • P(A ∩ B) = P(A) / P(B)
The probability of the intersection of two independent events is calculated as the product of their individual probabilities. So if A and B are independent, P(A ∩ B) = P(A) * P(B). This is a direct result of the Multiplication Rule for independent events.
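The rule can be verified by enumerating a small sample space. The sketch below assumes two fair six-sided dice and two illustrative events chosen for this example: A, "the first die is even", and B, "the second die shows more than 4".

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Exact probability of an event given as a predicate over outcomes."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

def event_a(outcome):
    return outcome[0] % 2 == 0      # first die is even

def event_b(outcome):
    return outcome[1] > 4           # second die shows 5 or 6

p_a = probability(event_a)
p_b = probability(event_b)
p_a_and_b = probability(lambda o: event_a(o) and event_b(o))

print(p_a, p_b, p_a_and_b, p_a * p_b)
```

The events are independent because they concern different dice, so counting the joint outcomes directly gives the same value as multiplying the two individual probabilities.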

What type of data represents characteristics or attributes?

  • Categorical data
  • Ordinal data
  • Qualitative data
  • Quantitative data
Qualitative data represents characteristics or attributes. It is often non-numerical and may include qualities such as textures, colors, smells, tastes, appearance, beauty, etc. This data type is commonly used in fields such as sociology, marketing, and psychology.

How is the strength of correlation between two variables determined?

  • By the correlation coefficient
  • By the number of data points
  • By the slope of the line of best fit
  • By the y-intercept of the line of best fit
The strength of correlation between two variables is determined by the magnitude of the correlation coefficient. A value close to +1 or -1 indicates a strong correlation, while a value close to 0 indicates a weak or no linear correlation. The sign indicates the direction of the relationship.
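To see how the coefficient behaves, the following sketch computes Pearson's r by hand for three hypothetical data sets: one with a strong positive relationship, one with a strong negative relationship, and one with almost none. The helper name `pearson_r` and all values are illustrative.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

x = [1, 2, 3, 4, 5]
y_strong_pos = [2.1, 3.9, 6.2, 8.0, 9.8]   # rises almost exactly with x
y_strong_neg = [9.8, 8.0, 6.2, 3.9, 2.1]   # falls almost exactly with x
y_weak = [5, 1, 4, 2, 5]                   # no clear pattern

r_pos = pearson_r(x, y_strong_pos)
r_neg = pearson_r(x, y_strong_neg)
r_weak = pearson_r(x, y_weak)

print(round(r_pos, 3), round(r_neg, 3), round(r_weak, 3))
```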

How does the sample size affect the power of the Kruskal-Wallis Test?

  • It depends on the data
  • Larger sample sizes decrease power
  • Larger sample sizes increase power
  • Sample size has no effect on power
Larger sample sizes increase the power of the Kruskal-Wallis Test. Power is the probability that a test detects a true effect (that is, rejects the null hypothesis) when one exists.

Polynomial regression allows us to model a relationship between the dependent variable and independent variables as a _________.

  • High
  • Linear equation
  • Non-linear equation
  • Straight line
Polynomial regression allows us to model the relationship between the dependent variable and independent variables as a non-linear equation. This is achieved by raising independent variables to a power and including those terms as predictors, allowing the model to fit more complex, curved data patterns. The model remains linear in its coefficients.

When a data distribution is skewed, which measure of central tendency is typically the most reliable?

  • Mean
  • Median
  • Mode
The median is usually the most reliable measure of central tendency when a data distribution is skewed. Unlike the mean, the median isn't influenced by extreme values. Therefore, in a skewed distribution, the median generally gives a better idea of the typical value than the mean.
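A quick comparison on a right-skewed sample shows the difference. The figures below are hypothetical incomes in thousands, chosen so that a single large value pulls the mean upward.

```python
import statistics

# Hypothetical right-skewed sample: seven typical incomes and one very large one.
incomes = [30, 32, 35, 38, 40, 42, 45, 400]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

# Count how many observations fall below the mean.
below_mean = sum(1 for v in incomes if v < mean_income)

print(mean_income, median_income, below_mean)
```

Here the mean lies above seven of the eight observations, while the median stays in the middle of the typical values.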

What is a 'dendrogram' in hierarchical clustering?

  • A diagram showing the change in the number of clusters
  • A graph showing the distribution of clusters
  • A tree-like diagram that represents the hierarchy of clusters
  • The center point of a cluster
A dendrogram is a tree-like diagram that is used in hierarchical clustering to represent the hierarchy of clusters. Each join in the dendrogram represents two clusters merging, and the height of the join is the distance between those clusters.
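The merge history that a dendrogram draws can be produced with a few lines of agglomerative clustering. The sketch below is a minimal illustration under stated assumptions: it works on one-dimensional points and uses single linkage (the distance between two clusters is the smallest distance between their members). The function name `single_linkage` and the data are hypothetical, and other linkage rules would give different heights.

```python
def single_linkage(points):
    """Agglomerative clustering of 1-D points; returns the list of merges.

    Each merge is (left_cluster, right_cluster, height), where height is the
    single-linkage distance at which the two clusters were joined.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the closest pair of clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

merges = single_linkage([1.0, 2.0, 6.0, 7.5, 20.0])
heights = [height for _, _, height in merges]

for left, right, height in merges:
    print(f"merge {left} + {right} at height {height}")
```

Each printed line corresponds to one join in the dendrogram, and the join heights increase as more distant clusters are merged.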

The __________ plot is used to check the linearity and equal variance assumptions of a multiple linear regression.

  • Cook's Distance
  • Leverage
  • Quantile-Quantile
  • Residuals vs fitted values
The residuals vs fitted values plot is commonly used in regression diagnostics to check the assumptions of linearity and equal variance (homoscedasticity). The residuals should be scattered randomly around zero, and the spread of the residuals should not change with the fitted values.
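The quantities behind this plot are easy to compute by hand. The sketch below is a simplified illustration with a single predictor rather than a full multiple regression. It fits a straight line by least squares to hypothetical data that actually follow a curve, then lists the fitted values and residuals that the plot would show.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    return mean_y - slope * mean_x, slope

# Hypothetical data where y = x squared, so a straight line is the wrong model.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

intercept, slope = fit_line(x, y)
fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

for f, r in zip(fitted, residuals):
    print(f"fitted={f:6.1f}  residual={r:5.1f}")
```

The residuals are positive at both ends and negative in the middle. A systematic curve like this, rather than a random scatter around zero, is the kind of pattern that signals a violation of the linearity assumption.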