What is the null hypothesis of the Spearman's Rank Correlation test?

  • The variables are not related
  • The variables have a negative correlation
  • The variables have a positive correlation
  • There is no monotonic relationship between the variables
The null hypothesis of the Spearman's Rank Correlation test is that there is no monotonic relationship between the variables. That is, changes in one variable do not consistently correspond to changes in the other variable.
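
As a rough illustration, the statistic behind the test (Spearman's rho) can be computed as the Pearson correlation of the ranks. The sketch below is a minimal standard-library version; the helper names and the toy data are assumptions made for this example, and no p-value is computed. It shows that a monotonic but nonlinear relationship gives rho near 1, while a U-shaped relationship (related, but not monotonic) gives rho near 0.

```python
import math

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y_monotonic = [1, 8, 27, 64, 125]  # x cubed: nonlinear but always increasing
y_u_shaped = [4, 1, 0, 1, 4]       # (x - 3) squared: related to x, not monotonic

rho_monotonic = spearman_rho(x, y_monotonic)
rho_u_shaped = spearman_rho(x, y_u_shaped)
print(rho_monotonic, rho_u_shaped)
```

The U-shaped case is why "no monotonic relationship" is a more precise statement of the null hypothesis than "the variables are not related".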

What is the purpose of sampling in statistical analysis?

  • To create charts and graphs
  • To estimate population parameters
  • To gather data from every member of a population
  • To increase the variability of data
Sampling in statistical analysis is primarily used to estimate population parameters. Since it's often impractical or impossible to gather data from every individual in a population, we use samples to make inferences about the population as a whole.
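
A small simulation can make this concrete. The sketch below assumes a made-up population of 100,000 values (the mean of 170 and spread of 10 are arbitrary choices for illustration) and estimates the population mean from a random sample of 500.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population; the parameters are illustrative only.
population = [random.gauss(170, 10) for _ in range(100_000)]

# Draw a simple random sample without replacement.
sample = random.sample(population, 500)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample estimate: {sample_mean:.2f}")
```

The sample mean should land close to the population mean even though only 0.5% of the population was measured.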

Does the Central Limit Theorem apply to all distributions?

  • No, it only applies to normal distributions.
  • No, it only applies to uniform distributions.
  • Yes, but only when the sample size is sufficiently large and the distribution has finite variance.
  • Yes, regardless of the sample size.
The Central Limit Theorem (CLT) states that the sampling distribution of the mean approaches a normal distribution as the sample size grows. It holds for a wide range of underlying distributions, provided the sample size is sufficiently large and the underlying distribution has finite variance.
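
The simulation below is one way to see this, under assumed settings: samples are drawn from a strongly right-skewed Exponential(1) distribution (which has finite variance), and the skewness of the sample means is compared for sample sizes 2 and 50. The sample sizes, trial count, and helper names are choices made for this sketch.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def sample_means(n, trials=5000):
    """Means of `trials` samples of size n from an Exponential(1) distribution."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

def skewness(values):
    """Average cubed z-score; roughly 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

means_small = sample_means(2)
means_large = sample_means(50)

skew_small = skewness(means_small)
skew_large = skewness(means_large)

print(f"skewness of means, n=2:  {skew_small:.2f}")
print(f"skewness of means, n=50: {skew_large:.2f}")
print(f"spread of means, n=50:   {statistics.stdev(means_large):.3f}")
```

With the larger sample size, the distribution of means should be noticeably more symmetric, and its spread should be near 1/sqrt(50), as the theorem suggests.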

What is the primary objective of statistics in data science?

  • Data storage
  • Data visualization
  • To make decisions based on data analysis
  • Web design
The primary goal of statistics in data science is to provide a foundation for decision making based on data analysis. It is a discipline that provides tools and methods to interpret and understand data, answer specific questions, and visualize data in a meaningful way. It is crucial in areas where sound decisions are essential, such as business strategy, scientific research, and policy making.

The Mann-Whitney U test is used when data is ________, which means it can't be reasonably fit to a normal distribution.

  • non-parametric
  • normally distributed
  • parametric
  • skewed
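The Mann-Whitney U test is a non-parametric test: it compares two independent groups using the ranks of the observations and does not assume that the data follow a normal distribution. This makes it suitable when the data can't be reasonably fit to a normal distribution.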
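
To show why no distributional assumption is needed, here is a minimal sketch of the U statistic itself, computed by counting pairwise comparisons. The function name and the two small groups are invented for illustration, and the p-value step is omitted.

```python
def mann_whitney_u(a, b):
    """U statistic for group a: pairs (x, y) with x > y count 1, ties count 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    return u

# Hypothetical data; group_b contains an extreme value on purpose.
group_a = [1.1, 2.3, 3.0]
group_b = [4.2, 5.5, 6.1, 40.0]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)
print(u_a, u_b)
```

Only the ordering of the values matters, so replacing 40.0 with 7.0 would leave both U values unchanged.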

What is the implication of multicollinearity in polynomial regression?

  • It increases the fit of the model to the training data
  • It increases the interpretability of the model
  • It reduces the complexity of the model
  • It reduces the precision of coefficient estimates
Multicollinearity arises in polynomial regression because the polynomial terms (for example x and x²) are often highly correlated with each other. It reduces the precision of the coefficient estimates and makes them highly sensitive to minor changes in the model or the data. The resulting estimates can be unstable and unreliable, which makes it difficult to interpret the model and draw inferences about the relationships between variables.
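
The source of the problem can be checked directly: the polynomial terms built from the same predictor tend to be strongly correlated with one another. The sketch below assumes a toy predictor x = 1..10 and a hand-written Pearson helper; both are choices made for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical predictor and the polynomial features derived from it.
x = list(range(1, 11))
x_squared = [v ** 2 for v in x]
x_cubed = [v ** 3 for v in x]

r_x_x2 = pearson_r(x, x_squared)
r_x2_x3 = pearson_r(x_squared, x_cubed)

print(f"corr(x, x^2)   = {r_x_x2:.3f}")
print(f"corr(x^2, x^3) = {r_x2_x3:.3f}")
```

Correlations this close to 1 between predictors are what make the individual coefficient estimates imprecise.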

How does the presence of outliers affect measures of dispersion like range, variance, and standard deviation?

  • Decreases them
  • Depends on the values of the outliers
  • Increases them
  • No effect
Outliers can greatly affect measures of dispersion like the range, variance, and standard deviation by making them larger. The range depends only on the minimum and maximum values, while the variance and standard deviation are based on squared distances from the mean, so a single outlier (a value that is significantly higher or lower than the other values) can result in a much larger measure of dispersion.
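
A quick check with the standard library illustrates the effect. The data values here are made up for the example, and `spread` is a helper name introduced for this sketch.

```python
import statistics

def spread(values):
    """Range, sample variance, and sample standard deviation of the values."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "stdev": statistics.stdev(values),
    }

# Hypothetical measurements, then the same data with one extreme value added.
data = [10, 12, 11, 13, 12]
data_with_outlier = data + [50]

without_outlier = spread(data)
with_outlier = spread(data_with_outlier)

for name in without_outlier:
    print(f"{name}: {without_outlier[name]:.2f} -> {with_outlier[name]:.2f}")
```

All three measures grow when the single outlier is added.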

The normal distribution is also known as the ________ distribution.

  • Exponential
  • Gaussian
  • Poisson
  • Uniform
The normal distribution is also known as the Gaussian distribution. It is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is bell-shaped.

How do you calculate the probability of the intersection of two independent events?

  • P(A ∩ B) = P(A) * P(B)
  • P(A ∩ B) = P(A) + P(B)
  • P(A ∩ B) = P(A) - P(B)
  • P(A ∩ B) = P(A) / P(B)
The probability of the intersection of two independent events is calculated as the product of their individual probabilities. So if A and B are independent, P(A ∩ B) = P(A) * P(B). This is a direct result of the Multiplication Rule for independent events.
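
The rule can be verified exactly by enumerating a small sample space. The sketch below assumes two fair six-sided dice, with event A as "the first die is even" and event B as "the second die shows more than 4"; the events and helper names are chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event, given as a predicate on an outcome."""
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

def is_a(outcome):
    return outcome[0] % 2 == 0  # A: first die is even

def is_b(outcome):
    return outcome[1] > 4       # B: second die shows 5 or 6

p_a = prob(is_a)
p_b = prob(is_b)
p_a_and_b = prob(lambda outcome: is_a(outcome) and is_b(outcome))

print(p_a, p_b, p_a_and_b)
```

Because the two dice do not influence each other, the probability of the intersection equals the product of the individual probabilities.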

What type of data represents characteristics or attributes?

  • Categorical data
  • Ordinal data
  • Qualitative data
  • Quantitative data
Qualitative data represents characteristics or attributes. It is often non-numerical and may include qualities such as textures, colors, smells, tastes, appearance, beauty, etc. This data type is commonly used in fields such as sociology, marketing, and psychology.

How is the strength of correlation between two variables determined?

  • By the correlation coefficient
  • By the number of data points
  • By the slope of the line of best fit
  • By the y-intercept of the line of best fit
The strength of correlation between two variables is determined by the correlation coefficient. A value close to +1 or -1 indicates a strong correlation, while a value close to 0 indicates a weak or no correlation.
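
As a sketch of how the coefficient behaves, the example below computes Pearson's r by hand for three invented data sets: one with a strong positive relationship, one with a strong negative relationship, and one with almost no linear relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data sets chosen to show the three cases.
x = [1, 2, 3, 4, 5, 6]
y_strong_positive = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
y_strong_negative = [-v for v in y_strong_positive]
y_weak = [5, 1, 4, 2, 6, 3]

r_positive = pearson_r(x, y_strong_positive)
r_negative = pearson_r(x, y_strong_negative)
r_weak = pearson_r(x, y_weak)

print(f"{r_positive:.3f} {r_negative:.3f} {r_weak:.3f}")
```

The sign of r gives the direction of the relationship, and its absolute value gives the strength.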

How does the sample size affect the power of the Kruskal-Wallis Test?

  • It depends on the data
  • Larger sample sizes decrease power
  • Larger sample sizes increase power
  • Sample size has no effect on power
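Larger sample sizes increase the power of the Kruskal-Wallis Test. Power is the probability that a test detects a true effect when there is one, that is, that it rejects the null hypothesis when the groups really differ. With more observations, sampling noise shrinks and real differences between groups become easier to detect.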