What is the Central Limit Theorem (CLT)?

  • It states that the probability of an event is the product of the probabilities of independent events.
  • It states that the sum of a large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.
  • It's a rule which states that the probability of a compound event is the product of the probabilities of the independent events.
  • It's the theorem which states that probabilities are equal to the number of favorable outcomes divided by the total outcomes.
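The Central Limit Theorem (CLT) states that, for a sufficiently large sample drawn independently from a population with finite mean and variance, the sampling distribution of the sample mean (or sum) is approximately normal, regardless of the shape of the population distribution. The statement that the sample mean gets close to the population mean as the sample grows is the law of large numbers, a related but separate result.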
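The effect can be checked with a small simulation. The sketch below is illustrative only: the exponential population, the sample sizes, and the helper names `sample_means` and `skewness` are assumptions chosen for the example, not part of the question. It draws repeated samples from a strongly right-skewed population and compares the shape of the distribution of sample means for a small and a large sample size.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Means of `trials` samples of size `n` from an Exponential(1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

def skewness(values):
    """Sample skewness: 0 for a symmetric distribution such as the normal."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

means_small = sample_means(n=2, trials=5000)    # still clearly skewed
means_large = sample_means(n=100, trials=5000)  # close to bell-shaped

skew_small = skewness(means_small)
skew_large = skewness(means_large)
spread_large = statistics.pstdev(means_large)   # theory: sigma / sqrt(n) = 0.1

print(f"skewness n=2:   {skew_small:.2f}")
print(f"skewness n=100: {skew_large:.2f}")
print(f"std of means n=100: {spread_large:.3f}")
```

The skewness of the sample means shrinks toward 0 as the sample size grows, and their spread approaches the population standard deviation divided by the square root of the sample size.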

How does the type of data affect the choice of statistical analysis methods?

  • It dictates the statistical tests that can be applied
  • It doesn't affect the choice
  • It has no influence
  • It suggests the kind of visualizations that can be used
The type of data directly affects the choice of statistical analysis methods, because each test assumes a particular measurement scale. For example, an association between two nominal variables may be analyzed using a chi-square test, while means of a continuous variable may be compared using a t-test (two groups) or ANOVA (three or more groups).

How do you decide on the number of Principal Components to retain during PCA?

  • All of the above
  • By calculating the cumulative explained variance
  • By checking the eigenvalues
  • By using the elbow method
The number of principal components to retain can be decided in several ways: checking the eigenvalues (a common rule of thumb, the Kaiser criterion, retains components with eigenvalues greater than 1 when PCA is run on standardized data), using the elbow method (looking for a clear "elbow" in the scree plot), or calculating the cumulative explained variance (retaining enough components to reach a chosen threshold, often 95% of the variance). These rules are heuristics and do not always agree.
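As a rough illustration of two of these rules, the sketch below applies the eigenvalue rule and the cumulative-variance rule to a made-up list of eigenvalues. The values, the 95% threshold, and the helper name `components_for` are assumptions for the example; in practice the eigenvalues would come from a PCA of the standardized data.

```python
from itertools import accumulate

# Hypothetical eigenvalues of a correlation matrix for 6 standardized features,
# sorted from largest to smallest (they sum to the number of features).
eigenvalues = [2.9, 1.4, 0.8, 0.5, 0.3, 0.1]

total = sum(eigenvalues)
explained = [ev / total for ev in eigenvalues]   # explained variance ratio
cumulative = list(accumulate(explained))         # running total of the ratios

# Eigenvalue rule: keep components whose eigenvalue exceeds 1.
kaiser_k = sum(1 for ev in eigenvalues if ev > 1.0)

def components_for(threshold, cumulative):
    """Smallest number of components whose cumulative ratio reaches `threshold`."""
    for k, c in enumerate(cumulative, start=1):
        if c >= threshold:
            return k
    return len(cumulative)

k95 = components_for(0.95, cumulative)

print("cumulative explained variance:", [round(c, 3) for c in cumulative])
print("eigenvalue > 1 rule keeps:", kaiser_k)
print("95% variance rule keeps:", k95)
```

With these made-up numbers the two rules give different answers (2 versus 5 components), which is why a scree plot is often inspected alongside them.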

How does the choice of significance level affect the probability of making a Type I error?

  • Higher significance level leads to higher probability of Type I error
  • Lower significance level leads to higher probability of Type I error
  • Significance level has no effect on the probability of Type I error
  • The choice of significance level affects the probability of Type II error, not Type I
The significance level (alpha) is the probability of making a Type I error when the null hypothesis is true. A higher significance level therefore increases the chance of rejecting a true null hypothesis, which raises the probability of a Type I error.
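A simulation makes the link concrete. The following sketch assumes a two-sided z-test on normally distributed data with known standard deviation 1 and a true null hypothesis (mean 0); the function name `type_i_error_rate`, the sample size, and the trial count are illustrative choices.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(alpha, n=30, trials=4000, seed=1):
    """Fraction of simulated true-null experiments in which the z-test rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # null is true: mean 0
        z = fmean(sample) * n ** 0.5                      # sigma = 1 is assumed known
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate_01 = type_i_error_rate(0.01)
rate_05 = type_i_error_rate(0.05)
rate_10 = type_i_error_rate(0.10)

print(f"alpha=0.01 -> rejection rate {rate_01:.3f}")
print(f"alpha=0.05 -> rejection rate {rate_05:.3f}")
print(f"alpha=0.10 -> rejection rate {rate_10:.3f}")
```

Because the null hypothesis is true in every simulated experiment, each rejection is a Type I error, and the observed rejection rate should land close to the chosen alpha.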

What type of statistical test is the Kruskal-Wallis Test?

  • Chi-square test
  • Non-parametric
  • Parametric
  • T-test
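The Kruskal-Wallis Test is a non-parametric statistical test. It compares three or more independent groups using the ranks of the observations rather than their raw values, so it does not assume normally distributed data; it is commonly used as an alternative to one-way ANOVA.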
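To show what "rank-based" means here, the sketch below computes the Kruskal-Wallis H statistic by hand. It is a simplified version that assumes no tied values (so no tie correction is applied); the function name `kruskal_wallis_h` and the data are made up for illustration.

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic for groups with no tied values."""
    pooled = sorted(v for g in groups for v in g)
    if len(set(pooled)) != len(pooled):
        raise ValueError("this sketch assumes no tied values")
    # Rank every observation within the pooled data (1 = smallest).
    rank_of = {v: r for r, v in enumerate(pooled, start=1)}
    n = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    total = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical data: three fully separated groups versus three interleaved groups.
separated = kruskal_wallis_h([1.1, 2.0, 2.9], [4.2, 5.1, 6.3], [7.0, 8.4, 9.9])
interleaved = kruskal_wallis_h([1.1, 4.2, 7.0], [2.0, 5.1, 8.4], [2.9, 6.3, 9.9])

print(f"H (separated groups):   {separated:.2f}")
print(f"H (interleaved groups): {interleaved:.2f}")
```

Only the ordering of the values enters the calculation, so rescaling the data monotonically leaves H unchanged. A larger H indicates that the groups' rank sums differ more than would be expected if all groups came from the same distribution.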

The ________ is the average of a data set calculated by adding all values and then dividing by the number of values.

  • Mean
  • Median
  • Mode
The mean, also referred to as average or arithmetic mean, is calculated by adding all values in the data set and then dividing by the number of values. The mean is often used as a summary statistic.

The probability of committing a Type I error is also known as the ______ level of the test.

  • Confidence
  • Power
  • Significance
  • Size
The probability of committing a Type I error (rejecting a true null hypothesis) is known as the significance level (often denoted by alpha) of the test. A common significance level is 0.05, indicating a 5% risk of committing a Type I error if the null hypothesis is true.

The process of testing the effect of varying one predictor at different levels of another predictor is known as ________ effect analysis.

  • Additive
  • Independent
  • Interaction
  • Subtractive
This is known as interaction effect analysis. It tests how the effect of one predictor on the response variable changes across the levels of another predictor, which shows whether the predictors act on the dependent variable jointly rather than independently.
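A minimal numeric sketch of the idea, using a hypothetical 2x2 design: the factor names, levels, and cell means below are invented for illustration, and the helper `simple_effect` is not from any library. The interaction is measured as the difference between the effects of one predictor at the two levels of the other.

```python
# Hypothetical mean crop yields for a 2x2 design: (fertilizer, irrigation) -> mean
cell_means = {
    ("low", "dry"): 10.0,
    ("high", "dry"): 12.0,
    ("low", "wet"): 11.0,
    ("high", "wet"): 19.0,
}

def simple_effect(cell_means, irrigation):
    """Effect of raising fertilizer from 'low' to 'high' at a fixed irrigation level."""
    return cell_means[("high", irrigation)] - cell_means[("low", irrigation)]

effect_dry = simple_effect(cell_means, "dry")  # 2.0
effect_wet = simple_effect(cell_means, "wet")  # 8.0

# Difference of differences: zero would mean purely additive effects.
interaction = effect_wet - effect_dry          # 6.0

print(f"fertilizer effect when dry: {effect_dry}")
print(f"fertilizer effect when wet: {effect_wet}")
print(f"interaction contrast:       {interaction}")
```

This is only the descriptive contrast. A formal analysis would typically fit a model with a product term for the two predictors and test whether its coefficient differs from zero.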

How does the Spearman rank correlation deal with categorical variables?

  • It assigns a numerical value to each category
  • It can't handle categorical variables
  • It groups categorical variables together
  • It transforms categorical variables into ranks
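The Spearman rank correlation replaces each value with its rank and then measures how well the two sets of ranks agree. This lets it handle continuous data and ordinal data (categories with a natural order). It is not suitable for nominal categories, which have no meaningful order to rank.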
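The ranking step can be sketched as follows. This is a hand-rolled illustration, computing Spearman's coefficient as the Pearson correlation of tie-averaged ranks; the helper names and the education/income data are assumptions made up for the example.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Ordinal variable: the categories are ordered, so they can be coded by position.
level = {"low": 0, "medium": 1, "high": 2}
education = [level[e] for e in ["low", "medium", "medium", "high", "high"]]
income = [20, 35, 30, 50, 80]  # hypothetical values

rho_ordinal = spearman(education, income)
rho_monotone = spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])  # nonlinear but monotone

print(f"rho (education vs income): {rho_ordinal:.3f}")
print(f"rho (x vs x squared):      {rho_monotone:.3f}")
```

The coding of `level` only works because the categories have an order. For a nominal variable such as colour or country, any numeric coding would be arbitrary and the resulting coefficient meaningless.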

How does independence between events affect the calculation of their joint probability?

  • It makes the joint probability equal to the difference of the probabilities of each event
  • It makes the joint probability equal to the product of the probabilities of each event
  • It makes the joint probability equal to the ratio of the probabilities of each event
  • It makes the joint probability equal to the sum of the probabilities of each event
If events are independent, their joint probability equals the product of their individual probabilities. That is, P(A ∩ B) = P(A) * P(B) for independent events A and B.
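The product rule can be verified by enumerating a small sample space. The sketch below uses two fair six-sided dice; the events `a`, `b`, and `c` are chosen for illustration, with `c` included as a contrast case that depends on the first die.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of `event` under the uniform distribution on outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def a(o):
    return o[0] % 2 == 0       # first die is even

def b(o):
    return o[1] >= 5           # second die shows 5 or 6

def c(o):
    return o[0] + o[1] >= 10   # total is at least 10 (involves the first die)

p_a, p_b, p_c = prob(a), prob(b), prob(c)
p_ab = prob(lambda o: a(o) and b(o))
p_ac = prob(lambda o: a(o) and c(o))

print("P(A) * P(B) =", p_a * p_b, " P(A and B) =", p_ab)  # equal: independent
print("P(A) * P(C) =", p_a * p_c, " P(A and C) =", p_ac)  # differ: dependent
```

Events A and B concern different dice, so their joint probability equals the product. Events A and C share the first die, so the product rule no longer holds for them.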