What is the difference between nominal and ordinal data?

  • Nominal data can be ordered
  • Nominal data cannot be ordered
  • Ordinal data can be ordered
  • Ordinal data cannot be ordered
Nominal and ordinal data are both types of categorical data. The key difference between the two is that while nominal data cannot be ordered or ranked, ordinal data can. Nominal data represents simple categories or groups with no order or priority. Examples include colors or city names. Ordinal data, on the other hand, represents categories that can be ranked or ordered. Examples include Likert scale data (e.g., a five-point scale from "strongly disagree" through "strongly agree"), educational level (high school, BA, MA, PhD), etc.

What is the purpose of a residual plot in multiple linear regression?

  • All of the above
  • To check for independence of errors
  • To check for linearity
  • To check for normality
A residual plot in multiple linear regression is used to check several of the model's assumptions. It can show whether the residuals are randomly scattered around zero (independence of errors), whether their spread is roughly constant across fitted values (homoscedasticity), and whether they exhibit any systematic pattern (a sign that the linearity assumption is violated). Together with a histogram or Q-Q plot of the residuals, it also helps assess normality.
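
To make the linearity check concrete, here is a minimal pure-Python sketch (the function names and data are our own, chosen for illustration): we fit a straight line to data that is actually quadratic and look at the residuals, which show the tell-tale systematic pattern instead of random scatter.

```python
def ols_line(xs, ys):
    """Closed-form ordinary least squares fit y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

# The true relationship is quadratic, but we fit a straight line anyway.
xs = list(range(-5, 6))
ys = [x * x for x in xs]
a, b = ols_line(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# A plot of `residuals` against `xs` would show a clear U-shape:
# negative in the middle, positive at the ends. A systematic pattern
# like this signals that the linearity assumption is violated.
```

The residuals still sum to (essentially) zero, as OLS guarantees, which is exactly why a plot, not a summary statistic, is needed to spot the pattern.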

What kind of data is best suited for the Wilcoxon Signed Rank Test?

  • Both Continuous and Ordinal data
  • Continuous data
  • Nominal data
  • Ordinal data
The Wilcoxon Signed Rank Test is best suited for continuous and ordinal data. It is a non-parametric test for paired samples: it ranks the absolute differences between pairs and compares the rank sums of the positive and negative differences, so it requires only that values can be meaningfully ordered, not that they follow a normal distribution.
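
As a rough illustration of why ranks are all the test needs, here is a pure-Python sketch of the signed-rank statistic for paired data (the function name and example values are our own, and this computes only the statistic, not a p-value):

```python
def signed_rank_statistic(x, y):
    """Wilcoxon signed-rank statistic for paired samples.
    Zero differences are dropped; tied absolute differences
    receive the average of the ranks they span."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

# Example: differences 3, -1, 3, 1 (one zero dropped);
# the 1's share rank 1.5, the 3's share rank 3.5.
signed_rank_statistic([8, 6, 7, 9, 5], [5, 7, 4, 8, 5])
```

Because only the ranks of the differences enter the statistic, the test is usable whenever differences can be ordered, which is what makes it suitable for ordinal as well as continuous data.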

What is the relationship between a cumulative distribution function and a probability density function?

  • The cumulative distribution function is the integral of the probability density function
  • The probability density function is the integral of the cumulative distribution function
  • There is no relationship between them
  • They are the same thing
The cumulative distribution function (CDF) and the probability density function (PDF) are closely related. For a continuous random variable, the CDF is the integral of the PDF. This means that the PDF is the derivative of the CDF.
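
As a quick numerical check of this relationship, here is a pure-Python sketch (the function names and the choice of an exponential distribution are ours): integrating the PDF from 0 to x recovers the closed-form CDF.

```python
import math

def exp_pdf(x, lam=1.0):
    """PDF of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

def exp_cdf(x, lam=1.0):
    """Closed-form CDF: F(x) = 1 - e^(-lam*x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0

def numeric_cdf(pdf, x, steps=10_000):
    """Approximate F(x) = integral of pdf from 0 to x (trapezoidal rule)."""
    h = x / steps
    total = 0.5 * (pdf(0.0) + pdf(x))
    for i in range(1, steps):
        total += pdf(i * h)
    return total * h

# Integrating the PDF reproduces the CDF up to discretisation error.
approx = numeric_cdf(exp_pdf, 2.0)
exact = exp_cdf(2.0)
```

Running the derivative in the other direction, (F(x + h) - F(x)) / h for small h, would similarly approximate the PDF, mirroring the statement that the PDF is the derivative of the CDF.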

What is the Central Limit Theorem (CLT)?

  • It states that the probability of an event is the product of the probabilities of independent events.
  • It states that the sum of a large number of random variables, each with finite mean and variance, will approximate a normal distribution.
  • It's a rule which states that the probability of a compound event is the product of the probabilities of the independent events.
  • It's the theorem which states that probabilities are equal to the number of favorable outcomes divided by the total outcomes.
The Central Limit Theorem (CLT) states that the sum (or mean) of a sufficiently large number of independent random variables, each with finite mean and variance, is approximately normally distributed, regardless of the shape of the underlying population distribution. This is why the sampling distribution of the sample mean is approximately normal for large samples, which underpins many common inference procedures.
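
The theorem can be seen directly in a small simulation. The following pure-Python sketch (function name and parameters are our own) averages Uniform(0, 1) draws, whose distribution is flat, not bell-shaped, and the resulting sample means concentrate around the population mean 0.5 with spread close to the CLT prediction sigma / sqrt(n) = sqrt(1/12) / sqrt(50).

```python
import random
import statistics

def sample_means(n_samples=2000, n=50, seed=42):
    """Draw n_samples sample means, each averaging n Uniform(0,1) draws."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.random() for _ in range(n))
            for _ in range(n_samples)]

means = sample_means()
# Despite the flat parent distribution, the means are approximately
# normal: centred near 0.5 with sd near sqrt(1/12)/sqrt(50) ~ 0.041.
```

Plotting a histogram of `means` would show the familiar bell shape emerging from a decidedly non-normal source distribution.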

How does the type of data affect the choice of statistical analysis methods?

  • It dictates the statistical tests that can be applied
  • It doesn't affect the choice
  • It has no influence
  • It suggests the kind of visualizations that can be used
The type of data directly affects the choice of statistical analysis methods. Certain types of data require specific statistical tests. For example, nominal data may be analyzed using a chi-square test, while continuous data may be analyzed using a t-test or ANOVA.

How do you decide on the number of Principal Components to retain during PCA?

  • All of the above
  • By calculating the cumulative explained variance
  • By checking the eigenvalues
  • By using the elbow method
The number of principal components to retain can be decided in several ways: checking the eigenvalues (typically, components with eigenvalues greater than 1 are retained), using the elbow method (looking for a clear "elbow" in the scree plot), or calculating the cumulative explained variance (often, enough components to explain at least 95% of the variance are retained).
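
The two numeric criteria are easy to sketch in pure Python (function names and the example eigenvalues are our own; the Kaiser eigenvalue-greater-than-1 rule assumes PCA on standardised data, i.e. a correlation matrix):

```python
def components_to_retain(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative
    explained variance reaches `threshold`."""
    eigs = sorted(eigenvalues, reverse=True)
    total = sum(eigs)
    cumulative = 0.0
    for k, ev in enumerate(eigs, start=1):
        cumulative += ev / total
        if cumulative >= threshold:
            return k
    return len(eigs)

def kaiser_rule(eigenvalues):
    """Kaiser criterion: keep components with eigenvalue > 1
    (meaningful when PCA was run on standardised variables)."""
    return sum(1 for ev in eigenvalues if ev > 1.0)

# Hypothetical eigenvalues from a 6-variable PCA on standardised data.
eigs = [3.1, 1.4, 0.7, 0.4, 0.25, 0.15]
k_variance = components_to_retain(eigs)  # cumulative shares: 0.52, 0.75, ...
k_kaiser = kaiser_rule(eigs)             # only two eigenvalues exceed 1
```

Note that the criteria can disagree, as they do here, which is why the scree plot's elbow is often used as a visual tiebreaker.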

How does the choice of significance level affect the probability of making a Type I error?

  • Higher significance level leads to higher probability of Type I error
  • Lower significance level leads to higher probability of Type I error
  • Significance level has no effect on the probability of Type I error
  • The choice of significance level affects the probability of Type II error, not Type I
The significance level (alpha) is the probability of making a Type I error. So, a higher significance level increases the chance of rejecting the null hypothesis when it's true, hence increasing the probability of a Type I error.
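
This identity can be checked by simulation. The sketch below (pure Python, function names are our own) repeatedly runs a two-sided one-sample z-test on data generated under a true null hypothesis and counts how often the null is rejected: the empirical rejection rate tracks whatever alpha is chosen.

```python
import math
import random

def z_test_p_value(sample):
    """Two-sided one-sample z-test of H0: mu = 0, known sigma = 1."""
    n = len(sample)
    z = (sum(sample) / n) * math.sqrt(n)
    # Standard normal CDF via the error function.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    return 2.0 * (1.0 - phi)

def type_i_error_rate(alpha, trials=4000, n=30, seed=1):
    """Fraction of tests that (wrongly) reject a true null hypothesis."""
    rng = random.Random(seed)
    rejections = sum(
        z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n)]) < alpha
        for _ in range(trials)
    )
    return rejections / trials

# A larger significance level produces more Type I errors:
rate_05 = type_i_error_rate(0.05)   # close to 0.05
rate_10 = type_i_error_rate(0.10)   # close to 0.10, roughly double
```

Under a true null the p-value is uniformly distributed, so the fraction falling below alpha is, in expectation, exactly alpha.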

What can be a potential drawback of using a high degree polynomial in regression analysis?

  • It can lead to overfitting
  • It can lead to underfitting
  • It doesn't capture relationships between variables
  • It simplifies the model too much
Using a high degree polynomial in regression analysis can lead to overfitting. Overfitting occurs when a model captures not only the underlying pattern but also the noise in the data, making it perform well on the training data but poorly on new, unseen data.
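
Overfitting by a high-degree polynomial can be demonstrated in a few lines of pure Python (function names, the degree-9 interpolant, and the toy setup are our own): when the true relationship is just y = x plus noise, a polynomial that passes through every training point predicts held-out points worse than a plain straight line.

```python
import random

def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through the points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def ols_line(xs, ys):
    """Closed-form least squares fit y = a + b*x, returned as a callable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return lambda x: a + b * x

def mse(predict, xs, ys):
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def compare_fits(trials=200, n_train=10, noise=0.1, seed=0):
    """Average held-out MSE of a degree-9 interpolating polynomial
    versus a straight line when the truth is y = x plus noise."""
    rng = random.Random(seed)
    train_x = [i / (n_train - 1) for i in range(n_train)]
    test_x = [(i + 0.5) / (n_train - 1) for i in range(n_train - 1)]
    poly_total = line_total = 0.0
    for _ in range(trials):
        train_y = [x + rng.gauss(0.0, noise) for x in train_x]
        test_y = [x + rng.gauss(0.0, noise) for x in test_x]
        poly_total += mse(lambda x: lagrange_predict(train_x, train_y, x),
                          test_x, test_y)
        line_total += mse(ols_line(train_x, train_y), test_x, test_y)
    return poly_total / trials, line_total / trials

poly_mse, line_mse = compare_fits()
# The polynomial has zero training error (it interpolates every point)
# yet, by chasing the noise, it predicts the held-out points worse
# than the simple straight line.
```

The polynomial's perfect training fit is precisely the problem: the flexibility that lets it match the noise in the training set is what makes it unreliable between and beyond those points.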

Is the Kruskal-Wallis Test used for comparing two groups or more than two groups?

  • Both
  • More than two groups
  • Neither
  • Two groups
The Kruskal-Wallis Test is used for comparing more than two independent groups. It is the non-parametric counterpart of one-way ANOVA: all observations are pooled and ranked, and the test asks whether the rank sums differ more across groups than chance would allow. With exactly two groups it is equivalent to the Mann-Whitney U test.
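
The H statistic behind the test is short enough to sketch in pure Python (the function name and example data are our own; p-values would come from comparing H against a chi-squared distribution with k - 1 degrees of freedom, which is omitted here):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic for k independent groups,
    with average ranks assigned to tied values."""
    # Pool all observations, remembering each one's group.
    pooled = sorted((v, g) for g, group in enumerate(groups) for v in group)
    n_total = len(pooled)
    ranks = [0.0] * n_total
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        avg = (i + 1 + j) / 2            # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    # Sum of ranks within each group.
    rank_sums = [0.0] * len(groups)
    for (v, g), r in zip(pooled, ranks):
        rank_sums[g] += r
    return 12.0 / (n_total * (n_total + 1)) * sum(
        rs * rs / len(grp) for rs, grp in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)

# Three clearly separated groups yield a large H.
h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

Because only ranks enter the formula, the test makes no assumption about the shape of each group's distribution, which is what makes it the non-parametric stand-in for one-way ANOVA.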