What is the name of the rule that states that the probabilities of all possible outcomes of an experiment sum to 1?

  • Bayes' Theorem
  • Law of Large Numbers
  • Law of Total Probability
  • Rule of Complementary Events
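Of the options listed, the intended answer is the Law of Total Probability. Strictly speaking, the statement that the probabilities of all possible outcomes of an experiment sum to 1 is the normalization axiom of probability. The Law of Total Probability builds on it: if the events B_1, ..., B_n are mutually exclusive and together cover every possible outcome, then P(A) = P(A | B_1)P(B_1) + ... + P(A | B_n)P(B_n). This rule is fundamental to probability theory and provides a way to calculate the probability of a complex event by breaking it down into simpler, mutually exclusive cases.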
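
As a small illustration of breaking an event down into mutually exclusive cases, here is a minimal sketch. The two machines, their production shares, and their defect rates are hypothetical numbers chosen for the example, not values from the quiz.

```python
from fractions import Fraction

# Hypothetical setup: every item comes from machine A or machine B,
# so the two cases are mutually exclusive and cover all outcomes.
p_machine = {"A": Fraction(3, 5), "B": Fraction(2, 5)}            # P(B_i)
p_defect_given = {"A": Fraction(1, 100), "B": Fraction(3, 100)}   # P(D | B_i)

# The probabilities of the cases sum to 1.
total = sum(p_machine.values())

# Law of Total Probability: P(D) = sum over i of P(D | B_i) * P(B_i)
p_defect = sum(p_defect_given[m] * p_machine[m] for m in p_machine)

print(total)     # 1
print(p_defect)  # 9/500
```

Exact fractions are used so the sums are not affected by floating-point rounding.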

What are the two types of factor analysis used in data science?

  • Confirmatory and explanatory
  • Exploratory and confirmatory
  • Inferential and descriptive
  • Predictive and explanatory
Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) are the two types of factor analysis commonly used in data science. EFA is used when the structure of the underlying factors is not known, while CFA is used when the researcher has specific hypotheses about the factor structure.

The __________ Theorem states that with a large enough sample size, the sampling distribution of the mean will be approximately normally distributed.

  • Central Limit
  • Law of Large Numbers
  • Regression
  • Variance
The Central Limit Theorem is a fundamental concept in probability theory and statistics. The theorem states that, as the sample size increases, the sampling distribution of the mean approaches a normal distribution. This holds regardless of the shape of the population distribution, provided the observations are independent and the population has finite variance.

How does skewness affect the relationship between the mean, median, and mode of a distribution?

  • Changes the relationship
  • Increases the standard deviation
  • No effect
  • Reduces the kurtosis
Skewness affects the relationship between the mean, median, and mode. In a positively skewed distribution, the mean is usually greater than the median, which is greater than the mode. In a negatively skewed distribution, the mode is usually greater than the median, which is greater than the mean.
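
The ordering described above can be checked on a small data set. The values below are made up for illustration, and the left-skewed set is simply the mirror image of the right-skewed one.

```python
import statistics

# Hypothetical right-skewed data: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Mirror image of the same data, which is left-skewed.
left_skewed = [15 - x for x in right_skewed]

def summarize(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

print(summarize(right_skewed))  # (4.5, 3.0, 2)    mean > median > mode
print(summarize(left_skewed))   # (10.5, 12.0, 13) mode > median > mean
```

The ordering is a tendency rather than a guarantee, which is why the explanation says "usually".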

Under what conditions does the Central Limit Theorem hold true?

  • When the data is skewed
  • When the population is normal
  • When the sample size is sufficiently large
  • When the standard deviation is zero
The Central Limit Theorem applies when the sample size is sufficiently large, regardless of the shape of the population distribution. A common rule of thumb is n > 30, although strongly skewed or heavy-tailed populations may need larger samples. The theorem states that if a population has mean μ and finite standard deviation σ, and you draw sufficiently large random samples of size n from it with replacement, then the distribution of the sample means is approximately normal, with mean μ and standard deviation σ/√n.
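
A quick simulation can make this concrete. The sketch below assumes an exponential population with mean 1 and standard deviation 1 (a deliberately skewed choice) and uses the skewness of the sample means as a rough indicator of how close they are to normal. The sample sizes, trial count, and seed are arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

def sample_means(n, trials=5000):
    """Means of `trials` samples of size n from an exponential(1) population."""
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(trials)]

def skewness(xs):
    """Population skewness: the average cubed z-score."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 3 for x in xs)

results = {}
for n in (2, 30, 200):
    means = sample_means(n)
    results[n] = {
        "mean": statistics.fmean(means),  # should be close to mu = 1
        "sd": statistics.pstdev(means),   # should be close to sigma / sqrt(n)
        "skew": skewness(means),          # should shrink toward 0 as n grows
    }
    print(n, results[n])
```

As n grows, the spread of the sample means should track σ/√n and their skewness should move toward 0, the value for a normal distribution.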

How does effect size impact hypothesis testing?

  • Effect size has no impact on hypothesis testing
  • Larger effect sizes always lead to rejection of the null hypothesis
  • Larger effect sizes always lead to smaller p-values
  • Larger effect sizes increase the statistical power of the test
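Effect size measures the magnitude of the difference or the strength of the relationship in the population. A larger effect size means a larger difference or stronger relationship, which, for a fixed sample size and significance level, increases the statistical power of the test. Power is the probability that the test correctly rejects the null hypothesis when the alternative is true. A larger effect does not guarantee rejection or a smaller p-value in any single sample, which is why the options containing "always" are incorrect.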
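
To see how power grows with effect size, here is a minimal sketch for one specific case: a one-sided, one-sample z-test with known variance. The function name `z_test_power`, the sample size of 25, and the effect sizes tried are assumptions made for the example.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with known variance.

    effect_size is the standardized shift (mu1 - mu0) / sigma.
    """
    z = NormalDist()                   # standard normal
    critical = z.inv_cdf(1 - alpha)    # rejection threshold under H0
    # Under the alternative, the test statistic is shifted by d * sqrt(n).
    return 1 - z.cdf(critical - effect_size * n ** 0.5)

for d in (0.2, 0.5, 0.8):
    print(d, round(z_test_power(d, n=25), 3))
```

With the sample size and significance level held fixed, the printed power rises as the effect size increases.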

How does a binomial distribution differ from a normal distribution?

  • Binomial distribution is continuous, while normal is discrete
  • Both are continuous distributions
  • Both are discrete distributions
  • Normal distribution is continuous, while binomial is discrete
A binomial distribution is discrete: it takes only the integer values 0, 1, ..., n, and it represents the number of successes in a fixed number n of independent Bernoulli trials, each with the same success probability p. A normal distribution is continuous. It is often used as an approximation to the binomial distribution when the number of trials is large and p is not too close to 0 or 1.
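
The approximation can be checked numerically. The sketch below assumes n = 100 trials with success probability 0.5 and compares the exact binomial probability of 45 to 55 successes with the normal approximation, using a continuity correction of 0.5 on each side.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 100, 0.5  # assumed values for the example

# Normal distribution with the same mean and standard deviation.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))

# P(45 <= X <= 55): a discrete sum versus a continuous area.
exact = sum(binom_pmf(k, n, p) for k in range(45, 56))
normal = approx.cdf(55.5) - approx.cdf(44.5)  # continuity correction

print(round(exact, 4), round(normal, 4))
```

The binomial side is a sum over integers, while the normal side is an area under a curve, which is why the interval is widened by 0.5 at each end.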

What is the underlying assumption of linearity in a multiple linear regression model?

  • All independent variables must have a linear relationship with the dependent variable
  • All residuals must be equal
  • All variables must be continuous
  • All variables must be normally distributed
The linearity assumption in a multiple linear regression model is that the relationship between each independent variable and the dependent variable is linear when the other independent variables are held fixed. This implies that the expected change in the dependent variable for a one-unit change in an independent variable is constant, regardless of the value of that independent variable.
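
The "constant change" property can be shown with a toy model. The coefficients below are hypothetical, chosen only so the arithmetic is exact.

```python
def predict(x1, x2, b0=1.0, b1=2.0, b2=-0.5):
    """Hypothetical linear model: y = b0 + b1*x1 + b2*x2."""
    return b0 + b1 * x1 + b2 * x2

# Increase x1 by one unit at two very different starting points,
# holding x2 fixed at 10 in both cases.
change_low = predict(4, 10) - predict(3, 10)
change_high = predict(101, 10) - predict(100, 10)

print(change_low, change_high)  # 2.0 2.0
```

Both changes equal the coefficient b1, which is exactly what the linearity assumption requires.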

The term '________' refers to the sharpness of the peak of a frequency-distribution curve.

  • Kurtosis
  • Median
  • Mode
  • Skewness
Kurtosis is traditionally described as the sharpness of the peak of a frequency-distribution curve, although it is more accurately a measure of how heavy the tails of the distribution are. Distributions with high kurtosis have heavier tails, and therefore more extreme values, than the normal distribution.
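
Kurtosis can be estimated directly from data as the average fourth power of the z-scores. The sketch below reports excess kurtosis (kurtosis minus 3, so that a normal distribution scores about 0) for three simulated samples. The sample size, the seed, and the choice of distributions are assumptions made for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def excess_kurtosis(xs):
    """Average fourth power of the z-scores, minus 3 (the normal value)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean(((x - m) / s) ** 4 for x in xs) - 3.0

N = 20000
uniform = [random.uniform(-1, 1) for _ in range(N)]   # light tails
normal = [random.gauss(0, 1) for _ in range(N)]       # reference
# Laplace sample built as the difference of two exponentials (heavy tails).
laplace = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(N)]

k_uniform = excess_kurtosis(uniform)
k_normal = excess_kurtosis(normal)
k_laplace = excess_kurtosis(laplace)

print(round(k_uniform, 2), round(k_normal, 2), round(k_laplace, 2))
```

The heavy-tailed Laplace sample should score well above the normal sample, and the light-tailed uniform sample should score below it.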

How does factor analysis differ from principal component analysis (PCA)?

  • Factor analysis does not involve rotation of variables, while PCA does
  • Factor analysis looks for shared variance while PCA looks for total variance
  • PCA focuses on unobservable variables, while factor analysis focuses on observable variables
  • PCA is used for dimensionality reduction, while factor analysis is used for data cleaning
Factor analysis and PCA differ primarily in what they seek to model. Factor analysis models only the variance that the variables share, attributing it to latent (unobservable) factors and treating the remaining variance as specific to each variable. PCA models the total variance of the observed variables and aims to reduce dimensionality.