How does factor analysis help in understanding the structure of a dataset?

  • By identifying underlying factors
  • By normalizing the data
  • By reducing noise in the data
  • By transforming the data
Factor analysis helps in understanding the structure of a dataset by identifying a small number of unobserved (latent) factors that give rise to the pattern of correlations among the observed variables. Variables that load on the same factor tend to be correlated with one another, so the factors summarize the latent structure in the data.
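To make the link between a latent factor and observed correlations concrete, here is a minimal simulation sketch. It assumes a one-factor model with standardized variables; the loadings (0.9, 0.8, 0.7), the sample size, and the helper names `pearson` and `simulate_one_factor` are illustrative choices, not part of the quiz. Under that model the correlation between two variables should be close to the product of their loadings.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def simulate_one_factor(loadings, n, seed=0):
    """Generate n observations of unit-variance variables driven by one latent factor."""
    rng = random.Random(seed)
    columns = [[] for _ in loadings]
    for _ in range(n):
        factor = rng.gauss(0, 1)  # latent factor score, never observed directly
        for column, loading in zip(columns, loadings):
            noise_sd = math.sqrt(1 - loading ** 2)  # keeps each variable at unit variance
            column.append(loading * factor + noise_sd * rng.gauss(0, 1))
    return columns

loadings = [0.9, 0.8, 0.7]  # hypothetical loadings, chosen for illustration
x1, x2, x3 = simulate_one_factor(loadings, n=20000)

r12 = pearson(x1, x2)  # model-implied value: 0.9 * 0.8 = 0.72
r13 = pearson(x1, x3)  # model-implied value: 0.9 * 0.7 = 0.63
r23 = pearson(x2, x3)  # model-implied value: 0.8 * 0.7 = 0.56
print(round(r12, 2), round(r13, 2), round(r23, 2))
```

Factor analysis works in the opposite direction: it starts from the observed correlations and estimates the loadings that could have produced them.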

What is the name of the rule that states the probability of the sum of all possible outcomes of an experiment is 1?

  • Bayes' Theorem
  • Law of Large Numbers
  • Law of Total Probability
  • Rule of Complementary Events
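Among the options listed, the intended answer is the Law of Total Probability. Strictly speaking, the statement that the probabilities of all possible outcomes of an experiment sum to 1 is the normalization axiom of probability. The Law of Total Probability builds on it: if the events B_1, ..., B_n are mutually exclusive and together cover the whole sample space, then P(A) = P(A | B_1) P(B_1) + ... + P(A | B_n) P(B_n). This provides a way to calculate the probability of a complex event by breaking it down into simpler, mutually exclusive cases.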
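The idea of breaking an event into mutually exclusive cases can be sketched with a small worked example. The scenario (three machines with made-up production shares and defect rates) and the variable names are hypothetical, chosen only to illustrate the arithmetic.

```python
from fractions import Fraction

# Hypothetical setup: every item comes from exactly one of three machines,
# so the machines form a partition of the sample space.
share = {"A": Fraction(1, 2), "B": Fraction(3, 10), "C": Fraction(1, 5)}             # P(machine)
defect_rate = {"A": Fraction(1, 100), "B": Fraction(2, 100), "C": Fraction(5, 100)}  # P(defect | machine)

# The probabilities of the mutually exclusive cases sum to 1.
total_share = sum(share.values())

# Law of Total Probability: P(defect) = sum over machines of P(defect | machine) * P(machine)
p_defect = sum(defect_rate[m] * share[m] for m in share)

print(total_share)  # 1
print(p_defect)     # 21/1000
```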

What are the two types of factor analysis used in data science?

  • Confirmatory and explanatory
  • Exploratory and confirmatory
  • Inferential and descriptive
  • Predictive and explanatory
Exploratory Factor Analysis (EFA) and Confirmatory Factor Analysis (CFA) are the two types of factor analysis commonly used in data science. EFA is used when the structure of the underlying factors is not known, while CFA is used when the researcher has specific hypotheses about the factor structure.

The __________ Theorem states that with a large enough sample size, the sampling distribution of the mean will be approximately normally distributed.

  • Central Limit
  • Law of Large Numbers
  • Regression
  • Variance
The Central Limit Theorem is a fundamental result in probability theory and statistics. It states that, as the sample size increases, the sampling distribution of the mean approaches a normal distribution. This holds regardless of the shape of the population distribution, provided the observations are independent and the population has finite variance.
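A short simulation can illustrate this behavior. The sketch below assumes a strongly right-skewed population (an exponential distribution with rate 1) and uses skewness as a rough measure of departure from normality; the sample sizes, repetition count, seed, and helper names are arbitrary illustrative choices.

```python
import random
import statistics

def skewness(values):
    """Population skewness: the mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps, rng):
    """Draw `reps` samples of size n from an Exponential(1) population; return their means."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

rng = random.Random(42)  # fixed seed so the run is repeatable

means_n1 = sample_means(1, 5000, rng)    # n = 1: just the raw population draws
means_n50 = sample_means(50, 5000, rng)  # n = 50: means of 50 draws each

skew_1 = skewness(means_n1)    # stays near the population skewness (2 for an exponential)
skew_50 = skewness(means_n50)  # much closer to 0, the skewness of a normal distribution

print(round(skew_1, 2), round(skew_50, 2))
```

The means of larger samples are far more symmetric than the population itself, even though the population is not normal.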

A one-way ANOVA compares ________ group(s), while a two-way ANOVA compares ________ group(s).

  • one; two
  • three or more; two or more
  • two; three
  • two; two or more
A one-way ANOVA compares the means of three or more independent groups defined by a single factor, while a two-way ANOVA compares the means of groups defined by two factors (independent variables), each with two or more levels.
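As an illustration of what a one-way ANOVA computes, here is a minimal sketch of the F statistic: the ratio of between-group variability to within-group variability. The function name `one_way_anova_f` and the three small groups are made up for the example. Turning F into a p-value requires the F distribution, which is not in the Python standard library, so that step is left out.

```python
from statistics import fmean

def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups (lists of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([v for g in groups for v in g])

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)

    ms_between = ss_between / (k - 1)       # df_between = k - 1
    ms_within = ss_within / (n_total - k)   # df_within = N - k
    return ms_between / ms_within

# Hypothetical data: three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_anova_f(groups)
print(round(f_stat, 6))  # 13.0
```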

What are the potential disadvantages of using non-parametric statistical methods?

  • They always give inaccurate results
  • They can be less powerful than parametric tests when assumptions for parametric tests are met
  • They cannot be used for certain types of data
  • They cannot handle large data sets
Non-parametric statistical methods can be less powerful than parametric tests when the assumptions for the parametric tests are met. This is because they typically use less of the information in the data (e.g., ranks rather than actual values). Therefore, if the data meet the assumptions of a parametric test, the parametric test is usually preferred.

Hypothesis testing in statistics is a way to test the validity of a claim that is made about a _______.

  • Dataset
  • Population
  • Sample
  • Statistic
In statistics, hypothesis testing is used to test claims about a population parameter, such as a mean or a proportion. The sample and the statistics computed from it are the evidence used to evaluate the claim, not the subject of the claim.

The _______ Information Criterion is a measure used in model selection that takes into account the goodness of fit and the simplicity of the model.

  • Akaike
  • Bayesian
  • Pearson
  • Spearman
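The Akaike Information Criterion (AIC) balances goodness of fit with model simplicity by including a penalty for the number of parameters in the model: AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. Lower values are better, and the penalty discourages overfitting. The Bayesian Information Criterion (BIC) follows the same idea but uses a penalty that also grows with the sample size; Akaike is the intended answer here.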
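The trade-off between fit and simplicity can be sketched with the standard formulas AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The two candidate models, their log-likelihoods, and the sample size below are hypothetical numbers chosen for illustration.

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: k ln(n) - 2 ln(L). Lower is better."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical candidates fitted to the same n = 200 observations.
simple_model = {"log_likelihood": -100.0, "k": 2}
complex_model = {"log_likelihood": -99.0, "k": 5}  # fits slightly better, uses 3 more parameters

aic_simple = aic(**simple_model)    # 2*2 + 200 = 204
aic_complex = aic(**complex_model)  # 2*5 + 198 = 208

bic_simple = bic(n=200, **simple_model)
bic_complex = bic(n=200, **complex_model)

print(aic_simple, aic_complex)  # 204.0 208.0
```

In this made-up case the small gain in likelihood does not justify the extra parameters, so both criteria favor the simpler model.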

How would an outlier affect the confidence interval for a mean?

  • It would make the interval narrower
  • It would make the interval skewed
  • It would make the interval wider
  • It would not affect the interval
An outlier can significantly affect the mean and increase the variability in the data, which would lead to a larger standard error and thus a wider confidence interval.
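The effect can be seen in a small sketch that computes a 95% confidence interval for the mean with and without a single extreme value. The data are invented, and the interval uses a normal (z) critical value as a simplifying assumption; for a sample this small a t critical value would be more appropriate, but the Python standard library has no t quantile function.

```python
import math
import statistics

def mean_ci(data, confidence=0.95):
    """Normal-approximation confidence interval for the mean (z-based, an assumption)."""
    n = len(data)
    mean = statistics.fmean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean + z * standard_error

clean = [10, 11, 9, 10, 12, 10, 11, 9]  # hypothetical measurements
with_outlier = clean + [30]              # the same data plus one extreme value

low_clean, high_clean = mean_ci(clean)
low_outlier, high_outlier = mean_ci(with_outlier)

width_clean = high_clean - low_clean
width_outlier = high_outlier - low_outlier

print(round(width_clean, 2), round(width_outlier, 2))
```

The single extra point both shifts the mean upward and inflates the standard deviation, so the second interval is several times wider than the first.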

What is the difference between descriptive and inferential statistics?

  • Descriptive and inferential statistics are the same
  • Descriptive statistics predict trends; inferential statistics summarize data
  • Descriptive statistics summarize data; inferential statistics make predictions about the population
  • Descriptive statistics summarize data; inferential statistics visualize data
Descriptive statistics provide simple summaries of the collected data, using measures such as the mean, median, and mode. Inferential statistics, on the other hand, take data from a sample and use it to make inferences about the larger population from which the sample was drawn. In other words, inference is the process of using data analysis to deduce properties of an underlying probability distribution.