In a 95% confidence interval, if the true population parameter lies outside of the interval, it is considered a _______ error.

  • Alpha
  • Standard
  • Type I
  • Type II
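When the true population parameter lies outside a 95% confidence interval, the result corresponds to a Type I error: a test that used the true value as its null hypothesis would reject it even though it is correct. By construction this happens in about 5% of repeated samples, which matches the significance level α = 0.05.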
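
The 5% miss rate can be checked with a small simulation. The sketch below assumes a normal population with a known standard deviation and uses a z-interval; the function name `ci_miss_rate` and all parameter values are illustrative choices, not part of the question.

```python
import random
import statistics

def ci_miss_rate(n_trials=2000, n=50, mu=10.0, sigma=2.0, z=1.96, seed=0):
    """Fraction of 95% z-intervals (known sigma) that miss the true mean mu."""
    rng = random.Random(seed)
    half_width = z * sigma / n ** 0.5  # half-width of the 95% interval
    misses = 0
    for _ in range(n_trials):
        sample_mean = statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        # The interval misses mu exactly when the sample mean is too far from mu
        if abs(sample_mean - mu) > half_width:
            misses += 1
    return misses / n_trials

miss_rate = ci_miss_rate()
print(f"Share of intervals that miss the true mean: {miss_rate:.3f}")
```

The printed share should land close to 0.05, the Type I error rate implied by a 95% interval.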

How does PCA help in reducing the dimensionality of the dataset?

  • By creating new uncorrelated variables
  • By grouping similar data together
  • By removing unnecessary data
  • By rotating the data to align with axes
PCA reduces the dimensionality of a dataset by creating new uncorrelated variables that successively maximize variance. Keeping only the first few of these new variables, or "principal components", in place of the original variables reduces the data's dimensionality while retaining most of its variance.
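
As a rough illustration, the sketch below runs PCA on two-dimensional data using the closed-form rotation for a 2x2 covariance matrix. The helper `pca_2d` and the synthetic data are assumptions made for this example; a real analysis would normally use a linear algebra library.

```python
import math
import random

def pca_2d(points):
    """Return (theta, scores): the rotation angle and the points expressed as (PC1, PC2)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Angle of the direction of maximum variance for a 2x2 covariance matrix
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    scores = [(c * x + s * y, -s * x + c * y) for x, y in centered]
    return theta, scores

def covariance(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(0)
xs = [rng.gauss(0, 1) for _ in range(200)]
points = [(x, 2 * x + rng.gauss(0, 0.3)) for x in xs]  # two strongly correlated variables

theta, scores = pca_2d(points)
pc1 = [s[0] for s in scores]
pc2 = [s[1] for s in scores]
var1, var2 = covariance(pc1, pc1), covariance(pc2, pc2)
explained = var1 / (var1 + var2)
print(f"Covariance between components: {covariance(pc1, pc2):.2e}")
print(f"Share of variance on PC1: {explained:.3f}")
```

Because nearly all of the variance ends up on the first component, the second could be dropped with little loss, which is the dimensionality reduction the explanation describes.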

What are the implications of the Central Limit Theorem on statistical testing?

  • It asserts that all statistical tests must involve the normal distribution.
  • It eliminates the need for statistical testing.
  • It guarantees that all results of statistical tests will be accurate.
  • It states that sample means will be approximately normally distributed for sufficiently large samples, regardless of the shape of the population distribution.
The Central Limit Theorem (CLT) states that, given certain conditions, the mean of a sufficiently large number of independent random variables will be approximately normally distributed, regardless of the shape of the original distribution. This underpins many statistical methods, including hypothesis tests and confidence intervals, which may assume normality of the sampling distribution.
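
A quick simulation makes this concrete. The sketch below assumes an exponential population (strongly right-skewed) and a sample size of 50; both choices, and the helper `skewness`, are illustrative.

```python
import random
import statistics

def skewness(values):
    """Population skewness: the mean of the cubed standardized values."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean(((v - m) / sd) ** 3 for v in values)

rng = random.Random(1)

# Individual draws from a skewed population (exponential with mean 1)
population_draws = [rng.expovariate(1.0) for _ in range(5000)]

# Means of 5000 independent samples of size 50 from the same population
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(50))
    for _ in range(5000)
]

raw_skew = skewness(population_draws)
mean_skew = skewness(sample_means)
print(f"Skewness of raw draws:    {raw_skew:.2f}")
print(f"Skewness of sample means: {mean_skew:.2f}")
```

The raw draws stay heavily skewed, while the sample means are much closer to symmetric, as the theorem predicts for a sufficiently large sample size.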

Which type of plot is particularly useful for identifying outliers in a dataset?

  • Bar plot
  • Box plot
  • Histogram
  • Scatter plot
Box plots are particularly useful for identifying outliers in a dataset. The box plot displays a summary of the data distribution including minimum, first quartile, median, third quartile, and maximum. Outliers are typically drawn as individual points beyond the whiskers, commonly defined as values more than 1.5 times the interquartile range below the first quartile or above the third quartile.
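
The rule most box plots use to flag outliers can be sketched without any plotting. The example below assumes the common 1.5 × IQR convention and the "inclusive" quartile method from Python's `statistics` module; plotting libraries may compute quartiles slightly differently, and the data values are made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

data = [12, 13, 14, 15, 15, 16, 17, 18, 19, 45]
outliers = iqr_outliers(data)
print(outliers)  # [45]
```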

A __________ is a subset of a population that is used to represent the entire group as a whole.

  • Dataset
  • Parameter
  • Sample
  • Statistic
A sample in statistics is a subset of individuals or observations from a larger population. Sampling is a key concept in statistics and data science because it allows us to collect and analyze a manageable amount of data that represents a larger group.

If a distribution is flatter than a normal distribution, it is said to have negative ________.

  • Kurtosis
  • Mean
  • Skewness
  • Variance
If a distribution is flatter than a normal distribution, it is said to have negative kurtosis. More precisely, its excess kurtosis (kurtosis minus 3, the value for a normal distribution) is below zero. This type of distribution has lighter tails and a flatter peak than the normal distribution. It is also called platykurtic.
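
To see the sign of kurtosis in practice, the sketch below computes excess kurtosis for evenly spaced values (a stand-in for a uniform distribution, which is platykurtic) and for simulated normal data. The helper `excess_kurtosis` is an illustrative implementation of the population formula, not a library function.

```python
import random
import statistics

def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution scores about 0."""
    m = statistics.fmean(values)
    var = statistics.pvariance(values, mu=m)
    fourth = statistics.fmean((v - m) ** 4 for v in values)
    return fourth / var ** 2 - 3.0

# Evenly spaced values approximate a uniform distribution
uniform_like = [i / 1000 for i in range(1001)]

rng = random.Random(2)
normal_like = [rng.gauss(0, 1) for _ in range(20000)]

k_uniform = excess_kurtosis(uniform_like)
k_normal = excess_kurtosis(normal_like)
print(f"Uniform-like data: {k_uniform:.2f}")  # about -1.2, platykurtic
print(f"Normal-like data:  {k_normal:.2f}")   # close to 0
```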

How does the choice of significance level (α) affect the conclusion of a Chi-square test for goodness of fit?

  • A higher α makes it easier to reject the null hypothesis
  • A higher α makes it harder to reject the null hypothesis
  • α has no impact on the conclusion of the test
  • α only affects the power of the test, not the conclusion
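A higher significance level (α) increases the likelihood of rejecting the null hypothesis. The null hypothesis is rejected when the p-value is below α, so raising α lowers the amount of evidence required to reject it (equivalently, it lowers the critical value that the Chi-square statistic must exceed).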
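
A small worked example shows the same data leading to different conclusions at different significance levels. The counts below are invented for illustration, and the example uses three categories so that the test has 2 degrees of freedom, in which case the p-value has the simple closed form exp(-x/2).

```python
import math

def chi_square_statistic(observed, expected):
    """Pearson's chi-square statistic for a goodness-of-fit test."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [30, 50, 40]  # hypothetical counts in three categories
expected = [40, 40, 40]  # counts expected under the null hypothesis

stat = chi_square_statistic(observed, expected)

# With 3 categories there are 2 degrees of freedom, and the chi-square
# survival function for df = 2 is exactly exp(-x / 2).
p_value = math.exp(-stat / 2)

decisions = {alpha: p_value < alpha for alpha in (0.01, 0.05, 0.10)}
print(f"statistic = {stat:.2f}, p-value = {p_value:.4f}")
for alpha, reject in decisions.items():
    print(f"alpha = {alpha:.2f}: {'reject' if reject else 'fail to reject'} the null hypothesis")
```

Here the p-value is about 0.082, so the null hypothesis is rejected at α = 0.10 but not at α = 0.05 or α = 0.01.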

How does the sample size affect the standard error of a sample mean?

  • Larger sample sizes decrease the standard error
  • Larger sample sizes increase the standard error
  • Smaller sample sizes decrease the standard error
  • The sample size has no effect on the standard error
The sample size has an inverse relationship with the standard error of a sample mean. As the sample size increases, the standard error decreases. This is because larger samples provide a better approximation of the population, reducing the variability of the sample mean around the population mean.
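
The relationship follows from the usual formula for the standard error of the mean, SE = σ / √n. The sketch below assumes a population standard deviation of 10, chosen only for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for population standard deviation sigma."""
    return sigma / math.sqrt(n)

sigma = 10.0  # assumed population standard deviation
ses = {n: standard_error(sigma, n) for n in (25, 100, 400)}
for n, se in ses.items():
    print(f"n = {n:>3}: SE = {se:.2f}")
```

Each fourfold increase in sample size halves the standard error, because the sample size enters through its square root.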

What does a larger sample size do to the sampling distribution of the mean?

  • It decreases the spread of the distribution
  • It does not affect the distribution
  • It increases the spread of the distribution
  • It skews the distribution
A larger sample size decreases the spread of the sampling distribution of the mean. This is because as the sample size increases, the standard error (a measure of the spread of the distribution of sample means) decreases, which means that the sampling distribution becomes more concentrated around the true population mean.
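
The narrowing can also be observed empirically. The sketch below repeatedly draws samples from a uniform population on [0, 1) and measures the spread of the resulting sample means; the function name, sample sizes, and number of repetitions are arbitrary choices for the example.

```python
import random
import statistics

def spread_of_sample_means(n, n_samples=2000, seed=0):
    """Standard deviation of the means of n_samples random samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.random() for _ in range(n))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(10)
spread_large = spread_of_sample_means(100)
print(f"Spread of sample means, n = 10:  {spread_small:.4f}")
print(f"Spread of sample means, n = 100: {spread_large:.4f}")
```

With ten times as many observations per sample, the spread shrinks by a factor of roughly √10, or a little over 3.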

A probability must be a number between ________ and ________.

  • #NAME?
  • -1, 1
  • 0, 1
  • 1, 100
By definition, the probability of an event is a number between 0 and 1, inclusive. A probability of 0 means the event will never occur, and a probability of 1 means the event is certain to occur.