What does a larger sample size do to the sampling distribution of the mean?
- It decreases the spread of the distribution
- It does not affect the distribution
- It increases the spread of the distribution
- It skews the distribution
A larger sample size decreases the spread of the sampling distribution of the mean. The standard error, which measures that spread, equals the population standard deviation divided by the square root of the sample size (σ/√n), so it shrinks as n grows and the sample means concentrate more tightly around the true population mean.
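The effect can be checked with a small simulation. This is a minimal sketch, assuming a hypothetical normal population with mean 50 and standard deviation 10; the function name `spread_of_sample_means` and all constants are illustrative choices, not part of the question.

```python
import random
import statistics

random.seed(0)

# Hypothetical population: 100,000 values with mean 50 and standard deviation 10.
population = [random.gauss(50, 10) for _ in range(100_000)]

def spread_of_sample_means(n, repeats=2000):
    """Standard deviation of `repeats` sample means, each from a sample of size n."""
    means = [statistics.fmean(random.choices(population, k=n)) for _ in range(repeats)]
    return statistics.stdev(means)

se_small = spread_of_sample_means(10)    # theory: about 10 / sqrt(10)
se_large = spread_of_sample_means(100)   # theory: about 10 / sqrt(100)
print(f"n=10:  spread of sample means = {se_small:.2f}")
print(f"n=100: spread of sample means = {se_large:.2f}")
```

Multiplying the sample size by 10 should shrink the spread by roughly a factor of √10, about 3.2.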
What is the relationship between the eigenvalue of a component and the variance of that component in PCA?
- It depends on the dataset
- There is no relationship
- They are directly proportional
- They are inversely proportional
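The eigenvalue of a component in PCA is directly proportional to the variance of that component; when PCA is computed from the covariance matrix, the eigenvalue is exactly the variance of the data along that component. A larger eigenvalue therefore corresponds to a larger share of the total variance explained by that principal component.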
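The relationship can be derived in one line. The symbols are introduced here for illustration: Σ is assumed to be the covariance matrix of the data x, and w a unit-length eigenvector of Σ with eigenvalue λ.

```latex
\operatorname{Var}(w^\top x) = w^\top \Sigma\, w = w^\top (\lambda w) = \lambda\, w^\top w = \lambda
```

So the variance of the data projected onto a principal direction equals that direction's eigenvalue.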
_________ sampling is a method where every individual in the population has an equal chance of being selected.
- Cluster
- Simple Random
- Stratified
- Systematic
Simple random sampling is a basic sampling method in which each individual in the population has an equal chance of being selected (more precisely, every possible sample of a given size is equally likely). This removes selection bias, so the sample tends to be representative of the population, although any single sample can still differ from it by chance.
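As a minimal sketch, a simple random sample can be drawn with Python's standard library. The population of 100 numbered individuals and the sample size of 10 are hypothetical.

```python
import random

random.seed(42)

population = list(range(1, 101))           # hypothetical IDs for 100 individuals
sample = random.sample(population, k=10)   # 10 individuals, drawn without replacement
print(sorted(sample))
```

`random.sample` draws without replacement, so no individual appears twice in the sample.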
In a 95% confidence interval, if the true population parameter lies outside of the interval, it is considered a _______ error.
- Alpha
- Standard
- Type I
- Type II
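In a 95% confidence interval, if the true population parameter lies outside of the interval, the miss corresponds to a Type I error in the equivalent hypothesis test: a null hypothesis equal to the true value would be rejected even though it is true. By construction this happens for about 5% of samples, which is the significance level (alpha) of that test.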
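The 5% miss rate can be illustrated by simulation. This is a sketch under stated assumptions: a hypothetical normal population with known standard deviation, so a z-interval applies; all constants are made up for the example.

```python
import random
import statistics

random.seed(1)

# Hypothetical population parameters and study design.
MU, SIGMA, N, TRIALS = 50.0, 10.0, 30, 2000

z = statistics.NormalDist().inv_cdf(0.975)   # about 1.96 for a 95% interval
half_width = z * SIGMA / N ** 0.5            # sigma is treated as known

misses = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.fmean(sample)
    # A miss: the interval around this sample mean does not contain the true mean.
    if not (mean - half_width <= MU <= mean + half_width):
        misses += 1

miss_rate = misses / TRIALS
print(f"Share of intervals missing the true mean: {miss_rate:.3f}")
```

The printed share should land close to 0.05, the alpha level that matches 95% confidence.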
How does PCA help in reducing the dimensionality of the dataset?
- By creating new uncorrelated variables
- By grouping similar data together
- By removing unnecessary data
- By rotating the data to align with axes
PCA reduces the dimensionality of a dataset by creating new uncorrelated variables that successively maximize variance. Keeping only the first few of these new variables, or "principal components", in place of the original variables reduces the data's dimensionality while retaining most of its variance.
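A minimal sketch of this procedure, assuming NumPy is available and using synthetic data: three correlated features are generated from two hidden factors, then reduced to two components. The data, the mixing matrix, and the choice `k = 2` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 observations of 3 correlated features built from 2 latent factors.
latent = rng.normal(size=(200, 2))
mixing = np.array([[2.0, 1.0, 0.5],
                   [0.0, 1.0, 1.5]])
X = latent @ mixing + 0.1 * rng.normal(size=(200, 3))

Xc = X - X.mean(axis=0)                   # center each feature
cov = np.cov(Xc, rowvar=False)            # 3x3 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # reorder so the largest variance comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 2
Z = Xc @ eigvecs[:, :k]                   # scores on the first k principal components
score_cov = np.cov(Z, rowvar=False)       # diagonal: the new variables are uncorrelated
explained = eigvals[:k].sum() / eigvals.sum()

print(np.round(score_cov, 3))
print(f"Share of variance kept by {k} components: {explained:.3f}")
```

The covariance matrix of the scores is diagonal, with the two largest eigenvalues on the diagonal, so the new variables are uncorrelated and each one's variance equals its eigenvalue.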
What are the implications of the Central Limit Theorem on statistical testing?
- It asserts that all statistical tests must involve the normal distribution.
- It eliminates the need for statistical testing.
- It guarantees that all results of statistical tests will be accurate.
- It states that sample means will be approximately normally distributed for large samples, regardless of the shape of the population distribution.
The Central Limit Theorem (CLT) states that the mean of a sufficiently large number of independent, identically distributed random variables with finite variance is approximately normally distributed, regardless of the shape of the original distribution. This underpins many statistical methods, including hypothesis tests and confidence intervals, which rely on approximate normality of the sampling distribution.
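A rough illustration, assuming a hypothetical exponential population (strongly right-skewed): the skewness of the sample means moves toward 0, the value for a normal distribution, as the sample size grows. The helper names `skewness` and `sample_means` are illustrative.

```python
import random
import statistics

random.seed(2)

def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def sample_means(n, repeats=5000):
    """Means of `repeats` samples of size n from an exponential population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]

skew_small = skewness(sample_means(2))    # still clearly right-skewed
skew_large = skewness(sample_means(50))   # much closer to symmetric
print(f"Skewness of sample means, n=2:  {skew_small:.2f}")
print(f"Skewness of sample means, n=50: {skew_large:.2f}")
```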
Which type of plot is particularly useful for identifying outliers in a dataset?
- Bar plot
- Box plot
- Histogram
- Scatter plot
Box plots are particularly useful for identifying outliers in a dataset. A box plot summarizes the distribution with the first quartile, median, and third quartile (the 'box') and with 'whiskers' that, under the common convention, extend to the most extreme values within 1.5 times the interquartile range of the box. Points beyond the whiskers are drawn individually and flagged as potential outliers.
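The rule most box plots use to flag outliers can be sketched without any plotting library. The data values are hypothetical, and the 1.5 × IQR fences are the common convention rather than the only possible choice.

```python
import statistics

data = [12, 13, 14, 15, 15, 16, 17, 18, 19, 45]   # hypothetical measurements

q1, median, q3 = statistics.quantiles(data, n=4)  # quartile cut points
iqr = q3 - q1                                     # interquartile range (height of the box)
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are the points a box plot draws individually.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(outliers)  # [45]
```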
A __________ is a subset of a population that is used to represent the entire group as a whole.
- Dataset
- Parameter
- Sample
- Statistic
A sample in statistics is a subset of individuals or observations from a larger population. Sampling is a key concept in statistics and data science because it allows us to collect and analyze a manageable amount of data that represents a larger group.
If a distribution is flatter than a normal distribution, it is said to have negative ________.
- Kurtosis
- Mean
- Skewness
- Variance
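If a distribution is flatter than a normal distribution, it is said to have negative kurtosis, meaning negative excess kurtosis (kurtosis below 3, the value for a normal distribution). Such a distribution has lighter tails than the normal distribution and is typically described as having a flatter peak. It is also called platykurtic.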
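As a minimal sketch, excess kurtosis can be estimated directly from its definition. The uniform distribution is used here as a standard example of a platykurtic distribution (its theoretical excess kurtosis is -1.2); the sample sizes and the helper name `excess_kurtosis` are illustrative.

```python
import random
import statistics

random.seed(3)

def excess_kurtosis(values):
    """Mean of the fourth power of the z-scores, minus 3 (the normal value)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 4 for v in values]) - 3.0

normal_sample = [random.gauss(0, 1) for _ in range(20_000)]
uniform_sample = [random.uniform(-1, 1) for _ in range(20_000)]

k_normal = excess_kurtosis(normal_sample)     # close to 0
k_uniform = excess_kurtosis(uniform_sample)   # close to -1.2 (platykurtic)
print(f"Normal sample:  {k_normal:.2f}")
print(f"Uniform sample: {k_uniform:.2f}")
```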
A Chi-square test for independence is used to determine if there is a significant relationship between two ________ variables.
- categorical
- continuous
- nominal
- ordinal
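A Chi-square test for independence is used to determine if there is a significant relationship between two categorical variables. Nominal and ordinal variables are both kinds of categorical variable, so the test applies to them as well, although it ignores any ordering of the categories. It is not suitable for continuous variables unless they are first binned into categories.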
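The test statistic can be computed by hand from a contingency table. This is a sketch on a hypothetical 2×2 table of counts, with no continuity correction applied; the function name `chi_square_independence` is illustrative, and 3.841 is the standard critical value for 1 degree of freedom at the 5% level.

```python
def chi_square_independence(observed):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows are two groups, columns are two outcome categories.
observed = [[30, 10],
            [20, 40]]
stat, dof = chi_square_independence(observed)

CRITICAL_DF1_ALPHA05 = 3.841   # chi-square critical value for dof=1, alpha=0.05
print(f"chi2 = {stat:.2f}, dof = {dof}")
print("Reject independence" if stat > CRITICAL_DF1_ALPHA05 else "Fail to reject independence")
```

For this table the statistic is far above the critical value, so the hypothesis of independence would be rejected at the 5% level.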