How does the choice of significance level (α) affect the conclusion of a Chi-square test for goodness of fit?
- A higher α makes it easier to reject the null hypothesis
- A higher α makes it harder to reject the null hypothesis
- α has no impact on the conclusion of the test
- α only affects the power of the test, not the conclusion
A higher significance level (α) makes it easier to reject the null hypothesis. The test rejects when the p-value falls below α, so raising α lowers the amount of evidence needed for rejection, at the cost of a higher Type I error rate.
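As a minimal sketch of this effect, the snippet below runs one goodness-of-fit test and compares its p-value against several α levels. The counts are hypothetical, and three categories are chosen on purpose: with 2 degrees of freedom the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
import math

def chi_square_gof(observed, expected):
    """Return the chi-square goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df2(statistic):
    """Upper-tail p-value for a chi-square statistic with 2 degrees of freedom.

    For df = 2 the survival function has the closed form exp(-x / 2).
    """
    return math.exp(-statistic / 2)

# Hypothetical counts for three categories (df = 3 - 1 = 2).
observed = [50, 30, 20]
expected = [40, 40, 20]

statistic = chi_square_gof(observed, expected)
p_value = p_value_df2(statistic)

# The same data lead to different decisions at different alpha levels.
decisions = {alpha: p_value < alpha for alpha in (0.01, 0.05, 0.10)}
for alpha, reject in decisions.items():
    print(f"alpha={alpha:.2f}: p={p_value:.4f} -> reject H0: {reject}")
```

With these counts the statistic is 5.0 and the p-value is about 0.082, so the null hypothesis is rejected at α = 0.10 but retained at α = 0.05 and α = 0.01.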
If a distribution is flatter than a normal distribution, it is said to have negative ________.
- Kurtosis
- Mean
- Skewness
- Variance
If a distribution is flatter than a normal distribution, it is said to have negative kurtosis (strictly, negative excess kurtosis, measured relative to the normal distribution's kurtosis of 3). This type of distribution has lighter tails and a flatter peak than the normal distribution. It is also called platykurtic.
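To illustrate, here is a small sketch that computes excess kurtosis as the fourth standardized moment minus 3. The helper name `excess_kurtosis` and the evenly spaced test data are assumptions made for this example; the evenly spaced points stand in for a flat (uniform) distribution.

```python
def excess_kurtosis(data):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

# Evenly spaced points on [0, 1] approximate a uniform (flat) distribution.
uniform_like = [i / 1000 for i in range(1001)]
kurt = excess_kurtosis(uniform_like)
print(f"excess kurtosis: {kurt:.3f}")
```

The result is close to -1.2, the excess kurtosis of a uniform distribution, which is a standard example of a platykurtic shape.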
A __________ is a subset of a population that is used to represent the entire group as a whole.
- Dataset
- Parameter
- Sample
- Statistic
A sample in statistics is a subset of individuals or observations from a larger population. Sampling is a key concept in statistics and data science because it allows us to collect and analyze a manageable amount of data that represents a larger group.
Which type of plot is particularly useful for identifying outliers in a dataset?
- Bar plot
- Box plot
- Histogram
- Scatter plot
Box plots are particularly useful for identifying outliers in a dataset. The box plot displays a summary of the data distribution including minimum, first quartile, median, third quartile, and maximum. Outliers are typically drawn as individual points beyond the whiskers, which by a common convention extend at most 1.5 times the interquartile range from the edges of the box.
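The points a box plot flags can be reproduced numerically. The sketch below assumes the common 1.5 × IQR fence rule (plotting libraries may use a different quartile method or multiplier), and the data are made up for illustration.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # made-up data with one extreme value
outliers = iqr_outliers(values)
print(outliers)
```

Here only the value 100 falls outside the fences, so it is the single point a box plot would draw separately.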
What are the implications of the Central Limit Theorem on statistical testing?
- It asserts that all statistical tests must involve the normal distribution.
- It eliminates the need for statistical testing.
- It guarantees that all results of statistical tests will be accurate.
- It states that sample means will be normally distributed regardless of the shape of the population distribution.
The Central Limit Theorem (CLT) states that, under conditions such as independence and finite variance, the mean of a sufficiently large number of random variables will be approximately normally distributed, regardless of the shape of the original distribution. This underpins many statistical methods, including hypothesis tests and confidence intervals, which may assume normality of the sampling distribution.
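A quick simulation can make this concrete. The sketch below draws repeated samples from a strongly skewed exponential distribution and checks that the sample means behave as a normal approximation would suggest. The sample size, number of repetitions, and seed are arbitrary choices for this example.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 50
N_SAMPLES = 2000

# Exponential(rate=1) is right-skewed, with mean 1 and standard deviation 1.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(SAMPLE_SIZE)])
    for _ in range(N_SAMPLES)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_sd = 1 / math.sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)

# Under a normal approximation, about 95% of means lie within 1.96 predicted SDs.
fraction_within = sum(
    abs(m - 1.0) <= 1.96 * predicted_sd for m in sample_means
) / N_SAMPLES

print(f"mean of sample means: {mean_of_means:.3f} (population mean 1)")
print(f"sd of sample means:   {sd_of_means:.3f} (predicted {predicted_sd:.3f})")
print(f"fraction within 1.96 predicted SDs: {fraction_within:.3f}")
```

Even though individual exponential draws are far from normal, the fraction of sample means inside the ±1.96 band should land near 0.95.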
How does PCA help in reducing the dimensionality of the dataset?
- By creating new uncorrelated variables
- By grouping similar data together
- By removing unnecessary data
- By rotating the data to align with axes
PCA reduces the dimensionality of a dataset by creating new uncorrelated variables that successively maximize variance. These new variables or "principal components" can replace the original variables, thus reducing the data's dimensionality.
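For two variables, PCA can be worked out by hand, which gives a compact illustration. The sketch below is not a general implementation: it assumes exactly two variables, uses the closed-form rotation angle for a 2x2 covariance matrix, and runs on made-up data. The helper names `pca_2d` and `sample_cov` are hypothetical.

```python
import math

def sample_cov(u, v):
    """Sample covariance of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def pca_2d(xs, ys):
    """PCA for two variables; returns (pc1_scores, pc2_scores)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cx = [x - mean_x for x in xs]  # centered data
    cy = [y - mean_y for y in ys]

    var_x, var_y = sample_cov(xs, xs), sample_cov(ys, ys)
    cov_xy = sample_cov(xs, ys)

    # Angle of the direction of maximum variance (first principal axis).
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    c, s = math.cos(theta), math.sin(theta)

    pc1 = [c * a + s * b for a, b in zip(cx, cy)]
    pc2 = [-s * a + c * b for a, b in zip(cx, cy)]
    return pc1, pc2

# Made-up, strongly correlated measurements.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

pc1, pc2 = pca_2d(xs, ys)
print(f"cov(PC1, PC2) = {sample_cov(pc1, pc2):.2e}")
print(f"var(PC1) = {sample_cov(pc1, pc1):.3f}, var(PC2) = {sample_cov(pc2, pc2):.3f}")

reduced = pc1  # keeping only PC1 reduces two columns to one
```

The two score columns are uncorrelated up to rounding error, and almost all of the variance sits in PC1, which is why dropping PC2 loses little information in this example.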
In a 95% confidence interval, if the true population parameter lies outside of the interval, it is considered a _______ error.
- Alpha
- Standard
- Type I
- Type II
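In a 95% confidence interval, if the true population parameter lies outside of the interval, it is considered a Type I error. A 95% interval misses the true parameter in about 5% of repeated samples, which corresponds to α = 0.05: a test that takes the true value as its null hypothesis would incorrectly reject it, and rejecting a true null hypothesis is the definition of a Type I error.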
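The link between the 5% miss rate and α can be checked by simulation. The sketch below assumes a normal population with known standard deviation, so a z-interval is exact. The population values, sample size, and seed are hypothetical choices for this example.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

TRUE_MEAN, SIGMA = 10.0, 2.0  # hypothetical population values
N, TRIALS = 30, 2000

z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
half_width = z * SIGMA / math.sqrt(N)       # known-sigma (z) interval

misses = 0
for _ in range(TRIALS):
    sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
    sample_mean = statistics.fmean(sample)
    if abs(sample_mean - TRUE_MEAN) > half_width:
        misses += 1  # this interval does not contain the true mean

miss_rate = misses / TRIALS
print(f"intervals missing the true mean: {miss_rate:.3f} (nominal 0.05)")
```

The observed miss rate should sit close to 0.05, matching the Type I error rate of the corresponding two-sided test.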
_________ sampling is a method where every individual in the population has an equal chance of being selected.
- Cluster
- Simple Random
- Stratified
- Systematic
Simple random sampling is a basic type of sampling method where each individual in the population has an equal chance of being selected. This avoids systematic selection bias, so the sample tends to be representative of the population (although any single sample can still differ from it by chance), which supports accurate inferences about the whole population.
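In code, a simple random sample without replacement might be drawn as follows. The population of numbered IDs and the seed are hypothetical; `random.sample` is from Python's standard library.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

population = list(range(1, 101))          # hypothetical IDs 1..100
sample = random.sample(population, k=10)  # draws without replacement
print(sorted(sample))
```

Every ID has the same chance of ending up in the sample, and no ID can be chosen twice.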
What is the relationship between the eigenvalue of a component and the variance of that component in PCA?
- It depends on the dataset
- There is no relationship
- They are directly proportional
- They are inversely proportional
The eigenvalue of a component in PCA is directly proportional to the variance of that component; when PCA is performed on the covariance matrix, the eigenvalue equals that variance. In other words, a larger eigenvalue corresponds to a larger share of the total variance explained by that principal component.
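One practical consequence is that the share of variance explained by each component is its eigenvalue divided by the sum of all eigenvalues. A short sketch, using hypothetical eigenvalues rather than ones computed from real data:

```python
def explained_variance_ratio(eigenvalues):
    """Share of total variance carried by each principal component."""
    total = sum(eigenvalues)
    return [value / total for value in eigenvalues]

# Hypothetical eigenvalues of a covariance matrix, largest first.
eigenvalues = [6.0, 3.0, 1.0]
ratios = explained_variance_ratio(eigenvalues)
print(ratios)
```

With these values the first component accounts for 60% of the total variance, the second for 30%, and the third for 10%.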
How is Bayes' theorem related to conditional probability?
- Bayes' theorem and conditional probability are not related
- Bayes' theorem cannot be used with conditional probability
- Bayes' theorem is a specific type of conditional probability
- Bayes' theorem is used to calculate the complement of the conditional probability
Bayes' theorem is a way of finding a probability when we know certain other probabilities. It follows directly from the definition of conditional probability: P(A|B) = P(B|A) P(A) / P(B). The probabilities that we know are usually conditional probabilities, and Bayes' theorem is used to 'reverse' these probabilities.
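As a worked illustration of this 'reversal', the sketch below turns P(positive | disease) into P(disease | positive). The screening-test numbers are hypothetical, and the function name `posterior` is chosen for this example only.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) from P(H), P(E | H), and P(E | not H) via Bayes' theorem."""
    # Law of total probability gives the denominator P(E).
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical screening test: 1% prevalence, 95% sensitivity, 5% false positives.
p_disease_given_positive = posterior(
    prior=0.01, likelihood=0.95, false_positive_rate=0.05
)
print(f"P(disease | positive) = {p_disease_given_positive:.3f}")
```

Under these assumed numbers a positive result implies only about a 16% chance of disease, because the low prior outweighs the test's high sensitivity.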
The ________ in a two-way ANOVA can reveal whether the effect of one independent variable depends on the level of the other independent variable.
- Effect size
- Interaction effect
- Main effect
- Post-hoc test
The interaction effect in a two-way ANOVA reveals whether the effect of one independent variable depends on the level of the other independent variable. This shows how the independent variables combine to influence the dependent variable, beyond what their separate main effects explain.
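The idea can be seen in a 2x2 table of cell means before any formal test is run. The sketch below uses hypothetical factors ("dose" and "drug") and made-up means; it only computes the interaction contrast, not the ANOVA F-test that would assess its significance.

```python
# Hypothetical cell means for a 2x2 design: factor "dose" x factor "drug".
cell_means = {
    ("low", "A"): 10.0,
    ("low", "B"): 12.0,
    ("high", "A"): 11.0,
    ("high", "B"): 19.0,
}

# Simple effect of drug (B minus A) at each dose level.
drug_effect_low = cell_means[("low", "B")] - cell_means[("low", "A")]
drug_effect_high = cell_means[("high", "B")] - cell_means[("high", "A")]

# A nonzero difference between the simple effects signals an interaction.
interaction_contrast = drug_effect_high - drug_effect_low
print(drug_effect_low, drug_effect_high, interaction_contrast)
```

Here the drug effect is 2.0 at the low dose but 8.0 at the high dose, so the effect of drug depends on dose. If both simple effects were equal, the contrast would be zero and only main effects would be present.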
What is the main difference between the Wilcoxon Signed Rank Test and the paired t-test?
- All of the above
- The Wilcoxon test is non-parametric while the t-test is parametric
- The Wilcoxon test is used for ordinal data while the t-test is used for continuous data
- The Wilcoxon test uses ranks while the t-test uses actual values
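The Wilcoxon Signed Rank Test is a non-parametric test that uses the ranks of the paired differences and is often applied to ordinal data or to data that violate the normality assumption, while the paired t-test is a parametric test that uses the actual values of the differences, assumes they are approximately normally distributed, and is typically used for continuous data.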
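The contrast between ranks and actual values shows up directly in how the two test statistics are computed. The sketch below computes both from the same paired differences. The measurements are made up, the helper names are hypothetical, and p-values are omitted because they would need distribution tables or a statistics library.

```python
import math
import statistics

def signed_rank_sums(differences):
    """Return (W_plus, W_minus): rank sums of positive and negative differences.

    Zero differences are dropped; tied absolute values share an average rank.
    """
    ordered = sorted((d for d in differences if d != 0), key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus

def paired_t_statistic(differences):
    """t = mean(d) / (sd(d) / sqrt(n)), using the raw differences."""
    n = len(differences)
    return statistics.fmean(differences) / (statistics.stdev(differences) / math.sqrt(n))

# Made-up paired measurements.
before = [125, 130, 118, 140, 135, 128]
after = [120, 128, 119, 131, 129, 124]
differences = [b - a for b, a in zip(before, after)]

w_plus, w_minus = signed_rank_sums(differences)
t_stat = paired_t_statistic(differences)
print(f"W+ = {w_plus}, W- = {w_minus}, t = {t_stat:.3f}")
```

The Wilcoxon statistic depends only on the ordering and signs of the differences, so making the largest difference far bigger would leave W+ and W- unchanged while shifting the t statistic.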