If the occurrence of A does not affect the occurrence of B, we say A and B are ________.
- Dependent
- Independent
- Joint
- Mutually exclusive
If the occurrence of A does not affect the probability of B, we say A and B are independent. Formally, A and B are independent when P(A and B) = P(A) × P(B), or equivalently P(B | A) = P(B) whenever P(A) > 0. This is a key concept in probability theory.
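As a small illustration of the product rule for independence, the sketch below enumerates two fair dice. The events A, B, and C are hypothetical examples chosen for this illustration, not part of the original question.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of two fair six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))

def first_is_even(o):    # A: the first die is even
    return o[0] % 2 == 0

def second_is_six(o):    # B: the second die shows 6
    return o[1] == 6

def sum_is_twelve(o):    # C: the total is 12
    return o[0] + o[1] == 12

p_a, p_b, p_c = prob(first_is_even), prob(second_is_six), prob(sum_is_twelve)
p_ab = prob(lambda o: first_is_even(o) and second_is_six(o))
p_ac = prob(lambda o: first_is_even(o) and sum_is_twelve(o))

# Independence holds exactly when P(A and B) equals P(A) * P(B).
a_b_independent = p_ab == p_a * p_b   # 1/12 == 1/2 * 1/6
a_c_independent = p_ac == p_a * p_c   # 1/36 != 1/2 * 1/36

print(a_b_independent, a_c_independent)
```

A and B involve different dice, so the product rule holds; A and C share the first die, so it fails.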
What are the ways to check the assumptions of an ANOVA test?
- By calculating the F-statistic
- By calculating the mean and variance of each group
- By checking normality of residuals, homogeneity of variance, and independence of observations
- By conducting post-hoc tests
The assumptions of an ANOVA test can be checked by: 1. Checking the normality of residuals using a normal probability (Q-Q) plot or a statistical test such as the Shapiro-Wilk test; 2. Checking the homogeneity of variance using Levene's test or Bartlett's test (Bartlett's test is itself sensitive to non-normality); 3. Checking the independence of observations, which is mostly a matter of study design (random sampling, random assignment) rather than something a test on the data can confirm.
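The first two checks might be sketched with SciPy as follows. The three groups are simulated, hypothetical data, and the sketch assumes a one-way design where each residual is an observation minus its group mean.

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups of 30 observations with equal spread.
rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=2.0, size=30) for m in (10.0, 11.0, 12.0)]

# One-way ANOVA residuals: each observation minus its own group mean.
residuals = np.concatenate([g - g.mean() for g in groups])

# 1. Normality of residuals (Shapiro-Wilk).
shapiro_stat, shapiro_p = stats.shapiro(residuals)

# 2. Homogeneity of variance (Levene and Bartlett).
levene_stat, levene_p = stats.levene(*groups)
bartlett_stat, bartlett_p = stats.bartlett(*groups)

print(f"Shapiro-Wilk p = {shapiro_p:.3f}")
print(f"Levene p = {levene_p:.3f}, Bartlett p = {bartlett_p:.3f}")
```

A small p-value in any of these tests suggests the corresponding assumption may be violated. Independence is not tested here because it depends on how the data were collected.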
What does a Principal Component represent in a dataset?
- A combination of original features
- A feature of the dataset
- A group of similar data points
- A target variable
A Principal Component is a linear combination of the original features in a dataset. The principal components are mutually orthogonal, so the data's scores along them are uncorrelated. Each component represents a different direction in which the data varies, ordered from the direction of greatest variance to the least.
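A minimal sketch of this idea, using NumPy's SVD on centered data rather than any particular PCA library. The dataset is simulated and hypothetical.

```python
import numpy as np

# Hypothetical data: 200 observations of 3 features, two of them correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 1] += 0.8 * X[:, 0]

# Center the data, then take the SVD; rows of Vt are the component directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                       # each row: weights on the original features
scores = Xc @ components.T            # data expressed in principal components
explained_variance = S**2 / (len(X) - 1)

# Components are orthonormal, and the scores are mutually uncorrelated.
gram = components @ components.T
score_cov = np.cov(scores, rowvar=False)

print(np.round(gram, 6))
print(np.round(score_cov, 6))
```

Each row of `components` gives the weights of one linear combination of the original features. `gram` should be close to the identity matrix and `score_cov` close to a diagonal matrix.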
Can the Mann-Whitney U test be used for paired samples?
- No
- Only if the data is normally distributed
- Only if the variances are equal
- Yes
No, the Mann-Whitney U test is not used for paired samples. It is designed for two independent samples. For paired samples, a different test, such as the Wilcoxon signed-rank test, would be more appropriate.
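The choice between the two tests might look like this in SciPy. All measurements below are made-up values for illustration only.

```python
from scipy import stats

# Independent samples: two unrelated groups, possibly of different sizes.
group_a = [12.1, 14.3, 11.8, 15.2, 13.7, 12.9]
group_b = [16.4, 15.9, 17.2, 14.8, 18.1, 16.7, 15.5]
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

# Paired samples: the same eight subjects measured before and after.
before = [140, 152, 138, 147, 160, 155, 149, 143]
after = [135, 150, 139, 140, 151, 152, 145, 137]
w_stat, w_p = stats.wilcoxon(before, after)

print(f"Mann-Whitney U p = {u_p:.4f}")
print(f"Wilcoxon signed-rank p = {w_p:.4f}")
```

The Wilcoxon signed-rank test works on the within-pair differences, which is why both lists must have the same length and order. The Mann-Whitney U test has no such requirement.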
When is it more appropriate to use the Mann-Whitney U test than a t-test?
- When data is normally distributed
- When data is not normally distributed
- When sample sizes are equal
- When the variances of the two groups are equal
The Mann-Whitney U test is more appropriate than a t-test when the data are not normally distributed, particularly when samples are small, the data are ordinal, or outliers are present. It is a non-parametric alternative to the independent-samples t-test and does not assume normality.
In the Kruskal-Wallis Test, if the p-value is less than the chosen significance level, we ________ the null hypothesis.
- accept
- consider
- ignore
- reject
If the p-value is less than the chosen significance level in the Kruskal-Wallis Test, we reject the null hypothesis. It means there is enough evidence to suggest that at least one of the groups is different from the others.
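The decision rule could be sketched with SciPy's `kruskal` as follows. The three groups and the significance level of 0.05 are hypothetical choices for this example.

```python
from scipy import stats

# Hypothetical measurements from three independent groups.
group_1 = [2.1, 2.5, 1.9, 2.8, 2.3]
group_2 = [3.9, 4.2, 3.6, 4.5, 4.0]
group_3 = [6.1, 5.8, 6.4, 5.5, 6.0]

h_stat, p_value = stats.kruskal(group_1, group_2, group_3)

alpha = 0.05                      # chosen significance level (an assumption)
reject_null = p_value < alpha     # reject H0 when p is below alpha

print(f"H = {h_stat:.2f}, p = {p_value:.4f}, reject H0: {reject_null}")
```

Rejecting the null hypothesis says only that at least one group differs. A post-hoc procedure is needed to identify which one.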
What is the main difference between a population and a sample?
- A population can only consist of people
- A population is always smaller than a sample
- A sample is a subset of a population
- A sample is always larger than a population
The main difference between a population and a sample is that a sample is a subset of a population. A population refers to the entire group of individuals or observations that we're interested in, while a sample is a smaller group that's been selected from that population.
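To make the subset relationship concrete, here is a small sketch using only the standard library. The population of 10,000 simulated heights and the sample size of 100 are hypothetical.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 10,000 simulated heights in centimeters.
population = [random.gauss(170, 10) for _ in range(10_000)]

# A simple random sample drawn without replacement from that population.
sample = random.sample(population, k=100)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean is an estimate of the population mean. It will usually be close to it but not identical.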
What strategies can be employed to reduce both Type I and Type II errors?
- Decrease sample size, use a more lenient significance level
- Decrease sample size, use a more stringent significance level
- Increase sample size, use a more lenient significance level
- Increase sample size, use a more stringent significance level
Increasing the sample size raises the power of the test, which lowers the Type II error rate (β) at any given significance level. Using a more stringent significance level (lower α) lowers the Type I error rate. At a fixed sample size, however, lowering α raises β, and vice versa; this is known as the Type I/Type II tradeoff. Combining a larger sample with a stricter α is what allows both error rates to fall together.
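The tradeoff can be illustrated with a one-sided z-test, where β has a closed form. The sketch assumes a known standard deviation and a hypothetical true effect of 0.5; the function name `type_ii_error` is introduced here for illustration.

```python
from statistics import NormalDist

def type_ii_error(alpha, n, effect=0.5, sigma=1.0):
    """Type II error rate of a one-sided z-test of H0: mu = 0.

    Assumes the true mean equals `effect` and sigma is known.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)               # rejection threshold in z units
    return z.cdf(critical - effect * n**0.5 / sigma)

beta_base = type_ii_error(alpha=0.05, n=20)
beta_strict = type_ii_error(alpha=0.01, n=20)     # stricter alpha, same n
beta_both = type_ii_error(alpha=0.01, n=60)       # stricter alpha, larger n

print(f"alpha=0.05, n=20: beta = {beta_base:.3f}")
print(f"alpha=0.01, n=20: beta = {beta_strict:.3f}")
print(f"alpha=0.01, n=60: beta = {beta_both:.3f}")
```

With n held at 20, tightening α from 0.05 to 0.01 increases β. Raising n to 60 brings β below its starting value while α stays at 0.01.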
What does the interquartile range in a box plot represent?
- The middle 50% of the data
- The range of the top 25% of the data
- The range within one standard deviation from the mean
- The total range of the dataset
The interquartile range (IQR) in a box plot represents the middle 50% of the data. It is the range within which the central half of the values fall and is calculated as the difference between the third quartile (Q3) and the first quartile (Q1).
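A short sketch of the calculation using the standard library. The data values are hypothetical, and the "inclusive" method is one of several quartile conventions, so other tools may give slightly different quartiles.

```python
import statistics

# Hypothetical data, already sorted for readability.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Quartiles using the "inclusive" convention (data treated as the full set).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

iqr = q3 - q1   # width of the box in a box plot

# Values lying inside the box, between Q1 and Q3.
middle = [x for x in data if q1 <= x <= q3]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(middle)
```

In a box plot, Q1 and Q3 form the edges of the box, and Q2 (the median) is the line inside it.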
What is the purpose of non-parametric statistical methods?
- To analyze data without making assumptions about the population distribution
- To make the calculation process more complex
- To provide less accurate results
- To use less data in the analysis
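Non-parametric statistical methods are used to analyze data without assuming that the population follows a specific distribution, such as the normal distribution. Many of these tests work with ranks or signs of the data rather than the raw values, so they compare distributions or medians rather than means. They still rely on other assumptions, such as independence of observations.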