How does the interquartile mean provide a measure of central tendency that is resistant to outliers?
- By focusing on the data between the first and third quartiles
- By focusing only on the highest values in the data
- By focusing only on the lowest values in the data
- By ignoring all outlier values
The interquartile mean averages only the data between the first quartile (25th percentile) and the third quartile (75th percentile), discarding the lowest 25% and the highest 25% of data points. Because extreme values fall in the discarded tails, they cannot pull the result toward them, which makes the interquartile mean a more robust measure of central tendency than the ordinary mean for skewed distributions or data with outliers.
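As a rough illustration, the sketch below computes an interquartile mean by sorting the data and averaging the middle half. It assumes a sample size divisible by 4 so that no fractional weighting of boundary points is needed. The function name `interquartile_mean` and the sample values are made up for this example.

```python
def interquartile_mean(values):
    """Mean of the middle 50% of the sorted data.

    Simplifying assumption: the sample size is a multiple of 4, so the
    lowest and highest quarters can be dropped without partial weights.
    """
    data = sorted(values)
    n = len(data)
    if n == 0 or n % 4 != 0:
        raise ValueError("this sketch assumes a non-empty sample whose size is a multiple of 4")
    quarter = n // 4
    middle = data[quarter:n - quarter]  # drop lowest 25% and highest 25%
    return sum(middle) / len(middle)


# Hypothetical sample with one extreme value (38).
sample = [1, 3, 4, 5, 6, 6, 7, 7, 8, 8, 9, 38]
iqm = interquartile_mean(sample)
plain_mean = sum(sample) / len(sample)
print(iqm, plain_mean)
```

Here the single extreme value raises the ordinary mean, while the interquartile mean depends only on the six middle values.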
In hypothesis testing, a Type I error is committed when the null hypothesis is ______ but we ______ it.
- False, fail to reject
- False, reject
- True, fail to reject
- True, reject
A Type I error, also known as a false positive, occurs when we reject a true null hypothesis. This means we conclude that there is an effect or difference when there really isn't one.
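To make the definition concrete, the sketch below computes the probability of a Type I error for a made-up test: the null hypothesis is that a coin is fair, and we reject it when 10 flips give at most 1 or at least 9 heads. The test design, the cutoffs, and the function name `type_i_error_rate` are assumptions chosen only for illustration.

```python
from math import comb


def type_i_error_rate(n_flips, lower_cutoff, upper_cutoff):
    """P(reject H0 | H0 is true) when H0 says the coin is fair.

    H0 is rejected when the number of heads is <= lower_cutoff
    or >= upper_cutoff (a hypothetical decision rule).
    """
    rejecting_counts = [
        k for k in range(n_flips + 1)
        if k <= lower_cutoff or k >= upper_cutoff
    ]
    # Under a fair coin, each head count k has probability C(n, k) / 2^n.
    return sum(comb(n_flips, k) for k in rejecting_counts) / 2 ** n_flips


alpha = type_i_error_rate(10, 1, 9)
print(alpha)
```

The result is the share of experiments in which a truly fair coin would still be declared unfair under this rule.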
The ______ Rule of Probability is used when we want to find the probability that either of two events happens.
- Addition
- Division
- Multiplication
- Subtraction
The Addition Rule of Probability is used when we want to find the probability that either of two events happens. In general, P(A or B) = P(A) + P(B) - P(A and B). When the two events are mutually exclusive, P(A and B) = 0, so the probability of either occurring is simply the sum of their individual probabilities.
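The rule can be checked on a small example. The sketch below uses a fair six-sided die (an assumed setup) and exact fractions. It applies the general form of the rule, P(A or B) = P(A) + P(B) - P(A and B), to two overlapping events, and the simple sum to two mutually exclusive events. The helper `prob` is a hypothetical name.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # faces of a fair six-sided die


def prob(event):
    """Probability of an event (a set of faces) under equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))


even = {2, 4, 6}
greater_than_four = {5, 6}
one = {1}

# Overlapping events: subtract the overlap so it is not counted twice.
p_even_or_gt4 = prob(even) + prob(greater_than_four) - prob(even & greater_than_four)

# Mutually exclusive events: the overlap is empty, so the sum is enough.
p_even_or_one = prob(even) + prob(one)

print(p_even_or_gt4, p_even_or_one)
```

Both results match the probabilities obtained by counting the faces in each union directly.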
What role does Bayes' theorem play in machine learning algorithms?
- It is not used in machine learning algorithms
- It is used to calculate error rates
- It is used to divide the data into training and test sets
- It is used to update prior beliefs based on new data
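Bayes' theorem is used in various machine learning algorithms to update prior beliefs based on new data. For example, a naive Bayes classifier combines the prior probability of each class with the likelihood of the observed features to obtain a posterior probability for each class, and predicts the class with the highest posterior.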
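A single update of this kind might look like the sketch below. The spam-filter setting and all three probabilities are invented for illustration, and `posterior` is a hypothetical helper that applies Bayes' theorem to a binary hypothesis.

```python
def posterior(prior, likelihood, likelihood_if_false):
    """P(H | E) from P(H), P(E | H) and P(E | not H), via Bayes' theorem."""
    evidence = likelihood * prior + likelihood_if_false * (1 - prior)
    return likelihood * prior / evidence


# Assumed numbers: 20% of messages are spam; a given word appears in
# 60% of spam messages and in 5% of non-spam messages.
p_spam_given_word = posterior(prior=0.2, likelihood=0.6, likelihood_if_false=0.05)
print(p_spam_given_word)
```

Observing the word moves the belief that the message is spam from the prior of 0.2 to a much higher posterior.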
How does the Akaike Information Criterion (AIC) handle the trade-off between goodness of fit and model complexity in model selection?
- It always prefers a more complex model.
- It always prefers a simpler model.
- It does not consider model complexity.
- It penalizes models with more parameters to avoid overfitting.
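The AIC handles the trade-off by adding a penalty that grows with the number of estimated parameters in the model, so an extra parameter pays off only if it improves the fit enough to offset the penalty. This discourages overfitting and leads to a balance between model fit and complexity.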
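In its usual form the criterion is AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood; lower values are preferred. The sketch below applies that formula to two made-up models. The log-likelihoods and parameter counts are assumptions, not results from real data.

```python
def aic(log_likelihood, num_params):
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


# Hypothetical fits: the larger model fits slightly better (higher
# log-likelihood) but uses twice as many parameters.
aic_small = aic(log_likelihood=-120.0, num_params=3)
aic_large = aic(log_likelihood=-119.5, num_params=6)

preferred = "small" if aic_small < aic_large else "large"
print(aic_small, aic_large, preferred)
```

In this example the small gain in fit does not offset the penalty for three extra parameters, so the smaller model has the lower AIC.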
What information does a box plot provide about a dataset?
- The correlation between variables
- The exact values of all data points
- The mean and standard deviation
- The minimum, first quartile, median, third quartile, and maximum
A box plot (also known as a box-and-whisker plot) displays a summary of the distribution of data values, including the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The 'box' spans the interquartile range (from Q1 to Q3), with a line at the median. In the simplest form, the 'whiskers' extend to the minimum and maximum; a common variant limits them to the most extreme points within 1.5 times the interquartile range of the box and plots anything beyond as individual outlier points.
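The five values a box plot summarizes can be computed directly. The sketch below uses `statistics.quantiles` from the Python standard library with the `inclusive` method. Quartile conventions differ between tools, so other software may report slightly different Q1 and Q3 for the same data. The function name `five_number_summary` and the data are made up for this example.

```python
import statistics


def five_number_summary(values):
    """Minimum, Q1, median, Q3 and maximum of a sample."""
    data = sorted(values)
    # n=4 yields the three quartile cut points; "inclusive" treats the
    # data as containing the population minimum and maximum.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
iqr = summary["q3"] - summary["q1"]  # height of the box
print(summary, iqr)
```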
Why is sampling without replacement often used in practice?
- It allows for the inclusion of every individual in the population
- It ensures that each selection is independent
- It guarantees that each sample is unique
- It is easier than sampling with replacement
Sampling without replacement is often used in practice because it guarantees that every individual in a sample is unique: once an individual is selected, it cannot be chosen again for the same sample. Each observation therefore contributes new information, which helps the sample cover more of the population than a sample of the same size drawn with replacement.
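The difference between the two schemes is easy to see with Python's standard `random` module: `sample` draws without replacement and `choices` draws with replacement. The population of 20 numbered individuals and the fixed seed are assumptions made for this sketch.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 21))  # 20 hypothetical individuals

# Without replacement: no individual can appear twice.
without_replacement = rng.sample(population, k=10)

# With replacement: the same individual may be drawn more than once.
with_replacement = rng.choices(population, k=10)

print(sorted(without_replacement))
print(sorted(with_replacement))
```

The first sample always contains 10 distinct individuals, while the second may contain repeats.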
Why is the Spearman rank correlation considered a non-parametric test?
- It assumes a normal distribution
- It can't handle ordinal data
- It does not assume a normal distribution
- It tests for a linear relationship
The Spearman rank correlation is considered a non-parametric test because it is computed from the ranks of the data rather than the raw values, and so does not assume a normal distribution. It only assumes that the variables are ordinal or continuous and that the relationship between them is monotonic.
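The sketch below shows why only the ordering of the values matters. It ranks each variable and applies the shortcut formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is valid only when there are no tied values; that restriction is an assumption of this example. The helper names and the data are hypothetical.

```python
def ranks(values):
    """Rank values from 1 (smallest) to n. This sketch does not handle ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch assumes no tied values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [1, 2, 3, 4, 5]
rho_monotonic = spearman_rho(x, [1, 4, 9, 16, 25])  # nonlinear but monotonic
rho_partial = spearman_rho(x, [2, 1, 4, 3, 5])      # some pairs out of order
print(rho_monotonic, rho_partial)
```

The squared values are not a linear function of `x`, yet the rank correlation is perfect because the ordering is preserved.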
What are the degrees of freedom in a Chi-square test for a 2x3 contingency table?
- 2
- 3
- 4
- 6
In a Chi-square test of a contingency table, the degrees of freedom are (rows - 1) * (columns - 1). For a 2x3 table, that is (2 - 1) * (3 - 1) = 2.
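The same arithmetic can be written as a small helper, shown here as a sketch; the function name `chi_square_dof` is made up for this example.

```python
def chi_square_dof(n_rows, n_cols):
    """Degrees of freedom for a Chi-square test on an r x c contingency table."""
    if n_rows < 2 or n_cols < 2:
        raise ValueError("a contingency table needs at least 2 rows and 2 columns")
    return (n_rows - 1) * (n_cols - 1)


dof_2x3 = chi_square_dof(2, 3)
print(dof_2x3)
```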
The process that aims to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables is called _______.
- correlation analysis
- covariance analysis
- factor analysis
- regression analysis
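Factor analysis aims to identify underlying variables, or factors, that explain the pattern of correlations within a set of observed variables. Unlike regression analysis, it does not predict one observed variable from others; it models the observed variables as driven by a smaller number of unobserved factors.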