In which situation is Spearman's Rank Correlation preferable to Pearson's correlation?
- When the data is normally distributed
- When the relationship between variables is non-linear and monotonic
- When the relationship is linear
- When there are no ties in the ranks
Spearman's Rank Correlation is preferable to Pearson's correlation when the relationship between variables is non-linear but monotonic. Pearson's correlation measures the strength of a linear relationship, while Spearman's is computed on the ranks of the values and so captures any monotonic relationship, whether linear or not.
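As a minimal sketch of this difference, the snippet below computes both coefficients by hand for a monotonic but strongly non-linear relationship. The data (y = exp(x)) and the helper names `pearson`, `ranks`, and `spearman` are illustrative assumptions, and the ranking helper assumes there are no ties.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance divided by the product of the spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank values from 1..n (assumes no ties, to keep the sketch short)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman's rho is Pearson's r applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical data: y grows exponentially with x (monotonic, not linear).
x = list(range(1, 11))
y = [math.exp(v) for v in x]

pearson_r = pearson(x, y)
spearman_rho = spearman(x, y)
print(f"Pearson r    = {pearson_r:.3f}")
print(f"Spearman rho = {spearman_rho:.3f}")
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's coefficient falls noticeably below 1 because the points do not lie on a straight line.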
What is the z-value associated with a 95% confidence interval in a standard normal distribution?
- 1.64
- 1.96
- 2
- 2.33
The z-value associated with a 95% confidence interval in a standard normal distribution is approximately 1.96, because 95% of the area under the standard normal curve lies between -1.96 and 1.96. For a normally distributed estimate, the 95% confidence interval therefore extends 1.96 standard errors on either side of the sample estimate.
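The value can be checked with the inverse CDF of the standard normal distribution. This is a small sketch using `statistics.NormalDist` from the Python standard library; the helper name `two_sided_z` is an assumption made for illustration.

```python
from statistics import NormalDist

def two_sided_z(confidence):
    """Critical z for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    # Split alpha equally between the two tails.
    return NormalDist().inv_cdf(1 - alpha / 2)

z95 = two_sided_z(0.95)
print(f"z for 95% = {z95:.4f}")

# The other answer options correspond to other two-sided levels.
for level in (0.90, 0.98):
    print(f"z for {level:.0%} = {two_sided_z(level):.4f}")
```

Under this convention, 1.64 and 2.33 are approximately the critical values for two-sided 90% and 98% intervals respectively.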
How is the interquartile range different from the range in handling outliers?
- Both exclude outliers
- Both include outliers
- The interquartile range does not include outliers, the range does
- The interquartile range includes outliers, the range does not
The interquartile range, which is the difference between the upper quartile (Q3) and the lower quartile (Q1), spans the middle 50% of the data. Because it ignores the lowest and highest 25% of the values, it is largely unaffected by outliers. The range, on the other hand, is the difference between the maximum and minimum data values, so a single outlier can change it dramatically.
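A short sketch of the contrast, using made-up scores in which the largest value is replaced by an extreme one. The quartiles come from `statistics.quantiles` with its default method; other quartile conventions can give slightly different values, and the helper names are illustrative.

```python
from statistics import quantiles

def value_range(data):
    """Range: maximum minus minimum."""
    return max(data) - min(data)

def iqr(data):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

# Hypothetical data; the second list swaps the largest score for an outlier.
scores = [10, 12, 13, 14, 15, 16, 18, 19, 20]
scores_with_outlier = [10, 12, 13, 14, 15, 16, 18, 19, 200]

range_clean = value_range(scores)
range_outlier = value_range(scores_with_outlier)
iqr_clean = iqr(scores)
iqr_outlier = iqr(scores_with_outlier)

print("range:", range_clean, "->", range_outlier)
print("IQR:  ", iqr_clean, "->", iqr_outlier)
```

Here the range jumps by 180 while the IQR stays the same, since the quartiles are computed from values in the interior of the sorted data.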
How can 'outliers' impact the result of K-means clustering?
- Outliers can distort the shape and size of the clusters
- Outliers can lead to fewer clusters
- Outliers can lead to more clusters
- Outliers don't impact K-means clustering
Outliers can have a significant impact on the result of K-means clustering. Because each centroid is the mean of the points assigned to it, an outlier pulls the centroid towards itself, distorting the shape and size of the clusters and making them less accurate and less meaningful.
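The effect can be seen in a toy one-dimensional version of the standard K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The data, the fixed starting centroids, and the function name `kmeans_1d` are all assumptions for illustration; real implementations typically use randomized or smarter initialization.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Plain K-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids

# Two well-separated hypothetical groups, around 1 and around 5.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
start = [1.0, 5.0]  # assumed initial centroids

clean_centroids = kmeans_1d(points, start)
outlier_centroids = kmeans_1d(points + [50.0], start)

print("without outlier:", clean_centroids)
print("with outlier:   ", outlier_centroids)
```

In this run the single outlier first drags the second centroid away from the group around 5, and the algorithm ends with the outlier in a cluster of its own while the two genuine groups are merged into one.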
A positive Pearson's Correlation Coefficient indicates a ________ relationship between two variables.
- inverse
- linear
- perfect
- positive
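A positive Pearson's Correlation Coefficient indicates a positive relationship between two variables. This means that as one variable increases, the other tends to increase as well. The sign gives only the direction of the relationship; the magnitude of the coefficient indicates how closely the points follow a straight line.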
What are the assumptions made in simple linear regression?
- Homogeneity, normality, and symmetry
- Independence, homogeneity, and linearity
- Linearity, homoscedasticity, and normality
- Symmetry, linearity, and independence
The assumptions made in simple linear regression include linearity (the relationship between the independent and dependent variables is linear), homoscedasticity (the variance of the residuals is constant across all levels of the independent variable), and normality (the residuals are normally distributed). Independence of the observations is usually assumed as well.
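Homoscedasticity and normality are statements about the residuals, so a first step in checking them is to fit the line and compute those residuals. The sketch below does this with ordinary least squares on made-up data (the true line y = 2x + 1 and the fixed noise values are assumptions), then compares the residual spread in the lower and upper halves of x as a rough, informal look at homoscedasticity rather than a formal test.

```python
from statistics import mean, pstdev

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: a linear trend plus small fixed "noise" values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, -0.05, 0.2, -0.15]
y = [2 * a + 1 + e for a, e in zip(x, noise)]

slope, intercept = fit_line(x, y)
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]

# Informal homoscedasticity check: similar spread in both halves of x.
half = len(x) // 2
spread_low = pstdev(residuals[:half])
spread_high = pstdev(residuals[half:])

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"residual spread: low x = {spread_low:.3f}, high x = {spread_high:.3f}")
```

In practice, residual plots (residuals against fitted values, and a normal Q-Q plot) are the usual way to inspect these assumptions.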
In what situations is the coefficient of variation a better measure of dispersion than the standard deviation?
- When data sets have different units
- When data sets have the same units
- When the data set is normally distributed
- When the mean of the data set is zero
The coefficient of variation (CV), the ratio of the standard deviation to the mean, is a standardized, unitless measure of dispersion. It is particularly useful when comparing the dispersion of two or more datasets that have different units or significantly different means. Standard deviation, on the other hand, has the same units as the data, which makes it unsuitable for such comparisons. The CV is undefined when the mean is zero.
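A brief sketch of why the CV is convenient for such comparisons: changing the unit of measurement changes the standard deviation but not the CV. The height and weight figures and the function name are made up for illustration.

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (unitless)."""
    m = mean(data)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return stdev(data) / m

# Hypothetical measurements of the same people in different units.
heights_cm = [160, 165, 170, 175, 180]
heights_m = [h / 100 for h in heights_cm]
weights_kg = [55, 62, 70, 78, 85]

sd_cm, sd_m = stdev(heights_cm), stdev(heights_m)
cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
cv_kg = coefficient_of_variation(weights_kg)

print(f"SD of heights: {sd_cm:.3f} cm vs {sd_m:.5f} m")
print(f"CV of heights: {cv_cm:.4f} (cm) vs {cv_m:.4f} (m)")
print(f"CV of weights: {cv_kg:.4f}")
```

The two height CVs agree, so the height and weight dispersions can be compared directly even though centimetres and kilograms are not comparable.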
Under what circumstances can the conditional probability of an event be equal to its marginal probability?
- When the event is certain
- When the event is dependent on all other events
- When the event is impossible
- When the event is independent of all other events
The conditional probability of an event A given an event B equals the marginal probability of A when A and B are independent. This is because the occurrence of B does not change the probability of A if they are independent.
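This can be verified by enumeration on a single fair die, using P(A | B) = P(A and B) / P(B). The events chosen here (A = "roll is even", B = "roll is at most 4", C = "roll is at most 3") are illustrative assumptions; exact fractions avoid rounding issues.

```python
from fractions import Fraction

OUTCOMES = range(1, 7)  # faces of one fair six-sided die

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for o in OUTCOMES if event(o))
    return Fraction(favourable, len(OUTCOMES))

def conditional(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def is_even(o):
    return o % 2 == 0

def at_most_4(o):
    return o <= 4

def at_most_3(o):
    return o <= 3

p_a = prob(is_even)
p_a_given_b = conditional(is_even, at_most_4)  # independent of A
p_a_given_c = conditional(is_even, at_most_3)  # dependent on A

print("P(A)     =", p_a)
print("P(A | B) =", p_a_given_b)
print("P(A | C) =", p_a_given_c)
```

Knowing that the roll is at most 4 leaves the chance of an even number at 1/2, so A and B are independent; knowing that it is at most 3 lowers the chance to 1/3, so A and C are not.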
What type of data is the Mann-Whitney U test used for?
- Interval data
- Nominal data
- Ordinal data
- Ratio data
The Mann-Whitney U test is used for ordinal data, which can be ranked but have unknown or non-equivalent differences between values. It can also be used with interval and ratio data that do not meet the assumptions of other tests.
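The test needs only the ordering of the values, which is why ordinal data is sufficient. The sketch below computes just the U statistic by counting, for every pair of observations, which group has the higher value (ties count as one half). The ratings and the function name are hypothetical, and no p-value is computed; a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would normally be used for the full test.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: pairs where a beats b, ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical satisfaction ratings on a 1-5 ordinal scale.
group_a = [3, 4, 2, 5, 4]
group_b = [1, 2, 3, 2, 1]

u_a = mann_whitney_u(group_a, group_b)
u_b = mann_whitney_u(group_b, group_a)

print("U for group A:", u_a)
print("U for group B:", u_b)
print("Sum equals n1 * n2:", u_a + u_b == len(group_a) * len(group_b))
```

Only comparisons between values are used, never their differences, so the result would be unchanged under any relabelling of the scale that preserves the order.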
What does the 'mode' refer to in a data set?
- The average value
- The middle value
- The most frequently occurring value
- The range of values
The mode of a data set is the value that occurs most frequently. A dataset may have one mode (unimodal), two modes (bimodal), or more than two modes (multimodal).
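As a quick illustration, Python's standard `statistics` module provides `mode` for a single most common value and `multimode` for all values tied for most frequent. The sample lists are made up.

```python
from statistics import mode, multimode

unimodal = [1, 2, 2, 3, 4]
bimodal = [1, 2, 2, 3, 3, 4]

single_mode = mode(unimodal)          # the one most frequent value
unimodal_modes = multimode(unimodal)  # list with a single entry
bimodal_modes = multimode(bimodal)    # list with both tied values

print("mode of unimodal data: ", single_mode)
print("modes of unimodal data:", unimodal_modes)
print("modes of bimodal data: ", bimodal_modes)
```

`multimode` returns the tied values in the order they first appear in the data, which makes it the safer choice when a dataset may have more than one mode.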