Principal Component Analysis (PCA) is a dimensionality reduction technique that projects the data into a lower dimensional space called the _______.
- eigen space
- feature space
- subspace
- variance space
PCA projects the data onto a new, lower-dimensional subspace. This subspace is spanned by the principal components, which are orthogonal to each other and ordered so that each one captures as much of the remaining variance in the data as possible.
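As a rough illustration of the projection, the sketch below finds the first principal component of a small 2-D dataset using only the standard library. The data points and variable names are made up for this example, and the closed-form angle formula only applies to the 2x2 case; real work would normally use a linear algebra library.

```python
import math

# Hypothetical 2-D points lying close to the line y = x.
points = [(2.0, 1.9), (3.0, 3.1), (4.0, 4.0), (5.0, 5.1), (6.0, 5.9)]

n = len(points)
mean_x = sum(x for x, _ in points) / n
mean_y = sum(y for _, y in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
sxx = sum(dx * dx for dx, _ in centered) / (n - 1)
syy = sum(dy * dy for _, dy in centered) / (n - 1)
sxy = sum(dx * dy for dx, dy in centered) / (n - 1)

# For a symmetric 2x2 matrix, the leading eigenvector makes an angle theta
# with the x-axis, where tan(2 * theta) = 2 * sxy / (sxx - syy).
theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
pc1 = (math.cos(theta), math.sin(theta))

# Project each centered point onto the first principal component (2-D -> 1-D).
scores = [dx * pc1[0] + dy * pc1[1] for dx, dy in centered]
var_pc1 = sum(s * s for s in scores) / (n - 1)

print("first component direction:", pc1)
print("variance along x, y, PC1:", sxx, syy, var_pc1)
```

The variance of the projected scores exceeds the variance along either original axis, which is what "capturing the maximum variance" means here.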
What are the assumptions made in simple linear regression?
- Homogeneity, normality, and symmetry
- Independence, homogeneity, and linearity
- Linearity, homoscedasticity, and normality
- Symmetry, linearity, and independence
The assumptions made in simple linear regression include linearity (the relationship between the independent and dependent variables is linear), homoscedasticity (the variance of the residuals is constant across all levels of the independent variable), and normality (the residuals are normally distributed).
A positive Pearson's Correlation Coefficient indicates a ________ relationship between two variables.
- inverse
- linear
- perfect
- positive
A positive Pearson's Correlation Coefficient indicates a positive relationship between two variables. This means that as one variable increases, the other variable tends to increase as well, and as one decreases, the other tends to decrease.
How can 'outliers' impact the result of K-means clustering?
- Outliers can distort the shape and size of the clusters
- Outliers can lead to fewer clusters
- Outliers can lead to more clusters
- Outliers don't impact K-means clustering
Outliers can have a significant impact on the result of K-means clustering. They can distort the shape and size of the clusters, as they may pull the centroid towards them, creating less accurate and meaningful clusters.
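The pull on the centroids can be seen in a toy one-dimensional run of the standard K-means update loop (Lloyd's algorithm). The data, the starting centroids, and the function name `kmeans_1d` are all assumptions made for this sketch, not part of any library.

```python
def kmeans_1d(points, centroids, iterations=20):
    """Minimal 1-D K-means: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

# Two well-separated hypothetical groups, then the same data plus one outlier.
clean = [1.0, 1.5, 2.0, 2.5, 8.0, 8.5, 9.0, 9.5]
with_outlier = clean + [40.0]
start = [1.0, 9.5]

centroids_clean = kmeans_1d(clean, start)
centroids_outlier = kmeans_1d(with_outlier, start)

print("without outlier:", centroids_clean)
print("with outlier:   ", centroids_outlier)
```

In this toy run the outlier drags the second centroid so far that it ends up with a cluster of its own, while the two genuine groups are merged under the other centroid.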
How is the interquartile range different from the range in handling outliers?
- Both exclude outliers
- Both include outliers
- The interquartile range does not include outliers, the range does
- The interquartile range includes outliers, the range does not
The interquartile range, which is the difference between the upper quartile (Q3) and the lower quartile (Q1), covers the middle 50% of the data and is therefore largely unaffected by outliers. The range, on the other hand, is the difference between the maximum and minimum data values, so a single extreme value can change it dramatically.
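A small sketch of the difference, using Python's `statistics.quantiles` with its default quartile method (other quartile conventions give slightly different values). The dataset is invented for illustration.

```python
import statistics

def value_range(values):
    """Range: maximum minus minimum."""
    return max(values) - min(values)

def iqr(values):
    """Interquartile range: Q3 minus Q1."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical measurements, then the same data with one extreme value added.
clean = [10, 12, 13, 14, 15, 16, 17, 18, 20]
with_outlier = clean + [95]

range_clean = value_range(clean)
range_outlier = value_range(with_outlier)
iqr_clean = iqr(clean)
iqr_outlier = iqr(with_outlier)

print("range:", range_clean, "->", range_outlier)
print("IQR:  ", iqr_clean, "->", iqr_outlier)
```

Adding the single extreme value multiplies the range several times over, while the IQR moves by less than one unit.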
What is the z-value associated with a 95% confidence interval in a standard normal distribution?
- 1.64
- 1.96
- 2
- 2.33
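The z-value associated with a two-sided 95% confidence interval in a standard normal distribution is approximately 1.96, because 95% of the distribution lies within 1.96 standard deviations of its mean. A 95% confidence interval for a population mean is therefore the sample mean plus or minus 1.96 standard errors; over repeated sampling, about 95% of intervals built this way contain the true population mean.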
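The critical value can be recovered from the inverse CDF of the standard normal distribution. The sketch below uses `statistics.NormalDist` from the standard library; the helper `z_interval` and its inputs (a known population standard deviation `sigma`, sample size `n`) are assumptions for illustration.

```python
import math
from statistics import NormalDist

confidence = 0.95
# Two-sided interval: leave (1 - confidence) / 2 in each tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def z_interval(sample_mean, sigma, n, z_value=z):
    """Confidence interval for a mean when the population sigma is known."""
    margin = z_value * sigma / math.sqrt(n)  # z times the standard error
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample: mean 50, sigma 10, 100 observations.
low, high = z_interval(sample_mean=50.0, sigma=10.0, n=100)

print("critical z:", round(z, 4))
print("interval:", (round(low, 2), round(high, 2)))
```

The same call with `confidence = 0.90` or `0.99` gives the critical values that correspond to two of the other answer choices (about 1.64 and, one-sided, 2.33).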
In which situation is Spearman's Rank Correlation preferable to Pearson's correlation?
- When the data is normally distributed
- When the relationship between variables is non-linear and monotonic
- When the relationship is linear
- When there are no ties in the ranks
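Spearman's Rank Correlation is preferable to Pearson's correlation when the relationship between variables is non-linear but monotonic. Pearson's correlation measures the strength of a linear relationship, while Spearman's works on the ranks of the data and so captures any monotonic relationship, linear or not. It does not capture non-monotonic patterns such as a U-shape.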
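To make the contrast concrete, the sketch below computes both coefficients by hand for an exponential (monotonic but strongly non-linear) relationship. The helper functions are written for this example only, and the ranking helper assumes there are no tied values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical data: y grows exponentially with x.
xs = list(range(1, 11))
ys = [math.exp(x) for x in xs]

r_pearson = pearson(xs, ys)
r_spearman = spearman(xs, ys)

print("Pearson: ", round(r_pearson, 3))
print("Spearman:", round(r_spearman, 3))
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is 1, whereas Pearson's coefficient is noticeably lower because the points do not lie on a straight line.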
The Kruskal-Wallis Test is used to compare ________ independent samples.
- four
- three
- three or more
- two
The Kruskal-Wallis Test is used to compare three or more independent samples. It's an extension of the Mann-Whitney U Test for more than two groups.
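A minimal sketch of the Kruskal-Wallis H statistic, computed from the rank sums of the pooled data. The function name and the three sample groups are hypothetical, and the code assumes no tied values (the usual tie correction is omitted).

```python
def kruskal_wallis_h(*groups):
    """H statistic for three or more independent samples (no ties assumed)."""
    pooled = sorted(value for group in groups for value in group)
    rank_of = {value: rank for rank, value in enumerate(pooled, start=1)}
    n_total = len(pooled)
    # Sum over groups of (rank sum squared) / (group size).
    rank_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Three hypothetical groups with no overlap between them.
h = kruskal_wallis_h([1.2, 2.3, 3.1], [4.8, 5.0, 6.4], [7.7, 8.1, 9.9])
print("H =", round(h, 3))
```

For groups that are not too small, H is usually compared against a chi-square distribution with (number of groups - 1) degrees of freedom; that step is left out here.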
The type of factor analysis in which the researcher assumes that all variance in the observed variables is common variance is known as _______ factor analysis.
- common factor
- confirmatory
- exploratory
- principal component
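The type of factor analysis in which the researcher assumes that all variance in the observed variables is common variance is known as principal component factor analysis. Common factor analysis, by contrast, separates each variable's variance into common variance, which is shared with the other variables, and unique variance, and analyzes only the common part.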
What is the main purpose of simple linear regression?
- To find the average of the data
- To identify outliers
- To understand the relationship between two variables
- To visualize the data
The main purpose of simple linear regression is to understand the relationship between two variables. It provides a quantitative estimate of the relationship between one dependent variable and one independent variable.
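The "quantitative estimate" is the fitted slope and intercept. The sketch below computes the ordinary least squares estimates directly from their textbook formulas; the data and the function name `fit_line` are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data roughly following y = 2x + 1.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
```

The slope estimates how much the dependent variable changes, on average, per one-unit increase in the independent variable; the residuals are what the regression assumptions are checked against.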
What happens when the assumptions about residuals in linear regression are violated?
- The interpretation of the model changes
- The model becomes invalid
- The model becomes underfit
- The standard errors, confidence intervals, and hypothesis tests may not be valid
Violations of the assumptions about residuals in linear regression, such as non-constant variance or correlated errors, generally leave the coefficient estimates unbiased but make them inefficient, and they bias the estimated standard errors. As a result, confidence intervals and hypothesis tests may not be valid, which can lead to incorrect inferences.
What implications does an insignificant F-test have in the context of multiple linear regression?
- The model does not explain a significant amount of the variance in the response
- The model explains a significant amount of the variance in the response
- The model has a high R-squared value
- The model has violated the assumption of homoscedasticity
The overall F-test in multiple linear regression tests the null hypothesis that all slope coefficients (every coefficient except the intercept) are equal to zero. An insignificant F-test means this null hypothesis cannot be rejected, which suggests that the predictors, taken together, do not explain a significant amount of the variance in the response variable.
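The overall F statistic can be written in terms of R-squared, the number of observations, and the number of predictors. The sketch below evaluates that formula; the function name and the input values are hypothetical.

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic for a regression with an intercept.

    F = (R^2 / p) / ((1 - R^2) / (n - p - 1))
    """
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    return (r_squared / df_model) / ((1 - r_squared) / df_resid)

# Hypothetical model: 23 observations, 2 predictors, R-squared of 0.5.
f_value = overall_f_statistic(r_squared=0.5, n_obs=23, n_predictors=2)
print("F =", f_value)
```

The p-value comes from comparing this statistic with an F distribution on (p, n - p - 1) degrees of freedom, which the standard library does not provide and is therefore not computed here.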