A 95% confidence interval means that if the same sampling method is repeated many times, then ________% of the confidence intervals will contain the true population parameter.

50
75
90
95

A 95% confidence interval means that if we drew a large number of samples and calculated a confidence interval from each one, we would expect about 95% of those intervals to contain the true population parameter. The parameter itself is fixed; it is the interval that varies from sample to sample.
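The repeated-sampling interpretation can be checked with a small simulation. This is a minimal sketch, assuming normally distributed data with a known standard deviation; the function name `ci_coverage` and all parameter values are illustrative choices, not part of the question.

```python
import numpy as np

def ci_coverage(n_trials=2000, n=50, mu=10.0, sigma=2.0, seed=0):
    """Fraction of 95% z-intervals (known sigma) that contain the true mean."""
    rng = np.random.default_rng(seed)
    # Each row is one independent sample of size n.
    samples = rng.normal(mu, sigma, size=(n_trials, n))
    means = samples.mean(axis=1)
    half_width = 1.96 * sigma / np.sqrt(n)  # 1.96 is the 97.5% normal quantile
    covered = (means - half_width <= mu) & (mu <= means + half_width)
    return float(covered.mean())

coverage = ci_coverage()
print(f"Empirical coverage: {coverage:.3f}")
```

The printed fraction should land close to 0.95, with some variation depending on the seed and the number of trials.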


What is the significance of 'distance measures' in cluster analysis?

Distance measures determine the similarities or differences between data points
Distance measures help in determining the shape of the clusters
Distance measures help in visualizing the clusters
Distance measures indicate the number of clusters

Distance measures, like Euclidean distance or Manhattan distance, play a crucial role in cluster analysis. They determine the similarities or differences between data points. They influence how the clusters will be formed, as the most similar or closest data points get clustered together.
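As a brief illustration of the two measures named above, the sketch below computes both for a pair of made-up points. The helper names `euclidean` and `manhattan` are hypothetical and written here only for demonstration.

```python
import numpy as np

def euclidean(a, b):
    """Straight-line distance: square root of the summed squared differences."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(diff ** 2)))

def manhattan(a, b):
    """City-block distance: sum of the absolute coordinate differences."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sum(np.abs(diff)))

p, q = [0.0, 0.0], [3.0, 4.0]
d_euc = euclidean(p, q)  # 5.0 (the 3-4-5 right triangle)
d_man = manhattan(p, q)  # 7.0 (3 + 4)
print(d_euc, d_man)
```

The same pair of points is "closer" or "farther" depending on the measure, which is why the choice of distance can change which points end up in the same cluster.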


In what situations is Spearman's rank correlation preferred over Pearson's correlation?

When the data contains outliers
When the relationship between variables is nonlinear
When the variables are not normally distributed
All of the above

Spearman's rank correlation coefficient is a nonparametric measure of rank correlation. It is preferred over Pearson's correlation when the variables are not normally distributed, when the relationship is monotonic but not linear, or when the data contains outliers. It assesses how well a monotonic function describes the relationship between two variables, without making any assumptions about the frequency distribution of the variables.
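The difference is easy to see on a monotonic but strongly nonlinear relationship. The following is a minimal sketch that computes Spearman's coefficient as Pearson's coefficient applied to ranks; the helper names are hypothetical, and the simple ranking used here assumes there are no tied values.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation taken from the 2x2 correlation matrix."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman(x, y):
    """Spearman correlation as Pearson on ranks (assumes no ties)."""
    def rank(v):
        # Double argsort gives each element's 0-based rank.
        return np.argsort(np.argsort(v)).astype(float)
    return pearson(rank(x), rank(y))

x = np.arange(1, 11, dtype=float)
y = np.exp(x)  # monotonic in x, but far from linear

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson: {r_pearson:.3f}, Spearman: {r_spearman:.3f}")
```

Because `y` always increases with `x`, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's coefficient is noticeably lower because the points do not lie on a straight line.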


A __________ is the difference between the observed value and the predicted value of the response variable in regression analysis.

Coefficient
Error term
Outlier
Residual

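In the context of regression analysis, the difference between the observed value and the predicted value of the response variable is called a "residual". It should not be confused with the error term, which is the unobservable deviation from the true regression line; the residual is its observable counterpart computed from the fitted model.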


What is a residual in the context of simple linear regression?

The difference between the observed and predicted values
The difference between the predicted and observed values of the independent variable
The error in the slope of the regression line
The observed value of the dependent variable

A residual is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ), given by the regression model. It represents the error of the estimate.
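A short sketch of the calculation, using made-up data and `numpy.polyfit` to fit the line by ordinary least squares (the data values are illustrative assumptions only):

```python
import numpy as np

# Hypothetical data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)

y_hat = slope * x + intercept  # predicted values
residuals = y - y_hat          # observed minus predicted

print(residuals)
print(residuals.sum())  # approximately zero when the model has an intercept
```

Each residual is positive where the point lies above the fitted line and negative where it lies below.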


How do changes in the scale of measurement affect the correlation coefficient?

They decrease the correlation coefficient
They do not affect the correlation coefficient
They increase the correlation coefficient
They reverse the sign of the correlation coefficient

The correlation coefficient is not affected by changes in the center (mean) or scale (standard deviation) of the variables: adding a constant to a variable, or multiplying it by a positive constant such as a unit conversion factor, leaves the coefficient unchanged. This is because correlation measures the strength of a linear relationship between variables relative to their variability. It is a dimensionless quantity, so changing the units in which the variables are measured does not change it.
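This invariance can be verified numerically. The sketch below uses simulated data (an assumption for illustration) and rescales one variable as in a Celsius-to-Fahrenheit conversion. It also shows the one caveat: multiplying a variable by a negative constant, which is not an ordinary change of units, flips the sign while keeping the magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 0.5 * x + rng.normal(size=200)

r = np.corrcoef(x, y)[0, 1]

# Shift and positively rescale both variables (e.g. a unit conversion).
r_rescaled = np.corrcoef(1.8 * x + 32.0, 100.0 * y)[0, 1]

# Negating one variable reverses the sign but not the magnitude.
r_flipped = np.corrcoef(-x, y)[0, 1]

print(r, r_rescaled, r_flipped)
```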


How is the Bartlett's Test of Sphericity used in factor analysis?

It tests the assumption of equal variances
It tests the assumption of linearity
It tests the assumption of normality
It tests the assumption that the variables are uncorrelated

Bartlett's Test of Sphericity is used in factor analysis to test the hypothesis that the variables are uncorrelated in the population. In other words, the population correlation matrix is an identity matrix. A significant test indicates that a factor analysis may be useful with your data.
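As a rough sketch of how the test statistic is formed, the commonly used chi-square approximation is -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where R is the sample correlation matrix, n the number of observations, and p the number of variables. The function name and the simulated data below are assumptions for illustration; a p-value would additionally require a chi-square distribution function, which is omitted here.

```python
import numpy as np

def bartlett_sphericity(X):
    """Return (chi-square statistic, degrees of freedom) for Bartlett's test."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    # slogdet is numerically safer than log(det(R)) for near-singular R.
    _, logdet = np.linalg.slogdet(R)
    statistic = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return float(statistic), df

# Simulated data: three variables sharing one common factor, so they are correlated.
rng = np.random.default_rng(0)
common = rng.normal(size=(300, 1))
X = common + 0.5 * rng.normal(size=(300, 3))

stat, df = bartlett_sphericity(X)
print(stat, df)
```

With 3 degrees of freedom, the 5% critical value of the chi-square distribution is about 7.815, so a statistic far above that rejects the hypothesis that the correlation matrix is an identity matrix.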


In multiple linear regression, what does each coefficient represent?

The average change in the dependent variable for one unit of change in the independent variable, holding all other independent variables constant
The correlation between the dependent variable and the independent variable
The error term in the regression model
The total variation in the dependent variable explained by the independent variable

In multiple linear regression, each coefficient represents the average change in the dependent variable for one unit of change in the independent variable, while holding all other independent variables constant.
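The "holding all other variables constant" reading can be illustrated with a small least-squares fit. This is a sketch on simulated data; the true coefficients (1, 2, -3), the noise level, and the helper `predict` are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Simulated response with known coefficients plus a little noise.
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with a column of ones for the intercept.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

def predict(a, b):
    """Fitted value at x1 = a, x2 = b."""
    return beta[0] + beta[1] * a + beta[2] * b

# Increase x1 by one unit while x2 stays fixed at 0.5.
delta = predict(1.0, 0.5) - predict(0.0, 0.5)
print(beta, delta)
```

The change in the prediction equals the fitted coefficient on `x1`, regardless of the value at which `x2` is held.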


A wider confidence interval indicates a higher level of _______ about the estimate.

Certainty
Standard deviation
Uncertainty
Variance

A wider confidence interval suggests a higher level of uncertainty about the estimate, because a larger range of values remains plausible for the population parameter.
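One way to see what drives the width is the interval for a mean with known standard deviation, whose full width is 2 * z * sigma / sqrt(n). The sketch below assumes that setting; the function name and the numbers are illustrative.

```python
import math

def ci_width(sigma, n, z=1.96):
    """Full width of a z-interval for the mean with known sigma."""
    return 2 * z * sigma / math.sqrt(n)

w_small = ci_width(sigma=2.0, n=25)   # smaller sample, wider interval
w_large = ci_width(sigma=2.0, n=100)  # four times the data, half the width

print(w_small, w_large)
```

Quadrupling the sample size halves the width, so more data means a narrower interval and less uncertainty about the estimate.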


How does PCA relate to the Singular Value Decomposition (SVD) technique?

PCA can be implemented using SVD
SVD is a prerequisite for PCA
SVD is a type of PCA
They are entirely different techniques

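PCA can be implemented using SVD. Classical PCA performs an eigenvalue decomposition of the covariance matrix of the data, while SVD decomposes the mean-centered data matrix directly. The two give the same result: the right singular vectors of the centered data are the eigenvectors of the covariance matrix (the principal directions), and each squared singular value divided by n - 1 equals the corresponding eigenvalue. Both techniques can be used for dimensionality reduction.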
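The equivalence between the two routes can be checked numerically. This is a minimal sketch on simulated data (the mixing matrix is an arbitrary assumption); eigenvectors and singular vectors are only defined up to sign, so the comparison uses absolute values.

```python
import numpy as np

rng = np.random.default_rng(0)
# Simulated correlated data: 200 observations of 3 variables.
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)  # centering is required for PCA

# Route 1: eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
eigvals = eigvals[::-1]                 # reorder to descending
eigvecs = eigvecs[:, ::-1]

# Route 2: SVD of the centered data matrix.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_eigvals = s ** 2 / (Xc.shape[0] - 1)

print(np.allclose(eigvals, svd_eigvals))
print(np.allclose(np.abs(eigvecs.T), np.abs(Vt)))
```

In practice the SVD route is often preferred because it avoids forming the covariance matrix explicitly.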
