What is a factor loading in the context of factor analysis?

  • The correlation between a factor and a variable
  • The difference between a factor and a variable
  • The percentage of variance in a variable explained by a factor
  • The ratio between a factor and a variable
Factor loadings are the correlation coefficients between the factors and the variables. A loading measures how much of a variable is "explained" by the factor.

The variance of each Principal Component corresponds to the _______ of the covariance matrix.

  • determinant
  • diagonal elements
  • eigenvalues
  • trace
The variance of each Principal Component corresponds to the eigenvalues of the covariance matrix. A larger eigenvalue corresponds to a larger amount of variance explained by that Principal Component.
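A quick check of this fact (a sketch using NumPy and an arbitrary made-up covariance matrix): projecting centred data onto the eigenvectors of its sample covariance matrix gives components whose variances match the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary correlated 3-D data, purely for illustration
X = rng.multivariate_normal([0, 0, 0],
                            [[4, 2, 0], [2, 3, 1], [0, 1, 2]],
                            size=5000)
Xc = X - X.mean(axis=0)                    # centre the data

cov = np.cov(Xc, rowvar=False)             # sample covariance matrix
vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
order = np.argsort(vals)[::-1]             # sort largest-first (PC1, PC2, ...)
vals, vecs = vals[order], vecs[:, order]

# The variance of each projected component equals the matching eigenvalue
pc_var = np.var(Xc @ vecs, axis=0, ddof=1)
print(np.allclose(pc_var, vals))           # True
```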

In which situations is factor analysis typically used?

  • When dealing with categorical variables
  • When dealing with high-dimensional data
  • When the data distribution is skewed
  • When there is a need to predict an outcome
Factor analysis is typically used when dealing with high-dimensional data. It is used to identify a smaller number of factors that explain the pattern of correlations within a set of observed variables, thus helping in dimensionality reduction.

The ________ is the standard deviation of the sampling distribution of a statistic.

  • Median deviation
  • Population deviation
  • Sample deviation
  • Standard error
The standard error is the standard deviation of the sampling distribution of a statistic. It measures the dispersion of the sample means around the true population mean.
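A small simulation (assumed population parameters, chosen only for illustration) makes this concrete: the standard deviation of many sample means closely matches the theoretical standard error σ/√n.

```python
import numpy as np

rng = np.random.default_rng(42)
pop_sd, n, n_samples = 10.0, 50, 20_000

# Draw many samples of size n, compute each sample's mean, and compare
# the spread of those means to the theoretical standard error.
means = rng.normal(loc=0.0, scale=pop_sd, size=(n_samples, n)).mean(axis=1)
empirical_se = means.std(ddof=1)
theoretical_se = pop_sd / np.sqrt(n)       # sigma / sqrt(n) ~= 1.41
print(abs(empirical_se - theoretical_se) < 0.05)   # True
```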

Does PCA require the features to be on the same scale?

  • Depends on the algorithm used
  • Depends on the data
  • No
  • Yes
Yes, PCA requires the features to be on the same scale. If features are on different scales, PCA gives more weight to the features with higher variance, so the leading principal components can simply reflect the units of measurement rather than real structure in the data. It is therefore good practice to standardize the data before applying PCA.
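The effect is easy to demonstrate. In this sketch (with made-up "income" and "height" features on very different scales), the first principal component of the raw data points almost entirely at the high-variance feature; after standardizing, the two independent features contribute equally.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two independent features on very different scales: income (~1e4)
# dominates height (~1e0) purely because of its units.
income = rng.normal(50_000, 10_000, size=1000)
height = rng.normal(1.7, 0.1, size=1000)
X = np.column_stack([income, height])

def first_pc(X):
    """Eigenvector of the covariance matrix with the largest eigenvalue."""
    cov = np.cov(X - X.mean(axis=0), rowvar=False)
    vals, vecs = np.linalg.eigh(cov)     # ascending eigenvalues
    return vecs[:, -1]

print(np.abs(first_pc(X)))               # ~[1, 0]: income alone
Xz = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature
print(np.abs(first_pc(Xz)))              # ~[0.71, 0.71]: balanced
```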

What are communalities in factor analysis?

  • They are the shared variance between variables
  • They are the unique variances of variables
  • They are the variances of the factors after rotation
  • They represent the total variance of the factors
In factor analysis, a communality is the proportion of a variable's variance that is accounted for by the common factors. Because the factors capture what the variables have in common, communalities represent the shared (common) variance between variables, as opposed to each variable's unique variance.
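Given a loading matrix, each variable's communality is the sum of its squared loadings across the factors. A minimal sketch with a hypothetical (made-up) loading matrix:

```python
import numpy as np

# Hypothetical loading matrix: 4 variables (rows) x 2 factors (columns).
# The values are invented purely for illustration.
loadings = np.array([[0.8, 0.1],
                     [0.7, 0.3],
                     [0.2, 0.9],
                     [0.1, 0.6]])

# Communality of each variable = sum of its squared loadings:
# the share of that variable's variance explained by the factors.
communalities = (loadings ** 2).sum(axis=1)
print(communalities)   # [0.65 0.58 0.85 0.37]
```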

What is the difference between a positively skewed and a negatively skewed distribution?

  • Positively skewed has a longer tail on the left, negatively skewed has a longer tail on the right
  • Positively skewed has a longer tail on the right, negatively skewed has a longer tail on the left
  • Positively skewed has a peak on the left, negatively skewed has a peak on the right
  • Positively skewed has a peak on the right, negatively skewed has a peak on the left
In a positively skewed distribution, the right tail is longer or fatter (i.e., the mass of the distribution is concentrated on the left). In a negatively skewed distribution, the left tail is longer or fatter (i.e., the mass of the distribution is concentrated on the right).
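A quick illustration with SciPy: an exponential sample has a long right tail (positive skew), and its mirror image has a long left tail (negative skew).

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(7)
right_tailed = rng.exponential(scale=1.0, size=100_000)  # long right tail
left_tailed = -right_tailed                              # mirrored: long left tail

print(skew(right_tailed) > 0)   # True: positively skewed
print(skew(left_tailed) < 0)    # True: negatively skewed
```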

The square of the standard deviation gives the _______.

  • Mean
  • Median
  • Range
  • Variance
The square of the standard deviation gives the variance. Variance is the average of the squared differences from the mean, and standard deviation is the square root of this variance. Hence, squaring the standard deviation gives us the variance.
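A one-line check with NumPy on a small example dataset:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

sd = data.std()       # population standard deviation
var = data.var()      # population variance
print(sd, var)        # 2.0 4.0 -- the variance is the squared SD
```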

In a histogram, what does the area under the curve represent?

  • The average value of observations
  • The median of the data
  • The total number of observations
  • The total range of the data
In a histogram, the area under the curve represents the total number of observations in the dataset. When bar height is plotted as frequency density (the bin's frequency divided by its width), each bar's area equals the number of observations in that bin, so the total area of all bars equals the total number of observations.
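This can be verified numerically: with frequency-density heights, the bar areas sum to the sample size (a sketch using `numpy.histogram` on simulated data).

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=1000)

counts, edges = np.histogram(data, bins=20)   # frequency per bin
widths = np.diff(edges)                       # bin widths

# Frequency-density height = frequency / bin width, so each bar's
# area (height * width) equals its bin's count.
density_heights = counts / widths
total_area = (density_heights * widths).sum()
print(total_area)    # 1000.0 -- the number of observations
```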

The Mann-Whitney U test assumes that the samples are ________ and ________.

  • dependent, heterogeneous
  • dependent, homogeneous
  • independent, heterogeneous
  • independent, homogeneous
The Mann-Whitney U test assumes that the samples are independent (not paired or related) and homogeneous (drawn from distributions of the same shape, differing at most by a shift in location).
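A minimal usage sketch with SciPy, using two independent simulated samples of the same shape whose locations differ:

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(5)
# Two independent samples with identical shape but shifted locations
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.8, scale=1.0, size=200)

stat, p = mannwhitneyu(a, b, alternative="two-sided")
print(p < 0.05)   # True: the location shift is detected
```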