What is a sampling distribution?

  • A distribution of all possible samples
  • A distribution of sample proportions
  • A distribution of sample variances
  • A distribution of the population
A sampling distribution is the distribution of a statistic (such as a mean, median, or proportion) across many samples drawn from the same population. More precisely, it describes how that statistic varies over all possible samples of the same size that could be drawn from the population.
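To make "over all possible samples" concrete, the sketch below enumerates every sample from a tiny hypothetical population and tallies the resulting sample means. It assumes ordered samples of size 2 drawn with replacement from the values 1, 2, 3. The population, sample size, and sampling scheme are illustrative choices only.

```python
from collections import Counter
from itertools import product
from statistics import fmean

# Hypothetical toy population and sample size.
population = [1, 2, 3]
n = 2

# Enumerate every ordered sample of size n drawn with replacement,
# compute the statistic (here, the sample mean) for each one,
# and count how often each value of the statistic occurs.
sample_means = [fmean(sample) for sample in product(population, repeat=n)]
sampling_distribution = Counter(sample_means)

for value in sorted(sampling_distribution):
    print(f"mean={value}: {sampling_distribution[value]} of {len(sample_means)} samples")

# The mean of the sampling distribution of the mean equals the population mean.
print(fmean(sample_means), fmean(population))
```

Computing a different statistic (for example, the median) from the same set of samples would give a different sampling distribution.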

The Kruskal-Wallis Test is a non-parametric method used when the assumptions of the ________ are not met.

  • ANOVA
  • Correlation analysis
  • Regression analysis
  • t-test
The Kruskal-Wallis Test is used when the assumptions of ANOVA (such as normality and homogeneity of variances) are not met. It is a rank-based, non-parametric alternative to the one-way ANOVA.
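The test works on ranks rather than raw values. Here is a minimal sketch of the H statistic, assuming the textbook formula H = 12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1), where R_i is the rank sum of group i. Tied values get average ranks, but the sketch omits the tie-correction factor that library routines such as `scipy.stats.kruskal` apply.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # i..j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    weighted_sum = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum ** 2 / len(group)
        start += len(group)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3 * (n_total + 1)


print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

H is then compared with a chi-square distribution with k − 1 degrees of freedom, where k is the number of groups.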

In a symmetric distribution, the skewness is ________.

  • -1
  • 0
  • 1
  • It varies
In a symmetric distribution, the skewness is zero: the distribution is neither left-skewed (negative skewness) nor right-skewed (positive skewness).
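The sketch below checks the sign of skewness on two small hypothetical datasets. It assumes the Fisher-Pearson moment coefficient g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. For symmetric data the cubed deviations cancel, so m3 and hence the skewness are zero.

```python
def sample_skewness(data):
    """Fisher-Pearson moment coefficient of skewness: m3 / m2**1.5."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5


print(sample_skewness([1, 2, 3, 4, 5]))   # symmetric -> 0.0
print(sample_skewness([1, 1, 1, 2, 10]))  # long right tail -> positive
```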

When the population variance is unknown, a _______ test is typically used.

  • Chi-square
  • F
  • T
  • Z
A t-test is typically used when the population variance is unknown and must be estimated from the sample. It is based on the t-distribution, a family of distributions that resemble the standard normal distribution but have heavier tails, especially at small degrees of freedom.
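Here is a minimal sketch of a one-sample t statistic, assuming the null hypothesis that the population mean equals a hypothetical value `mu0`. The unknown σ is replaced by the sample standard deviation s, which is why the reference distribution is t with n − 1 degrees of freedom rather than normal. For p-values, a library routine such as `scipy.stats.ttest_1samp` would normally be used.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for H0: population mean == mu0."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample SD (n - 1 denominator) estimates sigma
    t = (mean - mu0) / (s / math.sqrt(n))
    return t, n - 1


t, df = one_sample_t([2, 4, 6, 8], mu0=4)
print(f"t = {t:.4f}, df = {df}")  # t = 0.7746, df = 3
```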

What is a factor loading in the context of factor analysis?

  • The correlation between a factor and a variable
  • The difference between a factor and a variable
  • The percentage of variance in a variable explained by a factor
  • The ratio between a factor and a variable
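Factor loadings are the correlation coefficients between the factors and the observed variables (for standardized variables and uncorrelated factors). A loading measures how strongly a variable is associated with a factor, and the squared loading is the proportion of the variable's variance explained by that factor.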
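The simulation below illustrates "loading = correlation" in a one-factor model. It assumes one standardized variable x = λ·f + √(1 − λ²)·e, with a hypothetical loading λ = 0.8 and independent standard-normal f and e. Because the factor is simulated and known here, no factor extraction is performed. The sample correlation between x and f should land close to λ, and its square close to λ² = 0.64.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)
LOADING = 0.8  # hypothetical true loading
n = 20_000

factor = [rng.gauss(0, 1) for _ in range(n)]
unique = [rng.gauss(0, 1) for _ in range(n)]  # variable-specific noise
variable = [LOADING * f + math.sqrt(1 - LOADING ** 2) * u for f, u in zip(factor, unique)]

r = pearson_r(variable, factor)
explained = r ** 2  # share of the variable's variance explained by the factor
print(f"correlation = {r:.3f}, variance explained = {explained:.3f}")
```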

In a two-way ANOVA, ________ refers to the effect of one independent variable on the dependent variable, adjusting for the effects of the other independent variables.

  • Interaction effect
  • Main effect
  • Simple effect
In a two-way ANOVA, the main effect is the effect of one independent variable on the dependent variable, adjusting for the other independent variable. It is the overall effect of one factor on the outcome, averaged over the levels of the other factor.
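In a balanced design, "averaged over the levels of the other factor" can be computed directly from cell means. The sketch below assumes a hypothetical 2 × 2 table of cell means. Each main effect is a difference of marginal means, and the interaction compares the effect of B across the levels of A.

```python
from statistics import fmean

# Hypothetical cell means for a balanced 2 x 2 design: cell_means[a][b]
cell_means = {
    "A1": {"B1": 10.0, "B2": 14.0},
    "A2": {"B1": 16.0, "B2": 20.0},
}
levels_a = list(cell_means)
levels_b = list(cell_means["A1"])

# Marginal means: average each level of one factor over the levels of the other.
marginal_a = {a: fmean(cell_means[a].values()) for a in levels_a}
marginal_b = {b: fmean(cell_means[a][b] for a in levels_a) for b in levels_b}

main_effect_a = marginal_a["A2"] - marginal_a["A1"]
main_effect_b = marginal_b["B2"] - marginal_b["B1"]

# Interaction: does the effect of B differ between the levels of A?
interaction = (cell_means["A2"]["B2"] - cell_means["A2"]["B1"]) - (
    cell_means["A1"]["B2"] - cell_means["A1"]["B1"]
)

print(main_effect_a, main_effect_b, interaction)
```

With these numbers the effect of B is the same at both levels of A, so the interaction is zero and the main effects tell the whole story.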

What assumptions must be met for a Chi-square test for independence to be valid?

  • The data must be continuous
  • The data must be normally distributed
  • The observations must be independent and the expected frequency of each category must be at least 5
  • The sample size must be larger than 30
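For a Chi-square test for independence to be valid, the observations must be independent, and the expected frequency in each cell of the contingency table should be at least 5. This is a common rule of thumb, because the chi-square approximation becomes unreliable when expected counts are small.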
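The expected-frequency check uses E_ij = (row total_i × column total_j) / grand total. Here is a minimal sketch that applies it to a hypothetical 2 × 2 table of counts and computes the uncorrected chi-square statistic Σ (O − E)² / E. Library routines such as `scipy.stats.chi2_contingency` also report the expected table.

```python
def expected_frequencies(table):
    """Expected counts under independence: row_total * col_total / grand_total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi-square statistic without continuity correction."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


observed = [[10, 20], [30, 40]]  # hypothetical counts
expected = expected_frequencies(observed)
rule_of_thumb_ok = all(e >= 5 for row in expected for e in row)

print(expected)          # [[12.0, 18.0], [28.0, 42.0]]
print(rule_of_thumb_ok)  # True
print(chi_square_statistic(observed))
```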

The ________ in Spearman's Rank Correlation indicates the strength and direction of association between two ranked variables.

  • Coefficient
  • Median
  • P-value
  • Rank
The coefficient (Spearman's rho) indicates the strength and direction of the association between two ranked variables. It ranges from -1 (perfect negative monotonic association) to 1 (perfect positive monotonic association), with 0 indicating no monotonic association.
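Because Spearman's coefficient works on ranks, any strictly increasing relationship scores 1, even a nonlinear one. The sketch below assumes no tied values and uses the shortcut ρ = 1 − 6 Σ d² / (n(n² − 1)), where d is the difference between paired ranks. With ties, the usual approach is to compute the Pearson correlation of average ranks instead.

```python
def ranks(values):
    """1-based ranks; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def spearman_rho(x, y):
    """Spearman's rank correlation via the no-ties shortcut formula."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


x = [10, 20, 30, 40, 50]
print(spearman_rho(x, [v ** 3 for v in x]))  # monotonic increasing, nonlinear
print(spearman_rho(x, x[::-1]))              # perfectly reversed order
print(spearman_rho(x, [2, 1, 4, 3, 5]))      # mostly increasing
```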

What is the purpose of a scatter plot?

  • To compare two numerical variables
  • To display a distribution
  • To show the relationship between three variables
  • To visualize categorical variables
A scatter plot uses dots to represent the values of two different variables: one plotted along the x-axis and the other along the y-axis. It helps to identify the type of relationship (if any) between two numerical variables.

Does PCA require the features to be on the same scale?

  • Depends on the algorithm used
  • Depends on the data
  • No
  • Yes
Yes, in practice: PCA is sensitive to the scale of the features. PCA looks for directions of maximum variance, so a feature measured in large units (and therefore with a large variance) can dominate the leading principal components purely because of its units, producing misleading components. It is therefore good practice to standardize the features (for example, to zero mean and unit variance) before applying PCA.
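The sketch below shows the scale effect with two hypothetical, strongly correlated features: height in metres and weight in grams. It finds the first principal component as the leading eigenvector of the 2 × 2 sample covariance matrix, using the closed-form eigenvalue for a symmetric 2 × 2 matrix. On raw data that component points almost entirely along the weight axis, only because grams produce a huge variance. After standardization the covariance matrix becomes a correlation matrix, and the component weights both features equally.

```python
import math
import statistics


def covariance_2d(xs, ys):
    """Sample variances and covariance (n - 1 denominator) of two features."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def first_principal_component(sxx, sxy, syy):
    """Unit eigenvector for the largest eigenvalue of [[sxx, sxy], [sxy, syy]]."""
    if sxy == 0:
        return (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    # Largest eigenvalue of a symmetric 2 x 2 matrix.
    lam = (sxx + syy) / 2 + math.hypot((sxx - syy) / 2, sxy)
    vx, vy = sxy, lam - sxx  # solves (A - lam * I) v = 0
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm


def standardize(values):
    """Rescale to zero mean and unit (sample) standard deviation."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


height_m = [1.60, 1.70, 1.80, 1.75, 1.65]         # hypothetical data
weight_g = [60000, 72000, 80000, 77000, 65000]    # same people, in grams

pc_raw = first_principal_component(*covariance_2d(height_m, weight_g))
pc_std = first_principal_component(
    *covariance_2d(standardize(height_m), standardize(weight_g))
)
print("raw:", pc_raw)           # almost (0, 1): dominated by weight's units
print("standardized:", pc_std)  # about (0.707, 0.707)
```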