In what type of data distribution is the mean usually greater than the median?

  • Negatively skewed distribution
  • Normal distribution
  • Positively skewed distribution
  • Uniform distribution
In a positively skewed distribution, the mean is usually greater than the median. A positive skew means the right tail of the distribution is longer or fatter. Because the mean uses the magnitude of every data point, the few large values in the right tail pull it toward the tail, while the median depends only on the middle position of the sorted data. Hence the mean is typically greater than the median in a positively skewed distribution.
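A quick check with a small, made-up right-skewed sample (one large value in the right tail) illustrates the effect. The data are hypothetical and chosen only for round numbers.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is very large.
incomes = [22, 25, 27, 30, 31, 35, 40, 150]

mean = statistics.mean(incomes)      # uses every value, so 150 pulls it upward
median = statistics.median(incomes)  # average of the two middle values (30 and 31)

print(f"mean={mean}, median={median}")
```

The mean (45) lands well above the median (30.5) because of the single value in the tail; removing that value would bring the two much closer together.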

The ________ is the standard deviation of the sampling distribution of a statistic.

  • Median deviation
  • Population deviation
  • Sample deviation
  • Standard error
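The standard error is the standard deviation of the sampling distribution of a statistic. For the sample mean, it measures how much sample means vary around the true population mean and equals σ/√n, where σ is the population standard deviation and n is the sample size; in practice it is estimated by s/√n using the sample standard deviation s.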
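One way to see this definition is to simulate it: draw many samples, take the mean of each, and compute the standard deviation of those means. The sketch below assumes a normal population with σ = 2 and samples of size 25 (both arbitrary choices), so σ/√n = 0.4.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is reproducible

SIGMA = 2.0    # assumed population standard deviation
N = 25         # sample size
REPS = 10_000  # number of simulated samples

# Sampling distribution of the mean: one sample mean per simulated sample.
sample_means = [
    statistics.fmean(random.gauss(10.0, SIGMA) for _ in range(N))
    for _ in range(REPS)
]

# The standard error is the standard deviation of that sampling distribution.
empirical_se = statistics.stdev(sample_means)
theoretical_se = SIGMA / math.sqrt(N)  # sigma / sqrt(n)

# With a single sample and unknown sigma, the SE is estimated as s / sqrt(n).
one_sample = [random.gauss(10.0, SIGMA) for _ in range(N)]
estimated_se = statistics.stdev(one_sample) / math.sqrt(N)

print(f"empirical SE={empirical_se:.3f}, sigma/sqrt(n)={theoretical_se:.3f}, "
      f"s/sqrt(n)={estimated_se:.3f}")
```

The simulated standard error should come out close to 0.4; the single-sample estimate is noisier because it relies on one estimate of σ.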

In which situations is factor analysis typically used?

  • When dealing with categorical variables
  • When dealing with high-dimensional data
  • When the data distribution is skewed
  • When there is a need to predict an outcome
Factor analysis is typically used when dealing with high-dimensional data. It is used to identify a smaller number of factors that explain the pattern of correlations within a set of observed variables, thus helping in dimensionality reduction.
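As a sketch of how a few factors can account for many correlations, consider a hypothetical one-factor model for standardized variables, x_j = λ_j F + e_j, with independent errors. Under that model the correlation between two different variables is λ_i λ_j, so four loadings reproduce all six pairwise correlations. The loading values below are made up for illustration.

```python
from itertools import combinations

# Hypothetical loadings of four observed variables on a single latent factor.
loadings = [0.9, 0.8, 0.7, 0.6]
p = len(loadings)

# Model-implied correlation matrix: 1 on the diagonal, lambda_i * lambda_j elsewhere.
implied = [
    [1.0 if i == j else loadings[i] * loadings[j] for j in range(p)]
    for i in range(p)
]

pairs = list(combinations(range(p), 2))
print(f"{len(pairs)} pairwise correlations described by {p} loadings")
for row in implied:
    print(" ".join(f"{r:.2f}" for r in row))
```

With more variables the savings grow: p variables have p(p − 1)/2 correlations but, under one factor, only p loadings.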

The variance of each Principal Component corresponds to the _______ of the covariance matrix.

  • determinant
  • diagonal elements
  • eigenvalues
  • trace
The variance of each Principal Component corresponds to the eigenvalues of the covariance matrix. A larger eigenvalue corresponds to a larger amount of variance explained by that Principal Component.
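For two variables this can be checked directly: compute the 2×2 sample covariance matrix, take its eigenvalues in closed form, project the centred data onto each unit eigenvector, and measure the variance of the projections. The small dataset is illustrative only, and the eigenvector formula assumes the off-diagonal covariance b is non-zero.

```python
import math
import statistics

# Small illustrative 2-D dataset (values chosen only for the demo).
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]
n = len(xs)
mx, my = statistics.fmean(xs), statistics.fmean(ys)

# Sample covariance matrix [[a, b], [b, c]] with divisor n - 1.
a = sum((x - mx) ** 2 for x in xs) / (n - 1)
c = sum((y - my) ** 2 for y in ys) / (n - 1)
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
mid = (a + c) / 2
rad = math.sqrt(((a - c) / 2) ** 2 + b * b)
eigenvalues = [mid + rad, mid - rad]

def unit_eigenvector(lam):
    """Unit eigenvector of [[a, b], [b, c]] for eigenvalue lam (assumes b != 0)."""
    vx, vy = b, lam - a
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm

# Project the centred data onto each eigenvector and measure the variance.
pc_variances = []
for lam in eigenvalues:
    vx, vy = unit_eigenvector(lam)
    scores = [(x - mx) * vx + (y - my) * vy for x, y in zip(xs, ys)]
    pc_variances.append(statistics.variance(scores))

print("eigenvalues: ", [round(v, 6) for v in eigenvalues])
print("PC variances:", [round(v, 6) for v in pc_variances])
print("trace a + c: ", round(a + c, 6))
```

The two lists agree, and the eigenvalues sum to the trace a + c, which is the total variance rather than the variance of any single component.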

What is a factor loading in the context of factor analysis?

  • The correlation between a factor and a variable
  • The difference between a factor and a variable
  • The percentage of variance in a variable explained by a factor
  • The ratio between a factor and a variable
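Factor loadings are the correlation coefficients between the factors and the variables (strictly, this holds for standardized variables and uncorrelated factors). They measure how strongly each variable is "explained" by a factor; the squared loading gives the proportion of the variable's variance explained by that factor.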
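In an idealized simulation where the factor values are known (in real analyses F is latent and loadings are estimated), the loading should match the correlation between the factor and each variable. The sketch assumes a one-factor model x = λF + noise with noise variance 1 − λ², so each x has variance 1; the loadings are hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # reproducible simulation

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length lists."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    cov = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    su = math.sqrt(sum((p - mu) ** 2 for p in u))
    sv = math.sqrt(sum((q - mv) ** 2 for q in v))
    return cov / (su * sv)

N = 20_000
true_loadings = [0.9, 0.6, 0.3]  # hypothetical loadings for three variables

# Latent factor F ~ N(0, 1); noise variance 1 - lam**2 keeps Var(x) = 1.
factor = [random.gauss(0.0, 1.0) for _ in range(N)]
observed = []
for lam in true_loadings:
    x = [lam * f + random.gauss(0.0, math.sqrt(1 - lam ** 2)) for f in factor]
    observed.append(pearson(factor, x))

for lam, r in zip(true_loadings, observed):
    print(f"loading={lam:.1f}  corr(F, x)={r:.3f}  variance explained={r ** 2:.2f}")
```

The last column shows why "percentage of variance explained" is the squared loading, not the loading itself.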

When the population variance is unknown, a _______ test is typically used.

  • Chi-square
  • F
  • T
  • Z
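A t-test is typically used when the population variance is unknown and must be estimated from the sample. It is based on the t-distribution, a family of distributions indexed by degrees of freedom that resemble the normal distribution but have heavier tails; as the degrees of freedom increase, the t-distribution approaches the standard normal.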
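A minimal one-sample example: because σ is unknown, the sample standard deviation s replaces it in the denominator, and the statistic is referred to a t-distribution with n − 1 degrees of freedom. The measurements and the null mean 5.0 are hypothetical.

```python
import math
import statistics

# Hypothetical measurements; H0: population mean mu0 = 5.0, sigma unknown.
sample = [5.1, 4.9, 5.3, 5.0, 5.2]
mu0 = 5.0

n = len(sample)
xbar = statistics.mean(sample)
s = statistics.stdev(sample)  # sample SD stands in for the unknown sigma
t_stat = (xbar - mu0) / (s / math.sqrt(n))
df = n - 1                    # degrees of freedom of the reference t-distribution

print(f"t = {t_stat:.4f} with {df} degrees of freedom")
```

Here t ≈ 1.41 on 4 degrees of freedom, well below the two-sided 5% critical value of about 2.776, so this made-up sample gives no evidence against μ0 = 5.0.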

In a symmetric distribution, the skewness is ________.

  • -1
  • 0
  • 1
  • It varies
In a symmetric distribution, the skewness is zero. The distribution is neither left-skewed (negative skewness) nor right-skewed (positive skewness) but symmetric.
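The moment coefficient of skewness, g1 = m3 / m2^(3/2), makes this concrete: in symmetric data the cubed deviations on either side of the mean cancel, so m3 and therefore g1 are zero. The datasets below are made up for illustration, and g1 here is the uncorrected sample version.

```python
import statistics

def sample_skewness(data):
    """Fisher-Pearson moment coefficient g1 = m3 / m2**1.5 (no bias correction)."""
    mean = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]            # mirror image around the mean 3
right_skewed = [1, 2, 2, 3, 3, 3, 10]  # long right tail

print(sample_skewness(symmetric))     # 0.0
print(sample_skewness(right_skewed))  # a positive number
```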

The Kruskal-Wallis Test is a non-parametric method used when the assumptions of the ________ are not met.

  • ANOVA
  • Correlation analysis
  • Regression analysis
  • t-test
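The Kruskal-Wallis test is used when the assumptions of ANOVA (such as normally distributed residuals and homogeneity of variances) are not met. It is a non-parametric alternative to the one-way ANOVA that compares independent groups using the ranks of the pooled observations rather than their raw values.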
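The rank-based idea can be sketched by computing the H statistic by hand: rank all observations together (ties get their average rank), sum the ranks within each group, and compare those sums with what equal groups would produce. This sketch omits the tie-correction factor that full implementations apply, and the data are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (without the tie-correction factor)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * total - 3 * (n_total + 1)

# Hypothetical data: three groups with clearly separated values.
g1, g2, g3 = [1.1, 2.0, 3.5], [4.2, 5.0, 6.3], [7.7, 8.1, 9.4]
h = kruskal_wallis_h(g1, g2, g3)
print(round(h, 4))
```

With rank sums 6, 15, and 24 this gives H = 7.2. For reasonably large groups, H is compared with a chi-square distribution on k − 1 degrees of freedom, where k is the number of groups.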

What is a sampling distribution?

  • A distribution of all possible samples
  • A distribution of sample proportions
  • A distribution of sample variances
  • A distribution of the population
A sampling distribution is the distribution of a statistic (such as a mean, median, or proportion) computed from many samples drawn from the same population. More precisely, it describes the values the statistic takes across all possible samples of the same size that can be obtained from that population.
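A simulation approximates a sampling distribution by repeatedly sampling and recording one value of the statistic per sample. The sketch below uses the median to stress that any statistic has a sampling distribution; the population (the integers 1 to 100) and the sample size are hypothetical choices.

```python
import random
import statistics

random.seed(2)  # reproducible

population = list(range(1, 101))  # hypothetical population: the integers 1..100
SAMPLE_SIZE = 11
REPS = 5_000

# Each simulated sample contributes ONE value of the statistic (here the median);
# the collection of those values approximates the sampling distribution.
medians = [
    statistics.median(random.sample(population, SAMPLE_SIZE))
    for _ in range(REPS)
]

print("population median:", statistics.median(population))
print("mean of sample medians:", round(statistics.fmean(medians), 2))
print("SD of sample medians:", round(statistics.stdev(medians), 2))
```

The standard deviation of the simulated medians is an estimate of the median's standard error for samples of size 11.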

What are the two branches of statistics?

  • Descriptive and hypothetical
  • Descriptive and inferential
  • Inferential and hypothetical
  • Predictive and inferential
The two main branches of statistics are descriptive and inferential. Descriptive statistics involves methods of organizing, picturing, and summarizing information from data; it provides simple summaries of the sample, such as the mean, median, and mode. Inferential statistics, on the other hand, involves methods that use information from a sample to draw conclusions (inferences) about the population, through techniques such as hypothesis testing, confidence intervals, and regression analysis.
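The difference shows up in a few lines of code: descriptive statistics summarize the sample itself, while an inferential step uses it to say something about the population. The sample below is hypothetical, and the interval uses a large-sample normal approximation (a t critical value would give a slightly wider interval).

```python
import math
import statistics

# Hypothetical sample of 30 measurements.
sample = [12.1, 11.8, 12.5, 12.0, 11.6, 12.3, 12.7, 11.9, 12.2, 12.4,
          11.7, 12.0, 12.6, 12.1, 11.9, 12.3, 12.2, 11.8, 12.5, 12.0,
          12.4, 11.6, 12.1, 12.2, 12.3, 11.9, 12.0, 12.6, 12.1, 11.8]

# Descriptive: summarize the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
sd = statistics.stdev(sample)

# Inferential: approximate 95% confidence interval for the population mean.
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96
half_width = z * sd / math.sqrt(len(sample))
ci = (mean - half_width, mean + half_width)

print(f"descriptive: mean={mean:.2f}, median={median:.2f}, sd={sd:.3f}")
print(f"inferential: approx. 95% CI for population mean = ({ci[0]:.2f}, {ci[1]:.2f})")
```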

How does the Mann-Whitney U test compare to the Wilcoxon rank-sum test?

  • They are identical tests
  • They are used for different types of data
  • They handle ties differently
  • They make different assumptions about the data
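The Mann-Whitney U test and the Wilcoxon rank-sum test are essentially the same test, computed in slightly different ways. If W is the sum of the ranks of the first sample (of size n₁) within the pooled data, then U = W − n₁(n₁ + 1)/2, so the two statistics carry the same information and lead to the same conclusion. Both are non-parametric tests of whether two independent samples were drawn from populations with the same distribution.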
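The equivalence can be checked numerically: compute Wilcoxon's rank sum W for the first sample, shift it by n₁(n₁ + 1)/2, and compare it with U counted directly as the number of cross-sample pairs in which the first-sample value is larger (ties counting one half). The samples are hypothetical.

```python
def average_ranks(values):
    """1-based ranks of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_w(x, y):
    """Wilcoxon rank-sum statistic: sum of x's ranks in the pooled data."""
    ranks = average_ranks(list(x) + list(y))
    return sum(ranks[:len(x)])

def mann_whitney_u(x, y):
    """U for sample x: pairs (xi, yj) with xi > yj, ties counting 1/2."""
    return sum(1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y)

x = [1.2, 3.4, 5.6]
y = [2.3, 4.5, 6.7, 7.8]
n1 = len(x)
w = rank_sum_w(x, y)
u_from_w = w - n1 * (n1 + 1) / 2
u_direct = mann_whitney_u(x, y)
print(w, u_from_w, u_direct)  # 9.0 3.0 3.0
```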

In a histogram, what does the area under the curve represent?

  • The average value of observations
  • The median of the data
  • The total number of observations
  • The total range of the data
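In a histogram, the area under the bars represents the total number of observations in the dataset, provided each bar's height is the frequency density (the bin's count divided by the bin width). Each bar's area, height × width, then equals the count in its bin, so the total area of all bars equals the total number of observations. When every bin has width 1, the frequency density is simply the count; if heights are instead scaled as probability densities, the total area is 1.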
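The sketch below uses deliberately unequal bin widths to show why the height convention matters. With frequency-density heights the bar areas add up to the number of observations; with probability-density heights they add up to 1; with raw counts as heights they do not equal either. Data and bin edges are hypothetical.

```python
# Hypothetical data and deliberately unequal bin edges.
data = [1.0, 1.5, 2.2, 2.8, 3.1, 3.3, 3.9, 4.5]
edges = [0.0, 2.0, 3.0, 5.0]

def bin_counts(values, edges):
    """Count values per bin [edges[i], edges[i+1]); the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break
    return counts

counts = bin_counts(data, edges)
widths = [right - left for left, right in zip(edges, edges[1:])]
n = len(data)

# Frequency density: height = count / width, so each bar's area equals its count.
freq_density = [c / w for c, w in zip(counts, widths)]
total_area = sum(h * w for h, w in zip(freq_density, widths))

# Probability density: height = count / (n * width), so the total area is 1.
prob_density = [c / (n * w) for c, w in zip(counts, widths)]
prob_area = sum(h * w for h, w in zip(prob_density, widths))

# Raw counts as heights: area depends on the bin widths, not just on n.
raw_area = sum(c * w for c, w in zip(counts, widths))

print(counts, total_area, prob_area, raw_area)  # [2, 2, 4] 8.0 1.0 14.0
```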