What are the assumptions made when using factor analysis?

  • Homoscedasticity, autocorrelation, and stationarity
  • Independence, normality, and equal variance
  • Normality, linearity, and homoscedasticity
  • Normality, linearity, and multicollinearity
The assumptions of factor analysis include normality (the observed variables should be approximately normally distributed, which matters most when factors are extracted by maximum likelihood), linearity (the relationships between the factors and the variables should be linear), and homoscedasticity (the variances of the errors should be constant).
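One practical way to check the normality assumption before running a factor analysis is to screen each variable's sample skewness and excess kurtosis. The sketch below assumes the data are held in a plain dict mapping column names to values. The function names and the cut-offs (|skewness| > 2, |excess kurtosis| > 7) are illustrative assumptions, not fixed rules.

```python
def skewness_and_excess_kurtosis(values):
    """Return the (biased) sample skewness and excess kurtosis of a sequence."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3 and 4
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0  # 0 for a normal distribution
    return skewness, excess_kurtosis


def flag_non_normal(columns, skew_limit=2.0, kurtosis_limit=7.0):
    """Return the names of columns whose shape departs strongly from normality.

    The default limits are illustrative rule-of-thumb values (an assumption).
    """
    flagged = []
    for name, values in columns.items():
        skewness, excess_kurtosis = skewness_and_excess_kurtosis(values)
        if abs(skewness) > skew_limit or abs(excess_kurtosis) > kurtosis_limit:
            flagged.append(name)
    return flagged


data = {
    "symmetric": [1, 2, 3, 4, 5],
    "skewed": [0] * 20 + [100],
}
print(flag_non_normal(data))  # ['skewed']
```

A flagged variable might then be transformed or handled with an extraction method that does not rely on normality.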

What is the significance of a Gaussian or normal distribution?

  • It describes the spread of evenly distributed data
  • It is the distribution that maximizes entropy
  • It is used only for discrete random variables
  • It is used when events occur at a constant rate
The Gaussian, or normal, distribution has several important properties and is widely used in statistics and the natural sciences. It is significant because, among all continuous distributions on the real line with a given mean and variance, it is the one that maximizes entropy; it therefore builds in the fewest additional assumptions, making it the most "uninformative" choice and a natural default in many scenarios. In addition, by the central limit theorem, the suitably standardized sum (or mean) of many independent and identically distributed (i.i.d.) random variables with finite variance tends toward a normal distribution.
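The maximum-entropy property can be illustrated (not proved) by comparing the closed-form differential entropies, in nats, of three distributions that share the same variance σ². The function names are hypothetical; the formulas are the standard ones for the normal, Laplace, and uniform distributions.

```python
import math


def normal_entropy(sigma):
    """Differential entropy of N(mu, sigma^2); it does not depend on mu."""
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)


def laplace_entropy(sigma):
    """Entropy of a Laplace distribution with variance sigma^2."""
    b = sigma / math.sqrt(2)  # variance of Laplace with scale b is 2 * b^2
    return 1 + math.log(2 * b)


def uniform_entropy(sigma):
    """Entropy of a uniform distribution with variance sigma^2."""
    width = sigma * math.sqrt(12)  # variance of a uniform of width w is w^2 / 12
    return math.log(width)


for name, h in [("normal", normal_entropy(1.0)),
                ("laplace", laplace_entropy(1.0)),
                ("uniform", uniform_entropy(1.0))]:
    print(f"{name:8s} {h:.4f}")
```

With σ = 1 this prints about 1.4189, 1.3466 and 1.2425: the normal distribution has the largest entropy of the three, and the ordering holds for any σ because every entropy shifts by the same ln σ.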

How does Pearson's Correlation Coefficient differ from Spearman's Rank Correlation?

  • Pearson's correlation coefficient cannot be negative, Spearman's can
  • Pearson's correlation coefficient is non-parametric, Spearman's is parametric
  • Pearson's correlation coefficient is used for ranked data, Spearman's is not
  • Pearson's correlation coefficient measures linear relationships, Spearman's measures monotonic relationships
Pearson's correlation coefficient measures linear relationships, while Spearman's Rank Correlation measures monotonic relationships. In a monotonic relationship, as one variable increases the other consistently increases (or consistently decreases), but not necessarily at a constant rate. Spearman's coefficient is simply Pearson's coefficient computed on the ranks of the data. Pearson's correlation is typically used when the data are approximately normally distributed, whereas Spearman's Rank Correlation is preferred when normality cannot be assumed, when the data are ordinal, or when outliers are present.
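To see the difference, the sketch below computes both coefficients on data that are monotonic but not linear (y = x³). The function names are hypothetical; Spearman's rho is obtained by applying Pearson's formula to average ranks, so tied values share the mean of their positions.

```python
import math


def pearson(x, y):
    """Pearson's r: covariance divided by the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson's r applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotonic but not linear
print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

Spearman's rho is exactly 1 here because the ranks agree perfectly, while Pearson's r falls below 1 because the points do not lie on a straight line.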

What does it mean if the Chi-square statistic is larger than the critical value?

  • The alternative hypothesis is true
  • The null hypothesis is true
  • The test result is insignificant
  • There is not enough evidence to reject the null hypothesis
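If the Chi-square statistic exceeds the critical value at the chosen significance level α, we reject the null hypothesis in favor of the alternative hypothesis. In a test of independence, this indicates a statistically significant association between the variables. Strictly speaking, it does not prove that the alternative hypothesis is true; it shows that data this extreme would be unlikely if the null hypothesis held.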
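The decision rule can be sketched as follows. The critical values are the standard upper-tail chi-square values at α = 0.05 for 1 to 4 degrees of freedom, hard-coded for illustration in place of a statistics library (an assumption). For one degree of freedom the p-value also has a closed form, P(χ² > x) = erfc(√(x/2)), which Python's `math.erfc` can evaluate. The function names are hypothetical.

```python
import math

# Upper-tail chi-square critical values at alpha = 0.05, keyed by degrees of freedom
CRITICAL_VALUES_ALPHA_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_decision(statistic, df):
    """Compare the test statistic with the alpha = 0.05 critical value."""
    critical = CRITICAL_VALUES_ALPHA_05[df]
    return "reject H0" if statistic > critical else "fail to reject H0"


def p_value_df1(statistic):
    """Exact upper-tail p-value for one degree of freedom: P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(statistic / 2))


# 6.2 exceeds 3.841, so H0 is rejected at alpha = 0.05
print(chi_square_decision(6.2, df=1), round(p_value_df1(6.2), 4))
```

Comparing the statistic with the critical value and comparing the p-value with α always lead to the same decision.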

What are the limitations of using qualitative data in data analysis?

  • It cannot be easily quantified for statistical analysis
  • It may be influenced by researcher bias
  • It requires substantial resources and time for data collection
  • It's always better than quantitative data
Qualitative data has several limitations in data analysis. Firstly, it cannot be easily quantified for statistical analysis, which limits its utility in certain research settings. Secondly, collecting and analyzing qualitative data often requires substantial resources and time, which can be a challenge for large-scale studies. Lastly, qualitative data may be influenced by researcher bias, particularly during data collection and interpretation.

Which method is commonly used to find the best-fitting line in simple linear regression?

  • K-means clustering
  • Neural network
  • The method of least squares
  • The method of maximum likelihood
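The method of least squares is commonly used to find the best-fitting line in simple linear regression. It minimizes the sum of the squares of the residuals (the vertical distances between the observed and predicted values). When the errors are assumed to be independent and normally distributed with constant variance, the least-squares estimates coincide with the maximum-likelihood estimates.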
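The least-squares line has a closed-form solution: the slope is Σ(x - x̄)(y - ȳ) / Σ(x - x̄)², and the intercept is ȳ minus the slope times x̄. A minimal sketch, with hypothetical function names:

```python
def least_squares_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared vertical residuals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def sum_squared_residuals(x, y, slope, intercept):
    """Sum of squared differences between observed y and the line's predictions."""
    return sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = least_squares_line(x, y)
print(slope, intercept, sum_squared_residuals(x, y, slope, intercept))
```

Any other slope or intercept gives a larger `sum_squared_residuals` for the same data, which is exactly what "least squares" means.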

What is a Type II error in the context of hypothesis testing?

  • Accepting a false null hypothesis
  • Accepting a true null hypothesis
  • Rejecting a false null hypothesis
  • Rejecting a true null hypothesis
A Type II error occurs when the null hypothesis is false but is not rejected. It is also known as a "false negative" result. Its probability is usually denoted β, and 1 - β is the power of the test.
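For a concrete case, consider a one-sided z-test of H0: μ = μ0 against H1: μ > μ0 with known σ (an assumed setting chosen for illustration). If the true mean is μ1, the Type II error rate is β = Φ(z_(1-α) - (μ1 - μ0)·√n / σ), where Φ is the standard normal CDF. The sketch uses Python's `statistics.NormalDist`; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def type_ii_error_rate(mu0, mu1, sigma, n, alpha=0.05):
    """Beta for a one-sided (upper-tail) z-test of H0: mu = mu0 when the true mean is mu1 > mu0."""
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    z_critical = standard_normal.inv_cdf(1 - alpha)  # reject H0 when Z exceeds this
    shift = (mu1 - mu0) * math.sqrt(n) / sigma  # centre of the test statistic under H1
    # Type II error: the statistic stays below the critical value even though H1 is true
    return standard_normal.cdf(z_critical - shift)


for n in (10, 25, 50):
    beta = type_ii_error_rate(mu0=0.0, mu1=0.5, sigma=1.0, n=n)
    print(n, round(beta, 3), "power =", round(1 - beta, 3))
```

β shrinks, and power grows, as the sample size increases.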

The ________ in a Chi-square test for independence represents the sum of the squared differences between observed and expected frequencies, divided by the expected frequencies.

  • Chi-square statistic
  • correlation coefficient
  • p-value
  • standard deviation
The Chi-square statistic in a Chi-square test for independence is the sum, over all cells of the contingency table, of the squared difference between the observed and expected frequencies divided by the expected frequency: χ² = Σ (O - E)² / E. This statistic measures the degree to which the observed frequencies deviate from the frequencies that would be expected under the null hypothesis of independence.
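A direct translation of χ² = Σ (O - E)² / E, assuming the observed and expected counts are given as lists of rows with the same shape. The expected table here is taken as already computed; the function name is hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Sum over all cells of (observed - expected)^2 / expected for two tables of equal shape."""
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            total += (o - e) ** 2 / e
    return total


observed = [[10, 20],
            [30, 40]]
# Expected counts under independence for this table (computed beforehand)
expected = [[12, 18],
            [28, 42]]
print(round(chi_square_statistic(observed, expected), 4))  # 0.7937
```

Each cell contributes its own term, so a single cell with a large deviation relative to its expected count can dominate the statistic.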

How do you calculate the expected frequency in a Chi-square test?

  • By calculating the mode of the observed frequencies
  • By dividing the total frequency by the number of categories
  • By multiplying the row total and column total and dividing by the total number of observations
  • By taking the mean of the observed frequencies
In a Chi-square test, the expected frequency for each cell in the contingency table is calculated by multiplying the row total and column total and then dividing by the total number of observations.
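The row-total-times-column-total rule can be sketched for a contingency table stored as a list of rows; the function name is hypothetical.

```python
def expected_frequencies(observed):
    """Expected count for each cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


observed = [[10, 20],
            [30, 40]]
print(expected_frequencies(observed))  # [[12.0, 18.0], [28.0, 42.0]]
```

The expected table keeps the same row and column totals as the observed table; only the distribution of counts inside it changes.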

Pearson's Correlation Coefficient ranges from ________ to ________.

  • -1 to 1
  • -2 to 2
  • 0 to 1
  • 0 to 2
The Pearson Correlation Coefficient measures the strength and direction of the linear relationship between two variables and ranges from -1 to 1. A value of -1 indicates a perfect negative linear relationship, a value of 1 indicates a perfect positive linear relationship, and a value of 0 indicates no linear relationship.
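The coefficient is the covariance divided by the product of the standard deviations, and the Cauchy-Schwarz inequality guarantees that it cannot leave the interval [-1, 1]. A minimal sketch with a hypothetical function name:

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient; always lies in [-1, 1]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2 * v + 1 for v in x]))   # perfect positive linear relationship
print(pearson_r(x, [-3 * v + 7 for v in x]))  # perfect negative linear relationship
print(pearson_r(x, [2, 1, 4, 3, 5]))          # 0.8
```

The first two calls land at 1 and -1 (up to floating-point rounding); any imperfect linear pattern falls strictly between them.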