If we want to reduce both Type I and Type II errors, we could increase the ______.

  • Confidence level
  • Power of the test
  • Sample size
  • Significance level
Increasing the sample size is the correct choice. A larger sample makes the test more sensitive, which allows both error rates to be kept low at the same time: power rises (reducing the Type II error rate) without having to loosen the significance level (the Type I error rate). With more data, results are generally more accurate and reliable. However, resources, time, and other constraints often limit the sample size in real-world studies.
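The effect of sample size on power can be checked by simulation. A minimal sketch, assuming a one-sample t-test of H0: mean = 0 against data drawn with a true mean of 0.5 (the effect size, trial count, and seed are illustrative choices):

```python
import numpy as np
from scipy import stats

def power_estimate(n, effect=0.5, alpha=0.05, trials=2000, seed=0):
    """Estimate the power of a one-sample t-test (H0: mean = 0) by simulation."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(trials):
        sample = rng.normal(loc=effect, scale=1.0, size=n)
        _, p = stats.ttest_1samp(sample, 0.0)
        if p < alpha:
            rejections += 1
    return rejections / trials

# Power (1 - Type II error rate) grows with sample size at a fixed alpha.
print(power_estimate(10))   # smaller n: lower power
print(power_estimate(50))   # larger n: higher power
```

Because alpha is held fixed, the Type I error rate stays at 5% while the larger sample drives the Type II error rate down.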

How does the Central Limit Theorem influence the shape of the distribution of sample means?

  • It states that all distributions will be skewed to the right.
  • It states that as the sample size increases, the distribution of sample means will more closely approximate a normal distribution, regardless of the shape of the population distribution.
  • The Central Limit Theorem does not influence the shape of the distribution.
  • The Central Limit Theorem turns all distributions into uniform distributions.
The Central Limit Theorem (CLT) states that the distribution of sample means will tend towards a normal distribution as the sample size increases, regardless of the shape of the population distribution. Therefore, the CLT has a profound impact on the shape of the distribution, tending to 'normalize' it as sample size increases.
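The 'normalizing' effect can be seen directly by drawing sample means from a strongly skewed population. A minimal sketch using an exponential population (the sample sizes and seed are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_mean_skewness(n, n_means=20000):
    """Draw n_means sample means of size-n samples from a right-skewed
    (exponential) population and return the skewness of those means."""
    means = rng.exponential(scale=1.0, size=(n_means, n)).mean(axis=1)
    centered = means - means.mean()
    return (centered**3).mean() / centered.std()**3

# The exponential population has skewness 2, but the distribution of
# sample means loses that skew as the sample size n grows.
print(sample_mean_skewness(2))    # still noticeably skewed
print(sample_mean_skewness(200))  # near 0: approximately normal
```

For means of n exponentials the theoretical skewness is 2/sqrt(n), so it shrinks toward the symmetric normal shape exactly as the CLT predicts.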

Each subsequent Principal Component must be ______ to all the previous Principal Components.

  • equal
  • orthogonal
  • parallel
  • proportional
Each subsequent Principal Component in PCA must be orthogonal (perpendicular) to all previous Principal Components. This ensures that the Principal Components are uncorrelated.
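Both properties, orthogonal directions and uncorrelated projections, can be verified numerically. A minimal sketch, assuming PCA is computed as the eigendecomposition of the sample covariance matrix of synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 3-D data: mix independent noise through a random matrix.
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))
X -= X.mean(axis=0)

# Eigenvectors of the (symmetric) covariance matrix are the principal
# component directions; symmetric matrices yield orthogonal eigenvectors.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
pc1, pc2 = eigvecs[:, -1], eigvecs[:, -2]   # top two components

print(np.dot(pc1, pc2))  # ~0: the component directions are orthogonal
# Projections (scores) onto the components are therefore uncorrelated:
scores = X @ eigvecs
print(np.corrcoef(scores[:, -1], scores[:, -2])[0, 1])  # ~0
```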

What are the assumptions made when using factor analysis?

  • Homoscedasticity, autocorrelation, and stationarity
  • Independence, normality, and equal variance
  • Normality, linearity, and homoscedasticity
  • Normality, linearity, and multicollinearity
The assumptions of factor analysis include normality (the variables used in the analysis should be normally distributed), linearity (the relationship between the factors and the variables should be linear), and homoscedasticity (the variances of the errors should be constant).

What is the significance of a Gaussian or normal distribution?

  • It describes the spread of evenly distributed data
  • It is the distribution that maximizes entropy
  • It is used only for discrete random variables
  • It is used when events occur at a constant rate
The Gaussian or normal distribution has several important properties and is widely used in statistics and natural sciences. It's significant because it is the distribution that maximizes entropy among all distributions with given mean and variance, making it the most "uninformative" and often serving as a good default choice in many scenarios. Also, according to the central limit theorem, the sum of many independent and identically distributed (i.i.d.) random variables tends toward a normal distribution.

How does Pearson's Correlation Coefficient differ from Spearman's Rank Correlation?

  • Pearson's correlation coefficient cannot be negative, Spearman's can
  • Pearson's correlation coefficient is non-parametric, Spearman's is parametric
  • Pearson's correlation coefficient is used for ranked data, Spearman's is not
  • Pearson's correlation coefficient measures linear relationships, Spearman's measures monotonic relationships
Pearson's correlation coefficient measures linear relationships, while Spearman's Rank Correlation measures monotonic relationships, meaning ones where the variables tend to change together, but not necessarily at a constant rate. Pearson's correlation is a parametric measure, typically used when the data are approximately normally distributed; Spearman's Rank Correlation is non-parametric and rank-based, so it does not assume a normal distribution.
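The distinction shows up clearly for a relationship that is perfectly monotonic but far from linear. A minimal sketch, assuming y = exp(x) as the illustrative monotonic relation and computing Spearman's coefficient by hand as the Pearson correlation of the ranks:

```python
import numpy as np

x = np.linspace(1, 10, 50)
y = np.exp(x)  # strictly increasing, but highly non-linear

# Pearson: correlation of the raw values (linear association).
pearson = np.corrcoef(x, y)[0, 1]

# Spearman: Pearson correlation of the ranks (monotonic association).
# (No ties here, so simple ordinal ranks suffice.)
def ranks(a):
    return np.argsort(np.argsort(a)).astype(float)

spearman = np.corrcoef(ranks(x), ranks(y))[0, 1]

print(pearson)   # noticeably below 1: the relation is not linear
print(spearman)  # 1.0: the relation is perfectly monotonic
```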

What does it mean if the Chi-square statistic is significantly larger than the critical value?

  • The alternative hypothesis is true
  • The null hypothesis is true
  • The test result is insignificant
  • There is not enough evidence to reject the null hypothesis
If the Chi-square statistic is significantly larger than the critical value, we reject the null hypothesis in favor of the alternative hypothesis. This suggests that there is a significant association between the variables.
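The decision rule can be sketched end to end on a hypothetical 2x2 contingency table (the counts below are made up for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical contingency table: treatment (rows) vs. outcome (columns).
observed = np.array([[30, 10],
                     [15, 45]])

chi2_stat, p_value, dof, expected = stats.chi2_contingency(observed)
critical = stats.chi2.ppf(0.95, dof)  # critical value at alpha = 0.05

print(chi2_stat > critical)  # True -> reject the null of independence
print(p_value < 0.05)        # equivalent decision via the p-value
```

Comparing the statistic to the critical value and comparing the p-value to alpha are two views of the same test, so they always agree.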

What are the limitations of using qualitative data in data analysis?

  • It cannot be easily quantified for statistical analysis
  • It may be influenced by researcher bias
  • It requires substantial resources and time for data collection
  • It's always better than quantitative data
Qualitative data has several limitations in data analysis. Firstly, it cannot be easily quantified for statistical analysis which limits its utility in certain research settings. Secondly, collecting and analyzing qualitative data often requires substantial resources and time, which can be a challenge for large-scale studies. Lastly, qualitative data may be influenced by researcher bias, particularly during data collection and interpretation.

What is the assumption of normality in residual analysis?

  • The coefficients of the regression line are normally distributed
  • The dependent variable is normally distributed
  • The independent variables are normally distributed
  • The residuals are normally distributed
The assumption of normality in residual analysis states that the residuals, the differences between the observed values and the values fitted by the regression model, are normally distributed. This assumption is needed to make valid inferences about the regression coefficients (t-tests and confidence intervals) and to calculate prediction intervals.
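Checking the assumption in practice means fitting the model and examining the residuals themselves. A minimal sketch on synthetic data with Gaussian noise, using a Shapiro-Wilk test as one common diagnostic (the data-generating line and noise scale are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=200)  # Gaussian noise

# Ordinary least squares fit, then examine the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Shapiro-Wilk test of normality on the residuals: a large p-value is
# consistent with normally distributed residuals.
_, p = stats.shapiro(residuals)
print(round(p, 3))
```

A Q-Q plot of the residuals is the usual graphical companion to this test.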

How does the Wilcoxon Signed Rank Test deal with zeros in the difference of paired observations?

  • Zeros are averaged
  • Zeros are counted as half a sign
  • Zeros are discarded
  • Zeros are included
In the Wilcoxon Signed Rank Test, zeros in the difference of paired observations are typically discarded before ranking, with the effective sample size reduced accordingly. This is the classic Wilcoxon treatment; alternative conventions, such as Pratt's method, instead keep the zeros during ranking.
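SciPy exposes this choice through the `zero_method` parameter of `scipy.stats.wilcoxon`. A minimal sketch on hypothetical paired measurements, two of which have zero difference (the numbers are made up for illustration):

```python
import numpy as np
from scipy import stats

before = np.array([12.0, 15.0, 9.0, 11.0, 14.0, 10.0, 13.0])
after  = np.array([13.0, 15.0, 12.0, 15.0, 16.0, 10.0, 18.0])
# Two pairs have zero difference (15 - 15 and 10 - 10).

# zero_method="wilcox" (the default) discards the zero differences
# before ranking, so only the five nonzero pairs contribute.
stat, p = stats.wilcoxon(before, after, zero_method="wilcox")
print(stat, p)
```

Passing `zero_method="pratt"` instead keeps the zeros during ranking, matching Pratt's variant of the test.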