What kind of data is best suited for the Wilcoxon Signed Rank Test?
- Both Continuous and Ordinal data
- Continuous data
- Nominal data
- Ordinal data
The Wilcoxon Signed Rank Test is suited to both continuous and ordinal data. Because it is a non-parametric test based on the ranks of the paired differences rather than their raw values, it does not assume the data are normally distributed.
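A minimal sketch of running the test, assuming SciPy is available; the paired before/after measurements are invented for illustration:

```python
from scipy.stats import wilcoxon

# Hypothetical paired measurements (e.g. before and after a treatment).
before = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
after = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]

# wilcoxon ranks the paired differences; zero differences are dropped
# by the default zero_method ("wilcox").
stat, p = wilcoxon(before, after)
print(stat, p)
```

Note that only the ranks of the differences enter the statistic, which is why ordinal data is acceptable.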
What is the relationship between a cumulative distribution function and a probability density function?
- The cumulative distribution function is the integral of the probability density function
- The probability density function is the integral of the cumulative distribution function
- There is no relationship between them
- They are the same thing
The cumulative distribution function (CDF) and the probability density function (PDF) are closely related. For a continuous random variable, the CDF is the integral of the PDF. This means that the PDF is the derivative of the CDF.
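The relationship can be checked numerically, assuming SciPy is available: integrating the standard normal PDF from minus infinity up to a point should reproduce the CDF at that point.

```python
from scipy.stats import norm
from scipy.integrate import quad

# Integrate the standard normal PDF from -inf to x.
x = 1.0
integral, _ = quad(norm.pdf, -float("inf"), x)

# The result should match the CDF evaluated at x.
print(integral, norm.cdf(x))
```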
How do you decide on the number of Principal Components to retain during PCA?
- All of the above
- By calculating the cumulative explained variance
- By checking the eigenvalues
- By using the elbow method
The number of principal components to retain can be decided in several ways: checking the eigenvalues (typically, components with eigenvalues greater than 1 are retained), using the elbow method (looking for a clear "elbow" in the scree plot), or calculating the cumulative explained variance (often, enough components to explain at least 95% of the variance are retained).
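The cumulative-explained-variance rule can be sketched as follows, assuming scikit-learn is available; the toy data (three informative directions embedded in ten dimensions) is invented for illustration:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Toy data: 3 informative directions embedded in 10 dimensions, plus noise.
base = rng.normal(size=(200, 3))
X = base @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(200, 10))

pca = PCA().fit(X)
cumvar = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components explaining at least 95% of the variance.
k = int(np.searchsorted(cumvar, 0.95) + 1)
print(k, cumvar[k - 1])
```

The same `explained_variance_ratio_` values can be plotted against the component index to read off the elbow, and `pca.explained_variance_` gives the eigenvalues for the eigenvalue-greater-than-1 rule.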
How does the choice of significance level affect the probability of making a Type I error?
- Higher significance level leads to higher probability of Type I error
- Lower significance level leads to higher probability of Type I error
- Significance level has no effect on the probability of Type I error
- The choice of significance level affects the probability of Type II error, not Type I
The significance level (alpha) is the probability of making a Type I error. So, a higher significance level increases the chance of rejecting the null hypothesis when it's true, hence increasing the probability of a Type I error.
What can be a potential drawback of using a high degree polynomial in regression analysis?
- It can lead to overfitting
- It can lead to underfitting
- It doesn't capture relationships between variables
- It simplifies the model too much
Using a high degree polynomial in regression analysis can lead to overfitting. Overfitting occurs when a model captures not only the underlying pattern but also the noise in the data, making it perform well on the training data but poorly on new, unseen data.
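A small sketch of the effect using NumPy's polynomial fitting; the data (a linear trend plus noise) is invented for illustration. A high-degree fit chases the noise in the training points and does worse on held-out points:

```python
import numpy as np

rng = np.random.default_rng(1)
# Noisy samples from a simple linear trend y = 2x.
x_train = np.linspace(0, 1, 12)
y_train = 2 * x_train + rng.normal(scale=0.3, size=x_train.size)
x_test = np.linspace(0.01, 0.99, 50)
y_test = 2 * x_test  # noise-free truth for evaluation

def fit_mse(degree):
    # Fit a polynomial of the given degree and score it on held-out points.
    coeffs = np.polyfit(x_train, y_train, degree)
    pred = np.polyval(coeffs, x_test)
    return float(np.mean((pred - y_test) ** 2))

print(fit_mse(1), fit_mse(11))
```

With 12 training points, a degree-11 polynomial interpolates the noise exactly, so its test error is typically much larger than the degree-1 fit's.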
Is the Kruskal-Wallis Test used for comparing two groups or more than two groups?
- Both
- More than two groups
- Neither
- Two groups
The Kruskal-Wallis Test is used for comparing more than two independent groups. It is the non-parametric alternative to one-way ANOVA, based on the ranks of the pooled observations rather than their raw values.
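A minimal sketch with three groups, assuming SciPy is available; the scores are invented for illustration:

```python
from scipy.stats import kruskal

# Hypothetical scores from three independent groups.
g1 = [7, 14, 14, 13, 12, 9, 6]
g2 = [15, 17, 13, 15, 15, 13, 9]
g3 = [6, 8, 8, 9, 5, 14, 13]

# kruskal accepts any number of groups (two or more) as arguments.
stat, p = kruskal(g1, g2, g3)
print(stat, p)
```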
When should you use the Spearman’s Rank Correlation test?
- When data is normally distributed
- When data is ordinal or not normally distributed
- When data is perfectly ranked
- When the correlation is linear
The Spearman’s Rank Correlation test should be used when data is ordinal or not normally distributed. It is a non-parametric test that does not require the assumption of normal distribution.
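A small illustration, assuming SciPy is available: the relationship below is non-linear but perfectly monotonic, so Spearman's rho is exactly 1 even though the points do not lie on a line.

```python
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6]
y = [1, 4, 9, 16, 25, 36]  # y = x**2: monotonic but not linear

# Spearman correlates the RANKS of x and y, so any strictly
# increasing relationship gives rho = 1.
rho, p = spearmanr(x, y)
print(rho, p)
```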
How does the sample size impact the result of a Z-test?
- Larger sample sizes can produce more precise estimates, reducing the standard error
- Larger sample sizes increase the likelihood of a Type I error
- Sample size has no impact on the results of a Z-test
Larger sample sizes generally allow for more precise estimates of population parameters. This reduces the standard error, making the z-score larger and potentially leading to stronger evidence against the null hypothesis in a Z-test.
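The arithmetic behind this can be sketched directly; the population standard deviation and observed effect below are invented for illustration. The standard error is sigma divided by the square root of n, so quadrupling the sample size halves the standard error and doubles the z-score for the same observed difference:

```python
import math

sigma = 10.0   # assumed known population standard deviation
effect = 2.0   # observed difference from the hypothesized mean

for n in (25, 100, 400):
    se = sigma / math.sqrt(n)  # standard error shrinks as n grows
    z = effect / se            # so the z-score grows with n
    print(n, se, z)
```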
What is the alternative hypothesis in the context of statistical testing?
- A condition of no effect or no difference
- A specific outcome of the experiment
- An effect or difference exists
- The sample size is large enough for the test
The alternative hypothesis is the statement, contrary to the null hypothesis, that an effect or difference exists. It is usually taken to mean that the observations reflect a real effect rather than chance.
How do bias and variability affect sampling methods?
- Bias and variability always increase the accuracy of estimates
- Bias and variability are unrelated concepts in statistics
- Bias increases the spread of a data distribution, and variability leads to consistent errors
- Bias leads to consistent errors in one direction, and variability refers to the spread of a data distribution
Bias and variability are two key concepts in sampling methods. Bias refers to consistent, systematic errors that lead to an overestimate or underestimate of the true population parameter. Variability refers to the spread or dispersion of a data distribution, or in this context, the sampling distribution. Lower bias and lower variability are generally desirable to increase the accuracy and precision of estimates.
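The distinction can be shown by simulation; the population and the constant offset below are invented for illustration. Adding a constant to an estimator shifts its center (bias) without changing its spread (variability):

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean = 50.0
# 1000 repeated samples of size 30 from the same population.
samples = rng.normal(loc=true_mean, scale=5.0, size=(1000, 30))

unbiased = samples.mean(axis=1)      # sample mean: centered on the truth
biased = samples.mean(axis=1) + 3.0  # constant offset: consistent error in one direction

print(unbiased.mean(), biased.mean())  # bias shows up as a shifted center
print(unbiased.std(), biased.std())    # the spread (variability) is identical
```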