The sum of all probabilities in a discrete probability distribution is always ________.
- 0
- 1
- Negative
- Variable
For a discrete random variable, the probabilities of all possible outcomes must sum to 1. This is because the distribution enumerates every possible outcome of the random variable, and exactly one of those outcomes must occur.
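As a quick check of this property, the following minimal Python sketch sums the probabilities of a binomial distribution. The choice of distribution and the parameters `n = 10` and `p = 0.3` are illustrative assumptions, not part of the question.

```python
from math import comb, isclose

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Illustrative parameters (assumed for this example).
n, p = 10, 0.3

# Enumerate every possible outcome: 0, 1, ..., n successes.
probabilities = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(probabilities)

# Compare with a tolerance, since floating-point sums are not exact.
print(isclose(total, 1.0))
```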
What does the 'power of a test' signify in hypothesis testing?
- The probability of correctly rejecting a false null hypothesis
- The probability of incorrectly accepting a true null hypothesis
- The probability of making a Type I error
- The probability of making a Type II error
The power of a statistical test is the probability that it correctly rejects a false null hypothesis. In other words, it is 1 minus the probability of making a Type II error.
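The relationship power = 1 - P(Type II error) can be sketched for one simple case: a one-sided z-test of a mean with known standard deviation. The function name `z_test_power` and the example numbers are assumptions made for illustration. Other tests need their own power calculations.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test when the true mean exceeds the null mean by `effect`."""
    std_normal = NormalDist()
    # Critical value: reject the null when the z statistic exceeds this.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Under the alternative, the z statistic is centred at this shift.
    shift = effect * sqrt(n) / sigma
    # Type II error: failing to reject although the null is false.
    beta = std_normal.cdf(z_crit - shift)
    return 1 - beta

# Illustrative values: a larger sample gives higher power for the same effect.
print(z_test_power(effect=0.5, sigma=1.0, n=25))
print(z_test_power(effect=0.5, sigma=1.0, n=100))
```

When the effect is zero, the function returns the significance level itself, which is the expected behaviour: with no true effect, the rejection rate equals the Type I error rate.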
What is the shape of a normal distribution?
- Skewed to the left
- Skewed to the right
- Symmetrical bell curve
- Uniform flat shape
The normal distribution, also known as the Gaussian distribution, is a continuous probability distribution with a bell-shaped curve. It is symmetrical around its mean, and its density peaks at the mean, so values near the mean occur more frequently than values far from it.
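Both properties, symmetry and a peak at the mean, can be checked numerically. This is a small sketch using Python's standard library. The mean of 100 and standard deviation of 15 are arbitrary values assumed for the example.

```python
from math import isclose
from statistics import NormalDist

# Arbitrary illustrative parameters.
dist = NormalDist(mu=100.0, sigma=15.0)
offsets = [5.0, 15.0, 30.0]

for d in offsets:
    left = dist.pdf(dist.mean - d)
    right = dist.pdf(dist.mean + d)
    # Symmetry: equal density at equal distances on either side of the mean.
    # Bell shape: density away from the mean is lower than at the mean.
    print(d, isclose(left, right), right < dist.pdf(dist.mean))
```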
Why might you use a non-parametric test over a parametric one?
- The data does not meet the assumptions for a parametric test
- The data follows a normal distribution
- The data has no outliers
- The data set is very large
Non-parametric tests might be used instead of parametric ones when the data do not meet the assumptions of a parametric test: for example, when the data do not follow a normal distribution, when the variances are not equal across groups, or when the data are ordinal or nominal rather than interval or ratio.
Why might a non-parametric test be used instead of a t-test?
- All of the above
- When the data is not normally distributed
- When the population standard deviation is known
- When the sample size is very large
Non-parametric tests are used when the data doesn't meet the assumptions of parametric tests like the t-test, such as when the data is not normally distributed.
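As one example of a non-parametric alternative to the two-sample t-test, here is a hand-rolled sketch of the Mann-Whitney U test. It uses a normal approximation for the p-value, without tie or continuity corrections, so it is only a rough guide for small samples. The function name and the sample data are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mann_whitney_u(xs, ys):
    """Return (U for xs, two-sided p-value from a normal approximation)."""
    # U counts the pairs in which a value from xs beats a value from ys;
    # tied pairs count as half.
    u = 0.0
    for x in xs:
        for y in ys:
            if x > y:
                u += 1.0
            elif x == y:
                u += 0.5
    n1, n2 = len(xs), len(ys)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, p_value

# Made-up samples; group_a contains one extreme value (15.2).
group_a = [1.1, 2.3, 2.9, 3.8, 4.0, 15.2]
group_b = [5.1, 6.4, 7.7, 8.2, 9.9, 10.5]

u_a, p = mann_whitney_u(group_a, group_b)
print(u_a, p)
```

Because the test depends only on the ordering of the values, replacing 15.2 with any larger number leaves the result unchanged, which is why rank-based tests are less sensitive to outliers and skew than the t-test.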
If the p-value in a Chi-square test is less than the significance level, we ________ the null hypothesis.
- accept
- ignore
- question
- reject
If the p-value in a Chi-square test is less than the chosen significance level, we reject the null hypothesis. In a Chi-square test of independence, this means that we have enough evidence to conclude that the two variables are not independent.
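The decision rule can be sketched for a 2x2 contingency table in a test of independence. This example assumes one degree of freedom and applies no continuity correction. The function name, the counts, and the 0.05 significance level are illustrative assumptions.

```python
from math import erfc, sqrt

def chi_square_2x2(table):
    """Chi-square statistic and p-value for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value

# Made-up counts for two binary variables.
observed = [[30, 10],
            [10, 30]]
alpha = 0.05

chi2, p_value = chi_square_2x2(observed)
decision = "reject" if p_value < alpha else "fail to reject"
print(chi2, p_value, decision)
```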
In the context of multiple linear regression, __________ refers to the phenomenon where the coefficient estimates become highly sensitive to changes in the model.
- Autocorrelation
- Heteroscedasticity
- Multicollinearity
- Overfitting
Multicollinearity refers to the situation in multiple linear regression where the predictor variables are highly correlated with one another. This can lead to unstable coefficient estimates that change erratically in response to small changes in the model or the data.
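One common diagnostic for this instability is the variance inflation factor (VIF), which measures how much the variance of a coefficient estimate is inflated by correlation among predictors. The sketch below covers only the special case of exactly two predictors, where the VIF reduces to 1 / (1 - r^2) with r the Pearson correlation between them. The helper names and the data are assumptions for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Made-up predictors: x2 is almost exactly 2 * x1, x3 is uncorrelated with x1.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]
x3 = [1, -1, 0, 0, -1, 1]

print(vif_two_predictors(x1, x2))  # very large: strong multicollinearity
print(vif_two_predictors(x1, x3))  # 1.0: no inflation
```

A VIF close to 1 indicates little inflation. Rules of thumb often flag values above 5 or 10, though these cutoffs are conventions rather than strict thresholds.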
How can multicollinearity be addressed in multiple regression analysis?
- By adding more variables to the model.
- By increasing the sample size.
- By removing one or more of the correlated variables.
- Multicollinearity cannot be addressed.
Multicollinearity can be addressed by removing one or more of the highly correlated independent variables.
Bayes' theorem is a fundamental principle underlying ________ learning.
- active
- machine
- passive
- rote
Bayesian methods, which are grounded in Bayes' theorem, play an integral role in many areas of machine learning. They allow the model to update its predictions as it receives more data, making them particularly useful for tasks involving prediction and recommendation.
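The idea of updating as more data arrives can be sketched with a toy example: two competing hypotheses about a coin, updated one flip at a time with Bayes' theorem. The hypotheses, the prior, and the flip sequence are all invented for illustration.

```python
def bayes_update(priors, likelihoods):
    """Apply Bayes' theorem: posterior is proportional to prior times likelihood."""
    unnormalized = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(unnormalized.values())  # total probability of the observation
    return {h: value / evidence for h, value in unnormalized.items()}

# Two hypothetical coins and their probability of landing heads.
p_heads = {"fair": 0.5, "biased": 0.8}

# Start with no preference between the hypotheses.
beliefs = {"fair": 0.5, "biased": 0.5}

# Update after each observed flip; the posterior becomes the next prior.
for flip in "HHTH":
    likelihoods = {h: (p if flip == "H" else 1 - p) for h, p in p_heads.items()}
    beliefs = bayes_update(beliefs, likelihoods)
    print(flip, beliefs)
```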
What is the purpose of an F-test in multiple linear regression?
- To check for multicollinearity
- To check the linearity of the model
- To check the normality of residuals
- To check the overall significance of the model
The F-test in multiple linear regression is used to test the overall significance of the model, essentially testing whether at least one of the predictors' coefficients is non-zero and hence contributes to explaining the variability in the response variable.
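The overall F statistic can be written in terms of the model's R-squared. The sketch below assumes an ordinary least squares model that includes an intercept. The function name and the example values are illustrative. It computes only the statistic and its degrees of freedom; obtaining a p-value requires the F distribution, for example from a statistics library or a table.

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """F statistic for H0: all slope coefficients are zero (model has an intercept)."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1
    # Explained variation per predictor, relative to unexplained variation
    # per residual degree of freedom.
    f_stat = (r_squared / df_model) / ((1 - r_squared) / df_resid)
    return f_stat, df_model, df_resid

# Made-up example: R-squared of 0.5 from 23 observations and 2 predictors.
f_stat, df_model, df_resid = overall_f_statistic(0.5, n_obs=23, n_predictors=2)
print(f_stat, df_model, df_resid)
```

A large F statistic relative to the critical value for (df_model, df_resid) degrees of freedom leads to rejecting the null hypothesis that none of the predictors contributes.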