The _______ test compares the means of two independent groups.
- Chi-square
- Independent t
- Paired t
- Z
An independent t-test (also called a two-sample t-test) compares the means of two independent groups.
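As a rough illustration of the comparison described above, the following sketch computes the pooled-variance t statistic by hand. It assumes both groups have equal variances (the classic Student form); the function name `independent_t` and the sample values are hypothetical and chosen only for the example.

```python
import math
import statistics


def independent_t(sample_a, sample_b):
    """Return the pooled-variance t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # statistics.variance uses the n - 1 (sample) denominator.
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pool the two variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# Hypothetical measurements from two unrelated groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.7, 4.3]
t_stat, df = independent_t(group_a, group_b)
print(f"t = {t_stat:.3f}, df = {df}")
```

The t statistic would then be compared against a t distribution with `df` degrees of freedom to obtain a p-value.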
How does a higher R-squared value impact the inference in multiple linear regression?
- It decreases the number of observations
- It improves the interpretability of the model
- It increases the residuals
- It makes the model more complex
The R-squared value measures the proportion of the variance in the dependent variable that is explained by the independent variables. A higher R-squared value, closer to 1, means the predictors account for a larger share of the variability in the response, which makes the fitted relationships easier to interpret as a description of the data. Because R-squared never decreases when predictors are added, a high value on its own does not show that the model predicts new data well.
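The definition above can be sketched directly as one minus the ratio of residual to total sum of squares. The helper name `r_squared` and the numbers below are illustrative assumptions, not part of the original question.

```python
import statistics


def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the fitted values y_hat."""
    mean_y = statistics.mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total
    return 1 - ss_res / ss_tot


# Hypothetical observed responses and two sets of fitted values.
y = [2.0, 4.1, 5.9, 8.2, 9.8]
good_fit = [2.0, 4.0, 6.0, 8.0, 10.0]   # tracks the data closely
poor_fit = [statistics.mean(y)] * len(y)  # predicts the mean every time

print(r_squared(y, good_fit))  # close to 1
print(r_squared(y, poor_fit))  # 0: no variance explained
```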
In multiple linear regression, the __________ test is used to test if a group of variables contributes to the prediction of the response.
- Chi-square test
- F-test
- T-test
- Z-test
The F-test is used in multiple regression to test whether at least one of the predictors' regression coefficients is not equal to zero. In other words, it tests whether the predictors, taken as a group, are jointly significant in explaining the response variable.
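For the overall test described above, the F statistic can be written in terms of R-squared. The sketch below assumes a model with an intercept, `n_obs` observations, and `n_predictors` slope coefficients; the function name `overall_f` and the input values are hypothetical.

```python
def overall_f(r2, n_obs, n_predictors):
    """Overall F statistic for H0: all slope coefficients are zero."""
    df_model = n_predictors
    df_resid = n_obs - n_predictors - 1  # one extra df for the intercept
    # Explained variance per model df over unexplained variance per residual df.
    f_stat = (r2 / df_model) / ((1 - r2) / df_resid)
    return f_stat, df_model, df_resid


f_stat, df_model, df_resid = overall_f(r2=0.5, n_obs=23, n_predictors=2)
print(f"F({df_model}, {df_resid}) = {f_stat:.2f}")
```

A large F relative to the F distribution with `df_model` and `df_resid` degrees of freedom is evidence that at least one coefficient differs from zero.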
How does the sample size relate to the power of a test?
- It depends on the effect size
- Larger sample sizes decrease power
- Larger sample sizes increase power
- Sample size has no influence on power
Larger sample sizes increase the power of a test because they shrink the standard error of the estimate, which reduces the influence of random error and makes it easier to detect an effect if one exists. This is why researchers often aim to recruit as large a sample as possible, within the constraints of their resources.
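The relationship can be sketched for one simple case: a two-sided one-sample z-test with a known standard deviation. That setting, the function name `z_test_power`, and the effect size used below are assumptions made for illustration only.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The true effect measured in standard-error units; grows with sqrt(n).
    shift = effect * n ** 0.5 / sigma
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


for n in (10, 40, 160):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With the effect size and significance level held fixed, the printed power rises as `n` grows.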
If two events A and B are mutually exclusive, the probability of both occurring is _______.
- 0
- 0.5
- 1
- The probability is undefined
If two events A and B are mutually exclusive, the probability of both occurring is 0, that is, P(A and B) = 0, because mutually exclusive events cannot occur at the same time.
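A small sketch with a fair six-sided die makes this concrete. The events chosen here and the helper `prob` are hypothetical examples, not part of the original question.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
event_a = {1, 2}                   # roll is 1 or 2
event_b = {5, 6}                   # roll is 5 or 6


def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))


p_both = prob(event_a & event_b)    # no shared outcomes, so 0
p_either = prob(event_a | event_b)  # equals prob(event_a) + prob(event_b)
print(p_both, p_either)
```

Because the intersection is empty, the probability of either event occurring is just the sum of the two individual probabilities.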
How does the Law of Large Numbers impact the calculation of probabilities?
- It changes the probability of an event based on previous outcomes.
- It doesn't affect the calculation of probabilities.
- It guarantees that the experimental probability gets closer to the theoretical probability as the number of trials increases.
- It states that all probabilities must be equal.
The Law of Large Numbers states that as the number of independent trials (or observations) increases, the experimental probability of an event converges to its theoretical (or true) probability. This is what justifies estimating probabilities from observed frequencies in practical applications.
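A quick simulation can illustrate the idea. This sketch assumes a fair coin (theoretical probability 0.5) and a fixed seed for repeatability; the function name `heads_proportion` is hypothetical.

```python
import random


def heads_proportion(n_flips, seed=0):
    """Simulate n_flips fair coin flips and return the share of heads."""
    rng = random.Random(seed)
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips


for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, heads_proportion(n))
```

The observed proportion need not move closer to 0.5 at every step, but for large numbers of flips it tends to stay near the theoretical value.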
The Sign Test is based on the direction of the _________ between pairs.
- differences
- medians
- ranks
- signs
The Sign Test is based on the direction of the differences between pairs.
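The following sketch shows one common way to carry this out: count the positive and negative differences, drop ties, and compute a two-sided p-value from the Binomial(n, 0.5) distribution. The function name `sign_test` and the paired values are hypothetical.

```python
from math import comb


def sign_test(before, after):
    """Two-sided sign test on paired data; tied pairs are dropped."""
    diffs = [a - b for b, a in zip(before, after)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)


# Hypothetical paired measurements.
before = [10, 12, 9, 11, 13, 10, 12, 11]
after = [12, 14, 10, 13, 15, 9, 14, 12]
n_pos, n_neg, p_value = sign_test(before, after)
print(n_pos, n_neg, p_value)  # 7 1 0.0703125
```

Only the direction of each difference enters the calculation; the sizes of the differences are ignored.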
When would you prefer to use the median instead of the mean as a measure of central tendency?
- When the data has outliers
- When the data is in large quantity
- When the data is normally distributed
- When the data is uniformly distributed
The median is preferred over the mean when the data is skewed or has outliers. Outliers can pull the mean strongly toward them and give a distorted view of the data, whereas the median is far less sensitive to extreme values. The median is the middle value of a data set that has been arranged in order of magnitude, which makes it a better measure of central tendency for skewed distributions.
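A short sketch with made-up numbers shows the effect of a single outlier on each measure. The salary figures below are hypothetical.

```python
import statistics

salaries = [42, 45, 47, 50, 52]   # hypothetical values, in thousands
with_outlier = salaries + [400]   # add one extreme value

print(statistics.mean(salaries), statistics.median(salaries))          # 47.2 47
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # 106 48.5
```

The single extreme value more than doubles the mean, while the median shifts only slightly.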
Why is the assumption of independently and identically distributed (IID) residuals important in linear regression?
- It ensures that the model is not overfitting
- It ensures that the model is not underfitting
- It ensures that the parameter estimates are unbiased
- It ensures the correctness of standard errors and hypothesis tests
The assumption of IID residuals is important because standard errors, confidence intervals, and hypothesis tests are derived under it. If the assumption is violated, for example when the residuals are correlated or have non-constant variance, these statistics may be incorrect and lead to misleading conclusions.
What is the purpose of the 'whiskers' in a box plot?
- To represent the outliers
- To represent the range of the data
- To show the interquartile range
- To show the mean and median
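The 'whiskers' in a box plot represent the range of the data outside the box. In the common convention, the upper whisker extends to the largest data value that lies within 1.5 times the interquartile range (IQR) above the third quartile, and the lower whisker extends to the smallest data value within 1.5 times the IQR below the first quartile. When no values fall beyond those limits, the whiskers reach the maximum and minimum of the data. Any data points beyond the whiskers can be considered outliers.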
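The 1.5 times IQR rule can be sketched as follows. Quartile definitions differ between tools; this example assumes the "inclusive" method of `statistics.quantiles`, and the function name `box_plot_whiskers` and the data are hypothetical.

```python
import statistics


def box_plot_whiskers(data, k=1.5):
    """Return (lower whisker, upper whisker, outliers) using the k * IQR rule."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    # Whiskers stop at the most extreme data values inside the fences.
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return min(inside), max(inside), outliers


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
print(box_plot_whiskers(data))  # (1, 9, [30])
```

Here the quartiles are 3.25 and 7.75, so the fences sit at -3.5 and 14.5; the value 30 lies beyond the upper fence and is reported as an outlier rather than stretching the whisker.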