The range of a dataset is sensitive to _______.
- Mean
- Median
- Mode
- Outliers
The range of a dataset is sensitive to outliers. Because the range is calculated as the difference between the maximum and minimum values, an outlier (an extremely high or low value) can greatly increase the range.
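A minimal sketch of this effect, using made-up numbers and a hypothetical helper named `value_range`, might look like this:

```python
def value_range(values):
    """Return the difference between the largest and smallest value."""
    return max(values) - min(values)

# Made-up data for illustration only.
typical_scores = [12, 15, 14, 13, 16]
with_outlier = typical_scores + [95]  # same data plus one extreme value

print(value_range(typical_scores))  # 4
print(value_range(with_outlier))    # 83
```

A single extreme value moves the range from 4 to 83, even though the other five observations are unchanged.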
In a ________ distribution, the events occur with a known constant mean rate and independently of the time since the last event.
- Binomial
- Normal
- Poisson
- Uniform
The Poisson distribution models the number of events happening in a fixed interval of time or space, given a constant mean rate of occurrence and independence of the time since the last event.
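As a rough illustration, the Poisson probability of observing exactly k events can be computed directly from the mean rate. The function name `poisson_pmf` and the rate of 3 events per interval are assumptions chosen for the example:

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) for a Poisson variable whose mean is `rate`."""
    return math.exp(-rate) * rate**k / math.factorial(k)

rate = 3.0  # hypothetical mean number of events per interval

# Probabilities of seeing 0, 1, ..., 5 events in one interval.
for k in range(6):
    print(k, round(poisson_pmf(k, rate), 4))

# Summing k * P(X = k) over enough values of k recovers the mean rate.
approx_mean = sum(k * poisson_pmf(k, rate) for k in range(50))
```

Here `approx_mean` comes out very close to the rate of 3.0, which is the defining property of the distribution's single parameter.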
How is the probability of the complement of an event A calculated?
- 1 - P(A)
- P(A) * P(A')
- P(A) + P(A')
- P(A) - P(A')
The probability of the complement of an event A, denoted as P(A') or P(not A), is calculated as 1 - P(A). This is because an event and its complement are mutually exclusive and exhaustive: either the event occurs or it does not, so P(A) + P(A') = 1.
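The complement rule is often the easiest route to an "at least one" probability. The sketch below uses an assumed example, rolling a fair die four times, and exact fractions to avoid rounding:

```python
from fractions import Fraction

# Assumed example: four independent rolls of a fair six-sided die.
p_no_six_one_roll = Fraction(5, 6)
p_no_six_in_four = p_no_six_one_roll ** 4

# "At least one six" is the complement of "no six in any roll".
p_at_least_one_six = 1 - p_no_six_in_four

print(p_at_least_one_six)  # 671/1296
```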
What implications does an insignificant F-test have in the context of multiple linear regression?
- The model does not explain a significant amount of the variance in the response
- The model explains a significant amount of the variance in the response
- The model has a high R-squared value
- The model has violated the assumption of homoscedasticity
The overall F-test in multiple linear regression tests the null hypothesis that all slope coefficients (every coefficient except the intercept) are equal to zero. A non-significant F-test means this null hypothesis cannot be rejected, which suggests that the predictors, taken together, do not explain a significant amount of the variance in the response variable.
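For a model with an intercept, the overall F statistic can be written in terms of R-squared, the number of observations, and the number of predictors. The sketch below assumes that setup; the function name and the input values are hypothetical:

```python
def overall_f_statistic(r_squared, n_obs, n_predictors):
    """Overall F statistic for a linear model that includes an intercept.

    Numerator: explained variance per predictor.
    Denominator: unexplained variance per residual degree of freedom.
    """
    explained = r_squared / n_predictors
    unexplained = (1 - r_squared) / (n_obs - n_predictors - 1)
    return explained / unexplained

# Hypothetical fit: 30 observations, 3 predictors, R-squared of 0.10.
f_stat = overall_f_statistic(r_squared=0.10, n_obs=30, n_predictors=3)
print(round(f_stat, 3))
```

An F statistic near 1, as in this example, means the predictors explain about as much variance per degree of freedom as the noise does. Turning the statistic into a p-value requires the F distribution with (3, 26) degrees of freedom, from a table or a statistics library.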
What happens when the assumptions about residuals in linear regression are violated?
- The interpretation of the model changes
- The model becomes invalid
- The model becomes underfit
- The standard errors, confidence intervals, and hypothesis tests may not be valid
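Violations of the assumptions about residuals in linear regression mainly undermine inference. With non-constant variance or correlated errors, the coefficient estimates typically remain unbiased but become inefficient, and the usual standard errors, confidence intervals, and hypothesis tests may not be valid. Some violations, such as errors that are correlated with a predictor, can also bias the estimates themselves. Either way, the result can be incorrect inferences and predictions.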
What is the main purpose of simple linear regression?
- To find the average of the data
- To identify outliers
- To understand the relationship between two variables
- To visualize the data
The main purpose of simple linear regression is to understand the relationship between two variables. It provides a quantitative estimate of the relationship between one dependent variable and one independent variable.
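A minimal least-squares sketch shows what that quantitative estimate looks like. The data (hours studied versus exam score) and the helper `fit_line` are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
exam_scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, exam_scores)
print(round(slope, 2), round(intercept, 2))
```

The slope is the estimated change in the dependent variable for a one-unit increase in the independent variable; in this invented data set, each extra hour is associated with roughly 4.1 more points.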
In what situations is the coefficient of variation a better measure of dispersion than the standard deviation?
- When data sets have different units
- When data sets have the same units
- When the data set is normally distributed
- When the mean of the data set is zero
The coefficient of variation (CV) is the standard deviation divided by the mean, which makes it a standardized, unitless measure of dispersion. It is particularly useful when comparing the dispersion of two or more datasets that have different units or significantly different means. Standard deviation, on the other hand, has the same units as the data, which may not be helpful for comparisons across different datasets. The CV is undefined when the mean is zero.
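The sketch below compares two small invented samples measured in different units. The helper name `coefficient_of_variation` is an assumption, and it uses the sample standard deviation:

```python
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unitless)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return statistics.stdev(values) / mean

# Made-up measurements in different units.
heights_cm = [160, 170, 180]
weights_kg = [55, 70, 85]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)
print(round(cv_height, 3), round(cv_weight, 3))
```

The standard deviations (10 cm and 15 kg) cannot be compared directly, but the CVs can: in this example the weights are relatively more spread out than the heights.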
Under what circumstances can the conditional probability of an event be equal to its marginal probability?
- When the event is certain
- When the event is dependent on all other events
- When the event is impossible
- When the event is independent of all other events
The conditional probability of an event A given an event B equals the marginal probability of A when A and B are independent. This is because the occurrence of B does not change the probability of A if they are independent.
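This can be checked by enumerating a small sample space. The sketch below assumes two fair dice and uses hypothetical helpers `prob` and `cond_prob`:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given)."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

def first_even(o):
    return o[0] % 2 == 0

def second_is_six(o):
    return o[1] == 6

def sum_at_least_10(o):
    return o[0] + o[1] >= 10

print(prob(first_even))                        # 1/2
print(cond_prob(first_even, second_is_six))    # 1/2 (independent)
print(cond_prob(first_even, sum_at_least_10))  # 2/3 (dependent)
```

Knowing the second die shows a six tells us nothing about the first die, so the conditional and marginal probabilities match. Knowing the sum is at least 10 does carry information about the first die, so they differ.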
What type of data is the Mann-Whitney U test used for?
- Interval data
- Nominal data
- Ordinal data
- Ratio data
The Mann-Whitney U test compares two independent groups and is used for ordinal data, whose values can be ranked but whose differences are unknown or not equivalent. Because it relies only on ranks, it can also be used with interval and ratio data that do not meet the assumptions of parametric tests such as the t-test.
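Because the test depends only on order, the U statistic can be obtained by counting, for every pair of observations from the two groups, which one is larger. The sketch below is a simplified illustration with invented ratings; it computes only the statistic, not a p-value, and the function name is hypothetical:

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b) by pairwise comparison; ties count as one half."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # The two U statistics always sum to the number of pairs.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

# Made-up ordinal ratings on a 1-5 scale for two independent groups.
group_a = [1, 2, 2, 3]
group_b = [3, 4, 4, 5]

u_a, u_b = mann_whitney_u(group_a, group_b)
print(u_a, u_b)  # 0.5 15.5
```

A U value close to zero for one group means its ratings almost never exceed the other group's. Which of the two values is reported as "the" U statistic varies by convention, and judging significance requires tables or a statistics library.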
What does the 'mode' refer to in a data set?
- The average value
- The middle value
- The most frequently occurring value
- The range of values
The mode in a data set refers to the most frequently occurring value. A dataset may have one mode (unimodal), two modes (bimodal), or more than two modes (multimodal).
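As a short illustration with invented data, Python's standard library can return every value that ties for the highest frequency:

```python
import statistics

# Made-up data sets for illustration.
one_mode = [2, 3, 3, 5, 3]
two_modes = [1, 2, 2, 3, 3, 4]

# multimode returns all most-frequent values, in order of first appearance.
print(statistics.multimode(one_mode))   # [3]
print(statistics.multimode(two_modes))  # [2, 3]
```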