In multiple linear regression, ________ is used to test the overall significance of the model.
- the Chi-square statistic
- the F-statistic
- the Z-statistic
- the t-statistic
In multiple linear regression, the F-statistic is used to test the overall significance of the model. This test checks the null hypothesis that all slope coefficients (every coefficient except the intercept) are zero against the alternative that at least one of them is not zero. If the F-statistic is large and the corresponding p-value is small (for example, below 0.05), we reject the null hypothesis and conclude that the model, taken as a whole, explains a statistically significant share of the variation in the outcome variable.
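A minimal sketch of how the overall F-statistic can be computed from a fitted model's R-squared, assuming n observations and k predictors plus an intercept. The function name `overall_f_statistic` is hypothetical. Turning F into a p-value additionally requires the F distribution with (k, n - k - 1) degrees of freedom, which a statistics library would supply.

```python
def overall_f_statistic(r_squared: float, n: int, k: int) -> float:
    """Overall F-statistic for H0: all k slope coefficients equal zero.

    r_squared: coefficient of determination of the fitted model
    n: number of observations
    k: number of predictors (the intercept is not counted)
    """
    df_model = k
    df_resid = n - k - 1
    if df_model < 1 or df_resid < 1:
        raise ValueError("need at least one predictor and n > k + 1")
    # F = (SSR / k) / (SSE / (n - k - 1)); dividing SSR and SSE by the
    # total sum of squares turns them into R^2 and 1 - R^2.
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)


# Hypothetical example: R^2 = 0.5 with 23 observations and 2 predictors.
print(overall_f_statistic(0.5, n=23, k=2))
```

Working from R-squared keeps the sketch independent of any particular fitting routine; the same value results from the regression and residual mean squares directly.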
The ________ is used to fit the regression line in a simple linear regression model.
- least squares method
- mean
- median
- mode
The least squares method is used to find the best-fitting line through the data points. This is done by minimizing the sum of the squares of the vertical distances of the points from the line.
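A minimal pure-Python sketch of least squares for one predictor, using the closed-form solution slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x). The helper name `least_squares_line` and the sample data are made up for illustration. Python 3.10+ also provides `statistics.linear_regression`, which returns the same slope and intercept.

```python
from statistics import fmean


def least_squares_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (intercept, slope) minimizing the sum of squared vertical distances."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Closed-form solution of the normal equations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_line(xs, ys)
# Sum of squared residuals: the quantity least squares minimizes
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(f"y = {b0:.3f} + {b1:.3f}x, SSE = {sse:.4f}")
```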
When is it appropriate to use a binomial distribution?
- When each trial in an experiment has exactly two possible outcomes
- When the data is continuous
- When the outcomes are not independent
- When the probability of success changes with each trial
A binomial distribution is appropriate for an experiment with a fixed number of trials, where each trial has exactly two possible outcomes (often termed success and failure), the trials are independent, and the probability of success is the same for every trial.
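Under those conditions, the probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). A minimal sketch using the standard library's `math.comb`; the function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p): n independent trials, success probability p."""
    if not 0 <= k <= n:
        return 0.0
    # comb(n, k) counts the ways to place k successes among n trials
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 3 heads in 10 fair coin flips
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
```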
What are the dependent and independent variables in simple linear regression?
- Both variables are dependent
- Both variables are independent
- The dependent variable is the outcome we are trying to predict, and the independent variable is the predictor
- The dependent variable is the predictor, and the independent variable is the outcome we are trying to predict
In simple linear regression, the dependent variable is the outcome we are trying to predict, and the independent variable is the predictor. The dependent variable is also known as the response or target variable, and the independent variable is also known as the explanatory or feature variable.
What is the difference between a discrete and a continuous probability distribution?
- Discrete distributions are always normal; continuous distributions are always uniform
- Discrete distributions are for qualitative data; continuous distributions are for quantitative data
- Discrete distributions involve countable outcomes; continuous distributions involve uncountable outcomes
- There is no difference
Discrete probability distributions are used when the random variable takes a countable set of values (finite or countably infinite). Examples include the number of heads when flipping coins or the number of defective items in a batch. Continuous probability distributions are used when the variable can take any value in an interval, so the possible outcomes are uncountably infinite; these typically involve measurements. Examples include the height of individuals or the time it takes to run a mile.
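The practical difference shows up in how probabilities are computed: a discrete distribution assigns probability to individual values, while a continuous one assigns probability to intervals. A minimal sketch, assuming a fair die for the discrete case and a hypothetical time that is uniformly distributed on [0, 10] minutes for the continuous case; `uniform_interval_prob` is an illustrative name.

```python
from fractions import Fraction

# Discrete: a fair six-sided die has a countable set of outcomes, each with
# positive probability given by a probability mass function (PMF).
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
assert sum(die_pmf.values()) == 1
print("P(X = 3) =", die_pmf[3])


# Continuous: a time T uniformly distributed on [low, high] is described by a
# probability density function (PDF); probabilities are areas over intervals.
def uniform_interval_prob(a: float, b: float, low: float, high: float) -> float:
    """P(a <= T <= b) for T ~ Uniform(low, high)."""
    a, b = max(a, low), min(b, high)  # clip the interval to the support
    return max(0.0, b - a) / (high - low)


print("P(4 <= T <= 6) =", uniform_interval_prob(4, 6, 0, 10))
print("P(T = 5) =", uniform_interval_prob(5, 5, 0, 10))  # a single point has probability 0
```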
The Mann-Whitney U test is primarily used for comparing ________ distributions.
- binomial
- dependent
- independent
- normal
The Mann-Whitney U test is used to compare two independent samples, particularly to determine whether they were drawn from populations with the same distribution. It is a nonparametric alternative to the independent two-sample t-test and does not assume normality.
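A minimal sketch of the U statistic itself, computed by counting pairs directly. This takes O(n_x * n_y) time, which is fine for small samples. The function name and data are hypothetical. For a p-value, SciPy's `scipy.stats.mannwhitneyu` performs the full test.

```python
def mann_whitney_u(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (x_i, y_j) with x_i > y_j, counting ties as 1/2.
    """
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x  # the two statistics always sum to n_x * n_y
    return u_x, u_y


group_a = [3.1, 4.0, 2.8, 5.2]
group_b = [5.9, 6.3, 4.8, 7.1, 6.0]
print(mann_whitney_u(group_a, group_b))  # (1.0, 19.0)
```

A U_x far below n_x * n_y / 2 (here 10) suggests that values from the first group tend to be smaller than those from the second.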
What is a random variable in probability theory?
- A factor that doesn't change
- A variable that can take on different values, each with an associated probability
- An unknown variable
- An unpredictable factor
A random variable in probability theory is a variable that can take on different values, each with an associated probability. Formally, it is a function that assigns a number to each outcome of a random experiment. It is not "random" in the everyday sense of the word; rather, its exact value is uncertain until it is observed.
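To make the "function on outcomes" view concrete, here is a minimal sketch. It assumes two flips of a fair coin and defines X as the number of heads, then derives X's distribution by enumerating the sample space. The names `sample_space` and `num_heads` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all equally likely outcomes of flipping a fair coin twice.
sample_space = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]


def num_heads(outcome: tuple[str, ...]) -> int:
    """The random variable X: maps each outcome to the number of heads."""
    return outcome.count("H")


# The distribution of X: each possible value paired with its probability.
counts = Counter(num_heads(o) for o in sample_space)
distribution = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}
print(distribution)
```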
A _______ t-test is used to compare two related samples or repeated measurements on a single sample.
- Independent
- One-sample
- Paired
- Two-sample
A paired t-test is used to compare two related samples or repeated measurements on a single sample. It tests whether the mean of the within-pair differences is zero and is often used in before-and-after scenarios where the same individuals are measured twice.
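A paired t-test is equivalent to a one-sample t-test on the per-subject differences: t = mean(d) / (s_d / sqrt(n)) with n - 1 degrees of freedom. A minimal sketch with hypothetical before/after measurements; the function name is illustrative. SciPy's `scipy.stats.ttest_rel` also returns the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(before: list[float], after: list[float]) -> tuple[float, int]:
    """Return (t, degrees of freedom) for a paired t-test of H0: mean difference = 0."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    # t = mean difference / standard error of the mean difference
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


before = [72.0, 80.5, 65.2, 90.1, 77.3]
after = [70.1, 78.0, 64.9, 86.5, 75.0]
print(paired_t_statistic(before, after))
```

Working with the differences removes between-subject variation, which is why pairing usually gives more power than treating the two sets of measurements as independent samples.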
How does the standard deviation affect the shape of a normal distribution?
- Changes the kurtosis
- Changes the skewness
- Changes the spread or dispersion
- Does not affect the shape
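The standard deviation, a measure of dispersion or spread, determines the width of a normal distribution. A larger standard deviation results in a wider, flatter curve, while a smaller standard deviation results in a narrower curve with a taller peak. It does not change the skewness or kurtosis: every normal distribution is symmetric with the same bell shape, only stretched or compressed horizontally, with the height rescaled so the total area stays 1.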
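A minimal sketch using the standard library's `statistics.NormalDist` to compare normal curves with the same mean and different standard deviations. It reports the peak height, 1 / (sigma * sqrt(2 * pi)), and the probability inside a fixed window [-1, 1]. The sigma values are arbitrary choices for illustration.

```python
from statistics import NormalDist

# Same mean, different standard deviations
for sigma in (0.5, 1.0, 2.0):
    dist = NormalDist(mu=0.0, sigma=sigma)
    peak = dist.pdf(0.0)  # height of the curve at the mean
    within_one = dist.cdf(1.0) - dist.cdf(-1.0)  # P(-1 <= X <= 1), a fixed window
    print(f"sigma={sigma}: peak height={peak:.4f}, P(|X| <= 1)={within_one:.4f}")
```

Doubling sigma halves the peak height and moves probability out of any fixed window around the mean, which is the "wider, flatter" behaviour described above.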
How can transformations help in reducing skewness in a dataset?
- They can make the distribution more symmetric
- They can shift the mean towards the skew
- They can shift the mode towards the skew
- Transformations cannot reduce skewness
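Transformations can reduce skewness by making the distribution more symmetric. Logarithmic and square root transformations compress large values more than small ones, so they reduce right (positive) skew; the log transform requires strictly positive values, and it is the stronger of the two. For left (negative) skew, one option is to reflect the data first, or to use a power transformation such as squaring. The choice of transformation depends on the degree and direction of skewness.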
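A minimal sketch of measuring skewness before and after a log transform. It uses the moment coefficient g1 = m3 / m2^1.5 without a small-sample correction. The data is a made-up, strictly positive, right-skewed sequence (powers of two), chosen so that the log makes it exactly evenly spaced and therefore symmetric.

```python
from math import log


def sample_skewness(data: list[float]) -> float:
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(data)
    center = sum(data) / n
    m2 = sum((x - center) ** 2 for x in data) / n
    m3 = sum((x - center) ** 3 for x in data) / n
    return m3 / m2**1.5


# Hypothetical right-skewed, strictly positive data: many small values, a few large ones.
values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
logged = [log(x) for x in values]  # the log transform requires x > 0

print(f"skewness before: {sample_skewness(values):.3f}")
print(f"skewness after log: {sample_skewness(logged):.3f}")
```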
How do you diagnose multicollinearity in a multiple linear regression model?
- By calculating the R-squared value
- By checking the correlation matrix and Variance Inflation Factor (VIF)
- By looking at the residual plot
- By looking at the scatter plot
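Multicollinearity in a multiple linear regression model is diagnosed by checking the correlation matrix of the independent variables and the Variance Inflation Factor (VIF) of each predictor. For predictor j, VIF_j = 1 / (1 - R_j²), where R_j² is the R-squared from regressing that predictor on all the other predictors, so the VIF also detects a predictor that is a near-linear combination of several others, which pairwise correlations can miss. High pairwise correlations, or a VIF above a common rule-of-thumb threshold of 5 (or, more leniently, 10), suggest multicollinearity.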
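A minimal pure-Python sketch that ties the two diagnostics together. It builds the predictors' correlation matrix and reads the VIFs off the diagonal of its inverse, which is algebraically equal to 1 / (1 - R_j²) for a regression with an intercept. The function names and data are hypothetical. In practice, a library routine such as statsmodels' `variance_inflation_factor` is more common.

```python
from math import sqrt


def correlation_matrix(columns: list[list[float]]) -> list[list[float]]:
    """Pearson correlation between every pair of predictor columns."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([v - m for v in col])
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(centered)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]


def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inverse with partial pivoting; raises ValueError if singular."""
    n = len(matrix)
    # Augment [A | I] and reduce the left half to the identity.
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(columns: list[list[float]]) -> list[float]:
    """VIF of each predictor: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical predictors: x3 is roughly x1 + x2, so all three should get large VIFs.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = [3.1, 2.9, 7.2, 6.8, 11.1, 10.9]
print(correlation_matrix([x1, x2, x3]))
print(vifs([x1, x2, x3]))
```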
In a normal distribution, about 95% of the data lies within _______ standard deviations of the mean.
- Four
- One
- Three
- Two
According to the empirical rule (also known as the 68-95-99.7 rule), in a normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations, and about 99.7% lies within three standard deviations.
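The rule's percentages can be checked against the standard normal CDF. A minimal sketch using the standard library's `statistics.NormalDist`, which also shows that the exact multiplier for the central 95% is slightly below 2.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

for k in (1, 2, 3):
    share = z.cdf(k) - z.cdf(-k)  # P(mu - k*sigma <= X <= mu + k*sigma)
    print(f"within {k} SD: {share:.4f}")

# Exact z multiplier for the central 95% of a normal distribution
print(f"z for 95%: {z.inv_cdf(0.975):.3f}")
```

This prints 0.6827, 0.9545 and 0.9973, matching the 68-95-99.7 rule, followed by 1.960 as the exact 95% multiplier.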