What are the dependent and independent variables in simple linear regression?

  • Both variables are dependent
  • Both variables are independent
  • The dependent variable is the outcome we are trying to predict, and the independent variable is the predictor
  • The dependent variable is the predictor, and the independent variable is the outcome we are trying to predict
In simple linear regression, the dependent variable is the outcome we are trying to predict, and the independent variable is the predictor. The dependent variable is also known as the response or target variable, and the independent variable is also known as the explanatory or feature variable.

When is it appropriate to use a binomial distribution?

  • When each trial in an experiment has exactly two possible outcomes
  • When the data is continuous
  • When the outcomes are not independent
  • When the probability of success changes with each trial
A binomial distribution is appropriate when conducting an experiment where each trial has exactly two possible outcomes (often termed success and failure), the trials are independent, and the probability of success is constant across trials.
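As a minimal sketch of these conditions, the binomial probability of exactly k successes in n independent trials with constant success probability p can be computed directly (the data and parameters below are illustrative, not from the question):

```python
from math import comb

def binom_pmf(k, n, p):
    # P(X = k): choose which k of the n trials succeed, each trial
    # independent with constant success probability p
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 3 heads in 5 fair coin flips
prob = binom_pmf(3, 5, 0.5)
print(round(prob, 4))  # 0.3125
```

If the trials were not independent, or p changed between trials, this product form would no longer hold and the binomial model would not apply.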

The ________ is used to fit the regression line in a simple linear regression model.

  • least squares method
  • mean
  • median
  • mode
The least squares method is used to find the best-fitting line through the data points. This is done by minimizing the sum of the squares of the vertical distances of the points from the line.
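The closed-form least-squares solution for a simple linear regression can be written in a few lines; the data points here are made up for illustration:

```python
def least_squares(xs, ys):
    # Slope and intercept that minimize the sum of squared
    # vertical distances from the points to the line
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]
slope, intercept = least_squares(xs, ys)
print(slope, intercept)  # 1.95 0.15
```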

In multiple linear regression, ________ is used to test the overall significance of the model.

  • the Chi-square statistic
  • the F-statistic
  • the Z-statistic
  • the t-statistic
In multiple linear regression, the F-statistic is used to test the overall significance of the model. This test checks the null hypothesis that all regression coefficients are zero against the alternative that at least one of them is not zero. If the F-statistic is significantly large and the corresponding p-value is small, we reject the null hypothesis, concluding that the regression model has some validity in predicting the outcome variable.
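A rough sketch of how the F-statistic is formed, shown for the one-predictor case to keep the fit simple (the same decomposition of sums of squares extends to k predictors; the data is invented):

```python
def f_statistic(xs, ys):
    # Fit y = a + b*x by least squares, then form F = MSR / MSE
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    fitted = [a + b * x for x in xs]
    ssr = sum((f - my) ** 2 for f in fitted)              # regression sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # residual sum of squares
    k = 1                                                 # number of predictors
    return (ssr / k) / (sse / (n - k - 1))

f_val = f_statistic([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.8])
```

Here the data lies almost exactly on a line, so F is very large and the null hypothesis (all slope coefficients zero) would be rejected.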

What is the primary purpose of conducting an ANOVA test?

  • To calculate the standard deviation of a dataset
  • To determine the mode of a set of data
  • To find the correlation between two variables
  • To test the equality of means among groups
The primary purpose of an ANOVA test is to compare the means of different groups and determine whether any of those means are significantly different from each other.
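The comparison of group means can be sketched as a one-way ANOVA F-ratio computed by hand (the three groups below are invented for illustration):

```python
def one_way_anova_f(*groups):
    # F = between-group mean square / within-group mean square
    all_vals = [v for g in groups for v in g]
    grand = sum(all_vals) / len(all_vals)
    k, n = len(groups), len(all_vals)
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Third group's mean is far from the other two, so F is large
f = one_way_anova_f([1, 2, 3], [2, 3, 4], [9, 10, 11])
print(f)  # 57.0
```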

How is the confidence interval for a proportion calculated?

  • p ± (z*√(p(1-p)/n))
  • p ± z*(s/√n)
  • p ± z*(σ/√n)
The confidence interval for a proportion is calculated using the formula: p ± (z*√(p(1-p)/n)), where p is the sample proportion, z is the z-score associated with the desired confidence level, and n is the sample size.
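This formula (the Wald interval) is straightforward to compute; the sample counts below are hypothetical:

```python
from math import sqrt

def proportion_ci(successes, n, z=1.96):
    # Wald interval: p ± z * sqrt(p*(1-p)/n); z = 1.96 for ~95% confidence
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    return p - margin, p + margin

lo, hi = proportion_ci(60, 100)  # sample proportion 0.6
print(round(lo, 3), round(hi, 3))
```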

If the null hypothesis is true in ANOVA, the F-statistic follows a ________ distribution.

  • Binomial
  • Chi-Square
  • F
  • Normal
In ANOVA, if the null hypothesis is true, the F-statistic follows an F-distribution. The F-distribution is a right-skewed probability distribution that arises as the ratio of two scaled variance estimates, and it is used most commonly in analysis of variance.

What does a Pearson Correlation Coefficient of 0 indicate?

  • No correlation
  • Perfect negative correlation
  • Perfect positive correlation
  • Weak positive correlation
A Pearson correlation coefficient of 0 indicates no correlation, meaning there is no linear relationship between the variables. Note that a zero coefficient does not by itself imply the variables are independent: they can still be related in a nonlinear way.
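One caveat worth illustrating: a zero coefficient rules out a linear relationship, but not dependence altogether. A perfectly deterministic, symmetric nonlinear relationship can still give a Pearson coefficient of exactly 0 (toy data below):

```python
def pearson(xs, ys):
    # Sample Pearson correlation: covariance / (sd_x * sd_y)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # perfectly dependent, but not linearly
print(pearson(xs, ys))  # 0.0
```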

In a normal distribution, about 95% of the data lies within _______ standard deviations of the mean.

  • Four
  • One
  • Three
  • Two
According to the empirical rule (also known as the 68-95-99.7 rule), in a normal distribution, about 68% of the data lies within one standard deviation of the mean, about 95% lies within two standard deviations, and about 99.7% lies within three standard deviations.
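The 95%-within-two-standard-deviations claim is easy to check by simulation (the exact two-sided normal probability is about 0.9545):

```python
import random

random.seed(0)  # fixed seed so the result is reproducible
data = [random.gauss(0, 1) for _ in range(100_000)]
within_two = sum(1 for v in data if abs(v) <= 2) / len(data)
print(within_two)  # close to 0.9545
```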

How do you diagnose multicollinearity in a multiple linear regression model?

  • By calculating the R-squared value
  • By checking the correlation matrix and Variance Inflation Factor (VIF)
  • By looking at the residual plot
  • By looking at the scatter plot
Multicollinearity is diagnosed in a multiple linear regression model by checking the correlation matrix and the Variance Inflation Factor (VIF). A high correlation between independent variables and a VIF greater than 5 or 10 suggests the presence of multicollinearity.
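For the two-predictor case, the VIF of one predictor reduces to 1 / (1 - R²), where R² comes from regressing that predictor on the other. A minimal sketch with invented predictor values:

```python
def r_squared(xs, ys):
    # R² of the simple regression of ys on xs
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

def vif(xs, ys):
    # VIF for predictor xs given one other predictor ys
    return 1 / (1 - r_squared(xs, ys))

x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1]  # nearly collinear with x1 -> very large VIF
print(vif(x1, x2))
```

With more than two predictors, each R² instead comes from regressing one predictor on all the others, which is where a library routine becomes more practical.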