How does effect size impact hypothesis testing?

  • Effect size has no impact on hypothesis testing
  • Larger effect sizes always lead to rejection of the null hypothesis
  • Larger effect sizes always lead to smaller p-values
  • Larger effect sizes increase the statistical power of the test
Effect size measures the magnitude of the difference or the strength of the relationship in the population. For a fixed sample size and significance level, a larger effect size means a larger difference or stronger relationship, which is easier to detect and therefore increases the statistical power of the test. Power is the probability that the test correctly rejects the null hypothesis when the alternative is true.
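
As a rough illustration, here is a minimal sketch (assuming statsmodels is installed) showing that, with sample size and alpha held fixed, a larger effect size yields higher power for a two-sample t-test:

```python
# Sketch: power of a two-sample t-test rises with effect size
# (sample size and significance level held fixed).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for d in (0.2, 0.5, 0.8):  # small, medium, large Cohen's d
    power = analysis.power(effect_size=d, nobs1=50, alpha=0.05)
    print(f"effect size d={d}: power={power:.3f}")
```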

How does a binomial distribution differ from a normal distribution?

  • Binomial distribution is continuous, while normal is discrete
  • Both are continuous distributions
  • Both are discrete distributions
  • Normal distribution is continuous, while binomial is discrete
A binomial distribution is discrete: it takes only integer values from 0 to n, and it represents the number of successes in a fixed number of independent Bernoulli trials with a given success probability. A normal distribution is continuous, and it is often used as an approximation to the binomial distribution when the number of trials is large and the success probability is not too close to 0 or 1.
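
A short sketch (assuming scipy is available) comparing an exact binomial probability with its normal approximation:

```python
# Exact binomial CDF vs. the normal approximation with a continuity correction.
from scipy.stats import binom, norm

n, p = 100, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5

exact = binom.cdf(55, n, p)                   # P(X <= 55), exact
approx = norm.cdf(55.5, loc=mu, scale=sigma)  # 55.5 applies the continuity correction
print(f"exact={exact:.4f}  normal approx={approx:.4f}")  # both ~0.864
```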

What is the underlying assumption of linearity in a multiple linear regression model?

  • All independent variables must have a linear relationship with the dependent variable
  • All residuals must be equal
  • All variables must be continuous
  • All variables must be normally distributed
The linearity assumption in a multiple linear regression model states that the relationship between each independent variable and the dependent variable is linear. This implies that the change in the dependent variable due to a one-unit change in an independent variable is constant, regardless of the value of that independent variable.
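
A common check is to examine residuals against fitted values and look for curvature; the NumPy-only sketch below builds the pieces of that check on simulated, truly linear data:

```python
# Sketch of a linearity check: with a truly linear relationship, residuals
# show no systematic pattern against the fitted values.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)  # linear by construction

X1 = np.column_stack([np.ones(len(X)), X])    # add an intercept column
beta = np.linalg.lstsq(X1, y, rcond=None)[0]  # ordinary least squares
fitted = X1 @ beta
residuals = y - fitted

# In OLS with an intercept, residuals are uncorrelated with fitted values by
# construction; a visible trend or curve in a residual plot suggests nonlinearity.
print("corr(residuals, fitted):", np.corrcoef(residuals, fitted)[0, 1])
```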

The term '________' refers to the sharpness of the peak of a frequency-distribution curve.

  • Kurtosis
  • Median
  • Mode
  • Skewness
Kurtosis refers to the sharpness of the peak of a frequency-distribution curve. It measures the heaviness of the tails of the distribution relative to a normal distribution. Distributions with large kurtosis have heavier tails than the normal distribution, meaning extreme values occur more often.
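
A quick sketch (assuming scipy is available) contrasting a normal sample with a heavy-tailed one:

```python
# scipy's kurtosis() reports excess kurtosis by default (normal -> 0);
# a Student's t sample with df=5 has heavier tails and positive excess kurtosis.
import numpy as np
from scipy.stats import kurtosis, norm, t

rng = np.random.default_rng(1)
normal_sample = norm.rvs(size=100_000, random_state=rng)
heavy_tailed = t.rvs(df=5, size=100_000, random_state=rng)

print("normal sample:", round(kurtosis(normal_sample), 2))  # close to 0
print("t(df=5) sample:", round(kurtosis(heavy_tailed), 2))  # positive (theory: 6)
```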

What is the difference between a parameter and a statistic in the field of statistics?

  • A parameter and a statistic are the same thing
  • A parameter is based on a sample; a statistic is based on the population
  • A statistic is a numerical measure; a parameter is a graphical representation
  • A statistic is based on a sample; a parameter is based on the population
In the field of statistics, a parameter is a numerical characteristic of a population, whereas a statistic is a numerical characteristic of a sample. For example, the population mean is a parameter, while the sample mean is a statistic. Parameters are often unknown because we cannot examine the entire population, so we use statistics, computed from sample data, to estimate them.
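
A minimal numeric illustration (NumPy only, with a simulated population):

```python
# The population mean is a parameter; the mean of a random sample is a
# statistic that estimates it.
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=10.0, scale=2.0, size=1_000_000)

mu = population.mean()                             # parameter (usually unknown)
sample = rng.choice(population, size=100, replace=False)
x_bar = sample.mean()                              # statistic from the sample

print(f"parameter mu = {mu:.3f}, statistic x_bar = {x_bar:.3f}")
```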

How does adding more predictors to a multiple linear regression model affect its inferences?

  • It always improves the model
  • It always makes the model worse
  • It can lead to overfitting
  • It has no effect on the model
Adding more predictors to a model never decreases (and usually increases) the R-squared value, making it appear that the model is improving. However, if these additional predictors are not truly associated with the response variable, the model may overfit: it captures noise in the training data and performs poorly on new, unseen data.
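
The sketch below (assuming scikit-learn is available) makes this concrete: appending pure-noise predictors raises in-sample R-squared but tends to hurt R-squared on held-out data:

```python
# Adding irrelevant predictors inflates training R^2 while degrading test R^2.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200
X_real = rng.normal(size=(n, 2))
y = X_real @ np.array([2.0, -1.0]) + rng.normal(size=n)
X_noisy = np.hstack([X_real, rng.normal(size=(n, 50))])  # 50 pure-noise columns

for name, X in (("2 real predictors", X_real), ("plus 50 noise predictors", X_noisy)):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    model = LinearRegression().fit(X_tr, y_tr)
    print(f"{name}: train R^2={model.score(X_tr, y_tr):.3f}, "
          f"test R^2={model.score(X_te, y_te):.3f}")
```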

How does ridge regression help in dealing with multicollinearity?

  • By eliminating the correlated variables
  • By increasing the sample size
  • By introducing a penalty term to shrink the coefficients
  • By transforming the variables
Ridge regression adds an L2 regularization term (a penalty proportional to the sum of the squared coefficients) to the least-squares loss function. This penalty shrinks the coefficients toward zero and stabilizes them, which mitigates the inflated, erratic estimates that multicollinearity causes.
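
A sketch (assuming scikit-learn is available) of the effect on two nearly collinear predictors:

```python
# With near-collinear predictors, OLS coefficients are unstable; the ridge
# penalty shrinks and stabilizes them.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)      # x2 nearly duplicates x1
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.5, size=100)    # only x1 truly drives y

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)              # alpha sets the penalty strength

print("OLS coefficients:  ", ols.coef_)         # typically large and erratic
print("Ridge coefficients:", ridge.coef_)       # shrunk, split across the pair
```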

Which mathematical concept is at the core of PCA?

  • Differentiation
  • Eigenvalues and Eigenvectors
  • Integration
  • Matrix Multiplication
PCA relies heavily on eigenvalues and eigenvectors, computed from the covariance matrix of the data. The eigenvectors give the axes along which the data has the most variance, and these axes form the new variables (principal components); the corresponding eigenvalues give the variance captured along each axis.
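
A minimal NumPy sketch of the core computation:

```python
# The principal axes are the eigenvectors of the sample covariance matrix,
# ordered by eigenvalue (the variance captured along each axis).
import numpy as np

rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[3, 1], [1, 2]], size=500)

Xc = X - X.mean(axis=0)                   # center the data
cov = np.cov(Xc, rowvar=False)            # 2x2 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigendecomposition (symmetric matrix)

order = np.argsort(eigvals)[::-1]         # descending variance
print("variance along each component:", eigvals[order])
print("principal axes (as columns):\n", eigvecs[:, order])
```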

___________ refers to the condition where the variance of the errors or residuals is constant across all levels of the explanatory variables.

  • Autocorrelation
  • Heteroscedasticity
  • Homoscedasticity
  • Multicollinearity
Homoscedasticity is the condition in which the variance of the errors or residuals is constant across all levels of the explanatory variables. It is one of the key assumptions of linear regression.
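
One formal check is the Breusch-Pagan test, whose null hypothesis is homoscedastic residuals; here is a hedged sketch (assuming statsmodels is available) on deliberately heteroscedastic data:

```python
# Breusch-Pagan test: a small p-value is evidence against constant error variance.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=300)
y = 2 * x + rng.normal(scale=1 + 0.5 * x)    # error variance grows with x

X = sm.add_constant(x)
results = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(results.resid, X)
print(f"Breusch-Pagan p-value: {lm_pvalue:.4g}")  # small here, by construction
```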

In a multiple linear regression equation, the ________ represents the expected change in the dependent variable for a one-unit change in the corresponding independent variable, holding all other independent variables constant.

  • F-statistic
  • R-squared value
  • regression coefficient
  • residual
In a multiple linear regression equation, the regression coefficient represents the expected change in the dependent variable for a one-unit change in the corresponding independent variable, while holding all other independent variables constant. It gives the direction and strength of the relationship between the dependent variable and each independent variable.
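
A small sketch (NumPy only) in which the fitted coefficients recover a known data-generating process:

```python
# With y = 5 + 2*x1 - 3*x2 + noise, each fitted coefficient estimates the
# expected change in y per one-unit change in that predictor, with the
# other predictor held constant.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
y = 5 + 2.0 * x1 - 3.0 * x2 + rng.normal(scale=0.1, size=500)

X = np.column_stack([np.ones(500), x1, x2])   # intercept plus two predictors
intercept, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"b1 ~ 2.0: {b1:.3f}   b2 ~ -3.0: {b2:.3f}")
```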