What is the relationship between the mean and the standard deviation in a normal distribution?
- The mean is always larger than the standard deviation
- The mean is the midpoint of the distribution, and the standard deviation measures the spread
- The standard deviation is always larger than the mean
- There is no relationship between the mean and the standard deviation
In a normal distribution, the mean is the center of the distribution and represents the "average" value. The standard deviation measures the dispersion around the mean. Roughly 68% of the data falls within one standard deviation of the mean in a normal distribution.
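The 68% figure can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is made up for this illustration.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

# The result depends only on k, not on the particular mean or standard deviation.
for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
```

For k = 1 this gives about 0.6827, whatever values of the mean and standard deviation are used.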
The conditional probability of A given B is denoted as ________.
- P(A + B)
- P(A / B)
- P(A | B)
- P(A ∩ B)
The conditional probability of A given B is denoted as P(A | B), read as "the probability of A given B". When P(B) > 0, it equals P(A ∩ B) / P(B).
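As a small worked example, conditional probability can be computed by counting outcomes. This sketch assumes a fair six-sided die; the events `A` (even roll) and `B` (roll greater than 3) and the helper names are chosen only for illustration.

```python
from fractions import Fraction

outcomes = range(1, 7)                    # fair six-sided die
A = {n for n in outcomes if n % 2 == 0}   # even roll: {2, 4, 6}
B = {n for n in outcomes if n > 3}        # roll greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all six outcomes are equally likely."""
    return Fraction(len(event), 6)

def conditional(a, b):
    """Probability of a given b, computed as P(a and b) / P(b)."""
    return prob(a & b) / prob(b)

print(prob(A))            # 1/2
print(conditional(A, B))  # 2/3
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, because two of the three outcomes in B are even.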
What is the primary goal of random sampling?
- To always select the same individuals
- To ensure that every member of the population has an equal chance of being selected
- To select individuals who are likely to give the desired results
- To select the individuals who are easiest to reach
The primary goal of random sampling is to ensure that every member of the population has an equal chance of being selected. This helps to reduce bias and increase the likelihood that the sample is representative of the population, which makes the results more valid and generalizable.
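A simple random sample can be sketched with the standard-library `random` module. The population of ID numbers and the function name `simple_random_sample` are hypothetical, used only to illustrate the idea.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # seeding makes the draw reproducible
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical ID numbers 1..100
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```

Selection here is without replacement, so no individual can appear twice in the sample.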
_______ regression is a method used to handle multicollinearity by adding a degree of bias to the regression estimates.
- Logistic
- Polynomial
- Ridge
- Simple linear
Ridge regression handles multicollinearity by adding a penalty on the size of the coefficients (an L2 penalty). This introduces a small amount of bias into the regression estimates but reduces their variance, which makes them more stable.
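The shrinkage effect can be illustrated with two nearly collinear predictors. This is a minimal sketch, not a general implementation: it assumes exactly two predictors, centers the data instead of fitting an intercept, and solves the 2x2 penalized normal equations by hand. The data values and function names are made up for illustration.

```python
def center(values):
    """Subtract the mean so that no intercept term is needed."""
    m = sum(values) / len(values)
    return [v - m for v in values]

def ridge_two_predictors(x1, x2, y, lam):
    """Solve (X'X + lam * I) b = X'y for two centered predictors."""
    x1, x2, y = center(x1), center(x2), center(y)
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det

# x2 is almost a copy of x1, so the two predictors are highly collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

ols = ridge_two_predictors(x1, x2, y, lam=0.0)    # no penalty: ordinary least squares
ridge = ridge_two_predictors(x1, x2, y, lam=1.0)  # penalized estimates
print(ols, ridge)
```

Setting `lam=0.0` recovers ordinary least squares. Increasing `lam` shrinks the overall size of the coefficient vector.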
What assumptions does Spearman’s rank correlation test make?
- The data is continuous and the relationship is monotonic
- The data is normally distributed and linear
- The data is ordinal and the relationship is linear
- The data is ordinal or continuous and the relationship is monotonic
Spearman’s rank correlation test assumes that the variables are ordinal or continuous and that the relationship between them is monotonic. It does not require the relationship to be linear or the data to be normally distributed.
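The difference between monotonic and linear can be seen by computing Spearman's coefficient as the Pearson correlation of the ranks. The sketch below uses only plain Python; the helper names and the cubic example data are assumptions made for illustration.

```python
def ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but not linear
print(spearman(x, y), pearson(x, y))
```

Because y always increases with x, Spearman's coefficient is 1 even though the relationship is curved, while the Pearson coefficient falls below 1.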
What are the limitations of using the mean as a measure of central tendency?
- It can't be used with large data sets
- It can't be used with small data sets
- It is difficult to calculate
- It is highly sensitive to outliers
The main limitation of the mean as a measure of central tendency is that it is highly sensitive to outliers, or extreme values. A single outlier can pull the mean toward it and make it a less accurate representation of the data. Moreover, the mean does not describe the middle value (the median) or the most common value (the mode) in the dataset, which are often important characteristics.
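A short example shows how one extreme value affects the mean far more than the median. The numbers below are invented for illustration.

```python
from statistics import mean, median

data = [30, 32, 35, 38, 40]
with_outlier = data + [400]  # one extreme value added

print(mean(data), median(data))                  # 35 35
print(mean(with_outlier), median(with_outlier))  # about 95.83, and 36.5
```

The outlier moves the mean from 35 to roughly 95.8, while the median changes only from 35 to 36.5.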
What does a 95% confidence interval estimate?
- The mean of the sample
- The range within which 95% of the data points lie
- The standard deviation of the population
- The true population parameter with a 95% level of confidence
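A 95% confidence interval gives a range of plausible values for the true population parameter. The 95% refers to the procedure: if we drew many samples and built an interval from each one, about 95% of those intervals would contain the true parameter. It does not describe the range of the data points, and it is not the mean of the sample.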
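The repeated-sampling idea can be sketched with a small simulation. This example assumes normally distributed data with a known population standard deviation, so that a z-interval applies; the function names and parameter values are hypothetical.

```python
import random
from statistics import NormalDist, mean

def z_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean, assuming the population sigma is known."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / len(sample) ** 0.5
    m = mean(sample)
    return m - half_width, m + half_width

def coverage(true_mean=50.0, sigma=10.0, n=30, trials=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        hits += low <= true_mean <= high
    return hits / trials

print(coverage())  # expected to land near 0.95, not exactly on it
```

Each individual interval either contains the true mean or it does not. The 95% describes how often the procedure succeeds across many samples.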
What is the error term in a simple linear regression model?
- It is the dependent variable
- It is the difference between the observed and predicted values
- It is the independent variable
- It is the slope of the regression line
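The error term in a simple linear regression model is the difference between the observed value of the dependent variable and the value given by the regression line. In a fitted model it is estimated by the residual, which is the observed value minus the predicted value. It captures the variability in the dependent variable that is not explained by the independent variable in the model.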
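The observed-minus-predicted calculation can be sketched as follows. The data and the helper name `fit_line` are made up for illustration; the fit uses the standard least-squares formulas for one predictor.

```python
def fit_line(x, y):
    """Ordinary least squares for the line y = b0 + b1 * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.2, 4.1, 6.2, 7.9, 10.1]

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * a for a in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]  # observed minus predicted
print(residuals)
```

With an intercept in the model, the least-squares residuals sum to zero, up to floating-point rounding.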
What can be inferred if the residuals are not randomly distributed in the residual plot?
- The data has no outliers
- The data is perfectly linear
- The linear regression model is a perfect fit for the data
- The linear regression model is not a good fit for the data
If the residuals are not randomly distributed (e.g., if they form a pattern), it suggests that the linear regression model is not a good fit for the data. This could be because the relationship between the variables is not linear, or because the data exhibits heteroscedasticity (unequal variances of errors), among other reasons.
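A patterned residual plot can be reproduced by fitting a straight line to data that is actually curved. In this sketch the data follow y = x squared exactly; the helper name `fit_line` and the data are assumptions for illustration.

```python
def fit_line(x, y):
    """Ordinary least squares for the line y = b0 + b1 * x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - b1 * mx, b1

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]  # quadratic relationship, not linear

b0, b1 = fit_line(x, y)
residuals = [obs - (b0 + b1 * a) for a, obs in zip(x, y)]
print(residuals)
```

The residuals are positive at both ends and negative in the middle. That U shape is the kind of systematic pattern that signals a straight line is the wrong model.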
What type of data is used in the Chi-square test for goodness of fit?
- Categorical data
- Continuous data
- Interval data
- Ordinal data
The Chi-square test for goodness of fit is used with categorical data. It compares the observed frequencies in each category with the frequencies we would expect to see if the data followed the theoretical distribution.
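The comparison of observed and expected frequencies can be sketched for a die that is assumed to be fair. The observed counts are invented for illustration, and the critical value 11.07 is the standard table value for 5 degrees of freedom at the 0.05 significance level.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 9, 12, 11, 10, 10]       # hypothetical counts for faces 1..6 in 60 rolls
expected = [sum(observed) / 6] * 6      # a fair die predicts 10 per face

statistic = chi_square_statistic(observed, expected)
critical_value = 11.07                  # chi-square table, df = 6 - 1 = 5, alpha = 0.05

print(statistic)                   # 1.0
print(statistic > critical_value)  # False
```

Since the statistic is well below the critical value, these counts give no evidence against the hypothesis that the die is fair.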