In the context of simple linear regression, the difference between the observed value and the predicted value is referred to as the ________.
- correlation coefficient
- dependent variable
- error term
- independent variable
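In a regression model, the difference between the observed value and the predicted value is the error term. When that difference is computed from a model fitted to sample data, it is more precisely called the residual, which serves as an estimate of the unobservable error term. It represents the portion of the dependent variable that the independent variable(s) do not explain.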
Which common statistical test is considered non-parametric?
- ANOVA
- Chi-Square Test
- Linear Regression
- t-test
The Chi-Square Test is a common statistical test that is considered non-parametric. It is often used to analyze categorical data and does not assume that the data come from a particular population distribution, such as the normal distribution.
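As a rough illustration of how the test works on categorical data, the sketch below computes the chi-square statistic for a contingency table by comparing observed counts with the counts expected under independence. The function name `chi_square_statistic` and the counts in `observed` are made up for this example, and no continuity correction is applied.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed_count - expected) ** 2 / expected
    return statistic


# Hypothetical 2x2 table of counts (rows: groups, columns: outcomes).
observed = [[10, 20], [20, 10]]
stat = chi_square_statistic(observed)
print(stat)
```

The statistic would then be compared against a chi-square distribution with (rows - 1) x (columns - 1) degrees of freedom to obtain a p-value.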
Which measure of dispersion considers all the data points in a dataset?
- Interquartile range
- Mode
- Range
- Variance
Variance is a measure of dispersion that considers all data points in the dataset. It is calculated by taking the average of the squared differences from the mean.
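A minimal sketch of that calculation follows. It shows both the population form (divide by n) and the sample form (divide by n - 1), since the explanation above describes the population version. The function names and the data are illustrative only.

```python
def population_variance(values):
    """Average of the squared differences from the mean (divide by n)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n


def sample_variance(values):
    """Same sum of squares, divided by n - 1 for an unbiased sample estimate."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


# Made-up data with mean 5; every point contributes to the result.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(sample_variance(data))
```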
The ________ distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space.
- Exponential
- Gaussian
- Poisson
- Uniform
The Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
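The probability of observing exactly k events can be sketched directly from the Poisson probability mass function, P(X = k) = e^(-λ) λ^k / k!, where λ is the mean rate per interval. The function name `poisson_pmf` and the rate used below are assumptions chosen for illustration.

```python
import math


def poisson_pmf(k, rate):
    """Probability of exactly k events when the mean count per interval is `rate`."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Hypothetical example: an average of 2 events per interval.
rate = 2.0
for k in range(5):
    print(k, poisson_pmf(k, rate))
```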
A 95% confidence interval means that if the same sampling method is repeated many times, then ________% of the confidence intervals will contain the true population parameter.
- 50
- 75
- 90
- 95
A 95% confidence interval means that if we were to draw a large number of samples and calculate a confidence interval from each one, we would expect about 95% of those intervals to contain the true population parameter. The parameter itself is fixed; it is the intervals that vary from sample to sample.
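The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes normally distributed data with a known standard deviation, so it uses the z-interval (mean ± 1.96 σ/√n). The function name `coverage_rate` and all parameter values are hypothetical.

```python
import math
import random


def coverage_rate(true_mean=10.0, sigma=2.0, n=30, trials=2000, seed=0):
    """Fraction of simulated 95% z-intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # The interval changes with every sample; the true mean does not.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(coverage_rate())
```

The printed fraction should land close to 0.95, with some variation from run to run if the seed is changed.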
What is the significance of 'distance measures' in cluster analysis?
- Distance measures determine the similarities or differences between data points
- Distance measures help in determining the shape of the clusters
- Distance measures help in visualizing the clusters
- Distance measures indicate the number of clusters
Distance measures, like Euclidean distance or Manhattan distance, play a crucial role in cluster analysis. They determine the similarities or differences between data points. They influence how the clusters will be formed, as the most similar or closest data points get clustered together.
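The two distance measures named above can be sketched as follows. The function names are illustrative, and the points are made up to show that the two measures can disagree about how far apart the same pair of points is.

```python
import math


def euclidean(p, q):
    """Straight-line distance: square root of the summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    """Grid distance: sum of the absolute differences along each axis."""
    return sum(abs(a - b) for a, b in zip(p, q))


point_a = (0, 0)
point_b = (3, 4)
print(euclidean(point_a, point_b))  # 5.0
print(manhattan(point_a, point_b))  # 7
```

Because a clustering algorithm groups the closest points, the choice of measure can change which points end up in the same cluster.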
In what situations is Spearman's rank correlation preferred over Pearson's correlation?
- All of the above
- When the data contains outliers
- When the relationship between variables is nonlinear
- When the variables are not normally distributed
Spearman's rank correlation coefficient is a nonparametric measure of rank correlation. It is preferred over Pearson's correlation when the variables are not normally distributed, when the relationship is nonlinear but still monotonic, or when the data contain outliers. Because it is computed on ranks rather than raw values, it assesses how well a monotonic function could describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables.
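To make the contrast concrete, the sketch below computes both coefficients for a relationship that is monotonic but not linear (y = x³). Spearman's coefficient is obtained here by applying Pearson's formula to the ranks. The helper names are hypothetical, and the simple `ranks` function assumes there are no tied values.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


xs = list(range(1, 11))
ys = [x ** 3 for x in xs]  # monotonic, but far from a straight line
print(pearson(xs, ys))
print(spearman(xs, ys))
```

On this data Spearman's coefficient is 1 (up to floating-point rounding) because the ranks agree perfectly, while Pearson's coefficient is lower because the points do not lie on a line.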
A __________ is the difference between the observed value and the predicted value of the response variable in regression analysis.
- Coefficient
- Error term
- Outlier
- Residual
In the context of regression analysis, the difference between the observed value and the predicted value of the response variable is called a "residual".
What is a residual in the context of simple linear regression?
- The difference between the observed and predicted values
- The difference between the predicted and observed values of the independent variable
- The error in the slope of the regression line
- The observed value of the dependent variable
A residual is the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) given by the regression model, that is, y − ŷ. It represents the error of the estimate for that observation.
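As a small worked sketch, the code below fits a least-squares line to made-up data and computes each residual as the observed value minus the predicted value. The function names and the data points are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Observed y minus predicted y for each data point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
res = residuals(xs, ys)
print(res)
```

For a least-squares line with an intercept, the residuals sum to zero up to floating-point rounding, which is a quick sanity check on the fit.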
How do changes in the scale of measurement affect the correlation coefficient?
- They decrease the correlation coefficient
- They do not affect the correlation coefficient
- They increase the correlation coefficient
- They reverse the sign of the correlation coefficient
The correlation coefficient is not affected by changes in the center (adding a constant) or scale (multiplying by a positive constant) of the variables, such as converting a measurement from one unit to another. This is because correlation measures the strength of a linear relationship relative to the variability of each variable, which makes it a dimensionless quantity. Multiplying one variable by a negative constant is the only such change that alters it, and it does so only by reversing the sign.
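This invariance can be checked numerically. The sketch below computes Pearson's coefficient for temperatures in Celsius and again after converting them to Fahrenheit, and also shows the sign reversal when one variable is multiplied by a negative factor. The `pearson` helper and the data are made up for the example.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical data: temperature and some response measured alongside it.
celsius = [0.0, 10.0, 20.0, 30.0, 40.0]
response = [12.0, 18.0, 25.0, 31.0, 45.0]

# Change of scale and center: F = C * 9/5 + 32.
fahrenheit = [c * 9 / 5 + 32 for c in celsius]

r_celsius = pearson(celsius, response)
r_fahrenheit = pearson(fahrenheit, response)
r_negated = pearson([-c for c in celsius], response)

print(r_celsius, r_fahrenheit)  # equal up to rounding
print(r_negated)                # same magnitude, opposite sign
```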