The range of values around the point estimate that captures the true population parameter at some predetermined confidence level is called a ________ interval.
- Confidence
- Correlation
- Deviation
- Variable
The range of values around the point estimate that captures the true population parameter at some predetermined confidence level is called a confidence interval. A 95% confidence level means that, over repeated sampling, about 95% of intervals constructed this way would contain the true parameter. A narrower interval indicates a more precise estimate.
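As a rough illustration of the definition above, the sketch below builds an interval around a sample mean. It assumes a normal approximation (a z critical value from the standard library's `NormalDist`); for small samples a t critical value would give a somewhat wider interval. The function name and the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    center = mean(sample)                    # point estimate
    se = stdev(sample) / sqrt(n)             # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    return center - z * se, center + z * se

# Hypothetical measurements, used only to exercise the function.
sample = [4.1, 5.0, 5.6, 4.7, 5.3, 4.9, 5.2, 4.8]
low, high = mean_confidence_interval(sample)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

Raising `confidence` to 0.99 widens the interval, which is the usual trade-off between confidence and precision.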
In Bayes' theorem, what is the posterior probability?
- The likelihood of the evidence
- The probability of an event before evidence is observed
- The probability of the evidence given the event
- The updated probability of an event after evidence is observed
In Bayes' theorem, the posterior probability is the updated probability of an event after new evidence has been observed. It is calculated by multiplying the likelihood by the prior probability and then dividing by the total probability of the evidence.
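The calculation described above can be sketched for a single binary event. The function and parameter names are illustrative, and the numbers (a 1% prior, a 95% likelihood, a 5% false-positive rate) are made up for the example.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(event | evidence) for a binary event, via Bayes' theorem."""
    # Total probability of the evidence, over "event" and "not event".
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical inputs: rare event, fairly accurate test.
p = posterior(prior=0.01, likelihood=0.95, false_positive_rate=0.05)
print(round(p, 3))
```

With these inputs the posterior is about 0.16: the evidence raises the probability well above the 1% prior, but the event remains unlikely because the prior was so small.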
How does 'DBSCAN' clustering differ from 'K-means' and 'hierarchical' clustering?
- DBSCAN can find arbitrarily shaped clusters and is less affected by outliers
- DBSCAN creates a hierarchy of clusters
- DBSCAN requires the number of clusters to be specified
- DBSCAN uses centroid to form the clusters
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) differs from K-means and hierarchical clustering in that it can find arbitrarily shaped clusters and is less affected by outliers, which it labels as noise rather than forcing into a cluster. It does not require the user to set the number of clusters a priori; the number of clusters follows from the data and from two density parameters, a neighborhood radius and a minimum number of points.
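To make the density idea concrete, here is a minimal, unoptimized sketch of the DBSCAN procedure in plain Python. It is not a library implementation; the names `dbscan`, `eps`, and `min_samples` are chosen for illustration and the points are made up.

```python
from math import dist

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE      # may later be claimed as a border point
            continue
        cluster += 1               # point i is a core point: start a cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point of this cluster
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:
                queue.extend(nbrs)     # j is also a core point: keep expanding
    return labels

# Two dense groups plus one isolated point (made-up coordinates).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_samples=3)
print(labels)
```

The number of clusters is never passed in: two clusters emerge from the density parameters, and the isolated point is labeled -1 (noise).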
If the results of an ANOVA test are significant, ________ tests are often used to identify specifically which groups' means are different.
- Interaction
- Post-hoc
- Pre-hoc
- Tukey
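If the results of an ANOVA test are significant, post-hoc tests are often used to identify specifically which groups' means are different. These tests are performed after the ANOVA and adjust for multiple comparisons, which keeps the overall (family-wise) type I error rate from inflating as more pairs of groups are compared. Tukey's HSD is one common example of a post-hoc test.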
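The arithmetic behind the multiple-comparisons problem can be sketched as follows. The family-wise formula assumes independent comparisons, which pairwise tests on shared groups are not, so treat it as an approximation. The Bonferroni adjustment is shown as one simple correction, not as the method a particular post-hoc test uses; function names are illustrative.

```python
from math import comb

def familywise_error_rate(alpha, n_comparisons):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_comparisons

def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under a Bonferroni correction."""
    return alpha / n_comparisons

k_groups = 4                      # hypothetical number of groups
m = comb(k_groups, 2)             # number of pairwise comparisons
fwer = familywise_error_rate(0.05, m)
adjusted = bonferroni_alpha(0.05, m)
print(m, round(fwer, 3), round(adjusted, 4))
```

With four groups there are six pairwise comparisons, and running each at an unadjusted 0.05 level gives roughly a 26% chance of at least one false positive.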
What is the primary objective of cluster analysis?
- To classify variables into different groups
- To group similar instances into clusters
- To predict the output variable
- To visualize high-dimensional data
The primary objective of cluster analysis is to group similar instances (observations, data points, etc.) into clusters.
In ANOVA, if the F statistic is significantly high, it suggests that the null ________ should be rejected.
- Distribution
- Hypothesis
- Model
- Theory
If the F statistic in an ANOVA is large enough to be statistically significant, it suggests that the null hypothesis should be rejected. The null hypothesis in ANOVA is typically that all group means are equal; the alternative is that at least one group mean differs from the others.
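For reference, a one-way ANOVA F statistic is the ratio of between-group to within-group variance. The sketch below computes it by hand on made-up data; it stops at the statistic itself, since turning F into a p-value needs the F distribution, which the standard library does not provide.

```python
from statistics import mean

def one_way_f(groups):
    """F statistic for a one-way ANOVA on a list of groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: group means versus the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: observations versus their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Made-up data: the third group sits well above the other two.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(groups)
print(f_stat)
```

Here the between-group mean square is 13 and the within-group mean square is 1, so F works out to 13 (up to floating-point rounding), a large value relative to what equal group means would typically produce.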
If the population standard deviation is unknown, we use the sample standard deviation to estimate the ________ of the mean.
- Confidence interval
- Range
- Standard error
- Variability
If the population standard deviation is unknown, the sample standard deviation s is used to estimate the standard error of the mean as s divided by the square root of the sample size n. The standard error is a measure of how much the sample mean is expected to vary from the true population mean across repeated samples.
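A minimal sketch of the estimated standard error, using the sample standard deviation (n - 1 in the denominator) from Python's `statistics` module. The function name and data are illustrative.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

# Made-up sample of eight observations.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
se = standard_error(sample)
print(round(se, 4))
```

Because n sits under a square root, quadrupling the sample size only halves the standard error, all else being equal.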
What is the purpose of Pearson's Correlation Coefficient?
- To compute the standard deviation of a dataset
- To determine the linear relationship between two variables
- To find the mean of a set of values
- To transform qualitative data into quantitative data
Pearson's correlation coefficient (denoted as r) is a measure of the strength and direction of the linear association between two continuous variables. It measures the degree to which pairs of data for these two variables lie on a straight line. The values lie between -1 and 1, where 1 indicates a perfect positive linear correlation, -1 a perfect negative linear correlation, and 0 no linear correlation. A value of 0 does not rule out a nonlinear relationship.
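The following sketch computes r directly from its definition, to show both a perfect linear relationship and a case where r is zero even though the variables are clearly related. The function name and data are illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Perfectly linear: y = 2x.
r_linear = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])

# Perfectly dependent but not linear: y = x squared on a symmetric range.
r_parabola = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])

print(r_linear, r_parabola)
```

The second case is why r is read as a measure of linear association only: y is fully determined by x, yet the positive and negative slopes cancel.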
Quantitative data represents quantities and can be measured on a ________ scale.
- Categorical
- Nominal
- Numerical
- Ordinal
Quantitative data represents quantities and can be measured on a numerical scale. It includes both discrete data (e.g., the number of students in a class) and continuous data (e.g., the weight of a person).
How does standard deviation differ from the mean absolute deviation?
- Mean absolute deviation is always greater
- Standard deviation is always greater
- Standard deviation squares the deviations while mean absolute deviation takes absolute values
- They are the same
The standard deviation and mean absolute deviation both measure the dispersion in a dataset. The key difference lies in how they treat deviations from the mean: standard deviation squares the deviations before averaging them, while mean absolute deviation takes the absolute value of deviations before averaging. As a result, standard deviation is more sensitive to extreme values than the mean absolute deviation.
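A small sketch of the difference in sensitivity, using the population standard deviation so that both measures average over the same n. The data and the outlier value are made up.

```python
from statistics import mean, pstdev

def mean_absolute_deviation(data):
    """Average absolute distance from the mean."""
    m = mean(data)
    return sum(abs(x - m) for x in data) / len(data)

base = [10, 12, 11, 13, 12]
with_outlier = base + [40]        # one extreme value added

for data in (base, with_outlier):
    sd = pstdev(data)
    mad = mean_absolute_deviation(data)
    print(f"sd={sd:.2f}  mad={mad:.2f}  ratio={sd / mad:.2f}")
```

Both measures grow when the outlier is added, but the standard deviation grows faster, so the ratio of the two increases: squaring gives the largest deviation disproportionate weight.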
In the presence of multicollinearity, the estimated regression coefficients are _______.
- biased
- equal to zero
- negative
- unbiased
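Even in the presence of multicollinearity, the least squares estimates of the regression coefficients are still unbiased, provided the other regression assumptions hold. However, they are less precise: their standard errors are inflated, so individual coefficients can change sharply with small changes in the data.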
When two or more predictors in a multiple linear regression model are highly correlated, it is known as __________.
- Autocorrelation
- Homoscedasticity
- Multicollinearity
- Overfitting
Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. This can lead to unstable estimates of the coefficients.
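One common diagnostic is the variance inflation factor (VIF), which for predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that predictor on the others. The sketch below covers only the two-predictor case, where R_j^2 reduces to the squared Pearson correlation between the predictors. The data are made up, and the often-quoted threshold of 10 is a rule of thumb rather than a fixed standard.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def vif_two_predictors(x1, x2):
    """VIF in a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)

# Made-up predictors: x2 is nearly a copy of x1.
x1 = [1, 2, 3, 4, 5]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
vif = vif_two_predictors(x1, x2)
print(round(vif, 1))
```

A VIF far above 1, as here, means the variance of each coefficient estimate is inflated by that factor relative to uncorrelated predictors, which is the instability described above.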