How does 'DBSCAN' clustering differ from 'K-means' and 'hierarchical' clustering?
- DBSCAN can find arbitrarily shaped clusters and is less affected by outliers
- DBSCAN creates a hierarchy of clusters
- DBSCAN requires the number of clusters to be specified
- DBSCAN uses centroids to form clusters
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) differs from K-means and hierarchical clustering in that it can find arbitrarily shaped clusters and is less affected by outliers. It does not require the user to set the number of clusters a priori; instead, it infers the number of clusters from the data, based on density parameters such as a neighborhood radius (eps) and a minimum number of neighboring points.
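A minimal sketch of this difference using scikit-learn's two-moons dataset, a non-convex shape that K-means cannot separate but DBSCAN can; the eps and min_samples values here are illustrative assumptions:

```python
# DBSCAN on two interlocking half-moons: density-based clustering
# recovers the two arbitrarily shaped clusters without being told k.
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN

X, _ = make_moons(n_samples=200, noise=0.05, random_state=42)

db = DBSCAN(eps=0.25, min_samples=5).fit(X)
# A label of -1 marks noise points; the rest are cluster IDs.
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
```

Note that the number of clusters (`n_clusters`) is an output of DBSCAN, not an input as in K-means.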
In Bayes' theorem, what is the posterior probability?
- The likelihood of the evidence
- The probability of an event before evidence is observed
- The probability of the evidence given the event
- The updated probability of an event after evidence is observed
In Bayes' theorem, the posterior probability is the updated probability of an event after new evidence has been observed. It is calculated by multiplying the likelihood and the prior probability and then dividing by the probability of the evidence.
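A worked example of this calculation in plain Python, using made-up numbers for a diagnostic test (1% prevalence, 95% sensitivity, 5% false-positive rate):

```python
# Bayes' theorem: posterior = likelihood * prior / evidence.
prior = 0.01            # P(disease) before seeing the test result
likelihood = 0.95       # P(positive | disease)
false_positive = 0.05   # P(positive | no disease)

# P(positive) = P(pos|disease)P(disease) + P(pos|no disease)P(no disease)
evidence = likelihood * prior + false_positive * (1 - prior)

# Updated probability of disease after observing a positive test.
posterior = likelihood * prior / evidence
```

Here the positive test raises the probability of disease from 1% (the prior) to roughly 16% (the posterior).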
The range of values around the point estimate that captures the true population parameter at some predetermined confidence level is called a ________ interval.
- Confidence
- Correlation
- Deviation
- Variable
The range of values around the point estimate that captures the true population parameter at some predetermined confidence level is called a confidence interval. Confidence intervals are used in statistics to indicate the reliability of an estimate.
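A short sketch of computing such an interval with SciPy; the sample values are arbitrary, and a t-based interval is used since the population standard deviation is unknown:

```python
# 95% t-based confidence interval for a sample mean.
import numpy as np
from scipy import stats

sample = np.array([4.1, 5.2, 6.3, 4.8, 5.5, 5.0, 4.7, 5.9])
mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

# Interval around the point estimate at the 95% confidence level.
lower, upper = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
```

The interval is centered on the point estimate (the sample mean) and widens as the data get noisier or the sample gets smaller.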
To prevent overfitting, we can apply a technique called ________ in polynomial regression.
- Aggregation
- Factorization
- Normalization
- Regularization
To prevent overfitting, we can apply a technique called regularization in polynomial regression. Regularization involves adding a penalty term to the loss function during the process of training a model. This penalty term discourages the coefficients of the model from reaching large values, leading to a simpler model that's less likely to overfit.
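A minimal sketch of this effect with scikit-learn, comparing an unregularized degree-9 polynomial fit against a ridge (L2-penalized) fit; the degree, noise level, and penalty strength `alpha=1.0` are illustrative assumptions:

```python
# Ridge regularization shrinks polynomial regression coefficients.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 30).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + rng.normal(0, 0.2, 30)

# Degree-9 polynomial features: flexible enough to overfit 30 points.
X_poly = PolynomialFeatures(degree=9, include_bias=False).fit_transform(x)

ols = LinearRegression().fit(X_poly, y)       # no penalty
ridge = Ridge(alpha=1.0).fit(X_poly, y)       # L2 penalty on coefficients

# The penalty keeps the ridge coefficients much smaller than OLS.
ols_norm = np.linalg.norm(ols.coef_)
ridge_norm = np.linalg.norm(ridge.coef_)
```

The penalty term discourages large coefficients, so the ridge fit is smoother and generalizes better than the unregularized polynomial.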
What is a uniform distribution?
- A bell-shaped distribution
- A distribution with different probabilities for different outcomes
- A distribution with the same probability for all outcomes
- A skewed distribution
A uniform distribution, also called a rectangular distribution, is a type of probability distribution in which all outcomes are equally likely. Each interval of equal length on the distribution's support has the same probability.
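A quick sketch with SciPy of a continuous uniform distribution on the (arbitrarily chosen) interval [2, 5], showing both properties: a flat density and equal probability for equal-length intervals:

```python
# Continuous uniform distribution on [2, 5].
from scipy import stats

u = stats.uniform(loc=2, scale=3)  # support is [loc, loc + scale] = [2, 5]

# The density is constant (1 / 3) everywhere on the support.
density_a = u.pdf(2.5)
density_b = u.pdf(4.5)

# Equal-length intervals carry equal probability mass.
p_first = u.cdf(3) - u.cdf(2)  # P(2 <= X <= 3)
p_last = u.cdf(5) - u.cdf(4)   # P(4 <= X <= 5)
```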
The geometric mean is particularly useful when comparing different items with very different ________.
- Mean values
- Median values
- Mode values
- Ranges
The geometric mean is particularly useful when comparing different items with very different ranges. It is commonly used for averaging growth rates, such as population or financial growth, where each year's value is relative to the previous year's.
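A small example with SciPy of why the geometric mean is the right average for multiplicative data; the growth figures (+10%, +50%, -20%) are made up for illustration:

```python
# Geometric mean of year-over-year growth factors.
from scipy.stats import gmean

factors = [1.10, 1.50, 0.80]  # +10%, +50%, -20% growth

avg_growth = gmean(factors)  # average multiplicative growth per year

# Compounding the geometric mean reproduces the total growth exactly,
# which the arithmetic mean would not.
total_via_gmean = avg_growth ** len(factors)
total_actual = 1.10 * 1.50 * 0.80
```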
How would you interpret the result of a Kruskal-Wallis Test?
- As a measure of correlation
- As a measure of dependence
- As a measure of difference between groups
- As a measure of variance
The result of a Kruskal-Wallis Test is interpreted as a measure of difference between groups. If the test is statistically significant, it suggests that at least one of the groups differs from the others in its distribution.
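A sketch of running the test with SciPy on three small made-up samples, where the third group is shifted well above the other two:

```python
# Kruskal-Wallis H-test across three independent groups.
from scipy.stats import kruskal

group_a = [1, 2, 3, 4, 5]
group_b = [2, 3, 4, 5, 6]
group_c = [10, 11, 12, 13, 14]  # clearly shifted upward

stat, p_value = kruskal(group_a, group_b, group_c)
# A small p-value suggests at least one group differs from the others.
```

Note the test indicates *that* a difference exists, not *which* group differs; a post-hoc pairwise test would be needed for that.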
When two or more predictors in a multiple linear regression model are highly correlated, it is known as __________.
- Autocorrelation
- Homoscedasticity
- Multicollinearity
- Overfitting
Multicollinearity is a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. This can lead to unstable estimates of the coefficients.
In the presence of multicollinearity, the estimated regression coefficients are _______.
- biased
- equal to zero
- negative
- unbiased
Even in the presence of multicollinearity, the least squares estimates of the regression coefficients are still unbiased. However, they are less precise: multicollinearity inflates their variances and standard errors.
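This unbiased-but-imprecise behavior can be checked with a small simulation; the true coefficients, sample size, and noise scales below are made-up illustration values:

```python
# Simulation: OLS estimates stay unbiased under multicollinearity
# but become much more variable across repeated samples.
import numpy as np

rng = np.random.default_rng(7)
true_b1, true_b2, n, reps = 2.0, -1.0, 50, 2000

def estimate_b1(corr_noise):
    """Fit OLS on `reps` datasets; return mean and spread of b1-hat."""
    estimates = []
    for _ in range(reps):
        x1 = rng.normal(size=n)
        x2 = x1 + rng.normal(scale=corr_noise, size=n)  # small scale => collinear
        y = true_b1 * x1 + true_b2 * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates.append(beta[1])
    return np.mean(estimates), np.std(estimates)

mean_collinear, sd_collinear = estimate_b1(corr_noise=0.1)   # x2 ~ x1
mean_weak, sd_weak = estimate_b1(corr_noise=10.0)            # x2 nearly unrelated
```

Both means land near the true coefficient of 2.0 (unbiasedness), but the spread of the estimates is roughly an order of magnitude larger in the collinear case (inflated standard errors).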
How does standard deviation differ from the mean absolute deviation?
- Mean absolute deviation is always greater
- Standard deviation is always greater
- Standard deviation squares the deviations while mean absolute deviation takes absolute values
- They are the same
The standard deviation and mean absolute deviation both measure the dispersion in a dataset. The key difference lies in how they treat deviations from the mean: standard deviation squares the deviations before averaging them, while mean absolute deviation takes the absolute value of deviations before averaging. As a result, standard deviation is more sensitive to extreme values than the mean absolute deviation.
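A small NumPy sketch of this difference, computing both measures on a dataset before and after adding a single extreme value:

```python
# Standard deviation vs. mean absolute deviation under an outlier.
import numpy as np

def mad(x):
    # Mean absolute deviation: average absolute distance from the mean.
    return np.mean(np.abs(x - np.mean(x)))

data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.append(data, 100.0)

sd_before, sd_after = np.std(data), np.std(with_outlier)
mad_before, mad_after = mad(data), mad(with_outlier)
# Squaring makes the standard deviation react more strongly to the outlier.
```

For this data the standard deviation is already the larger of the two, and adding the outlier inflates it by a proportionally greater factor than the mean absolute deviation.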