In a situation where the data is densely packed in some regions and sparse in others, how would the choice of K and distance metric influence the results, and what would be the best approach?
- Choose a fixed K and Euclidean distance
- Choose a large K and any distance metric
- Choose a small K and ignore distance metric
- Choose an appropriate K and distance metric, considering data distribution
Considering the data distribution and choosing an appropriate value of K and distance metric can help address the issue of varying data density in KNN.
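A minimal scikit-learn sketch of this idea (the toy dataset, K, and the `weights='distance'` choice are illustrative assumptions, not part of the question): distance-weighted voting lets close neighbors in a dense region outweigh far-away neighbors from a sparse one.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy data: a dense cluster near the origin, a sparse cluster far away
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [7.0, 7.0]])
y = np.array([0, 0, 0, 0, 1, 1])

# weights='distance' down-weights distant neighbors, which helps when
# density varies across regions; K=3 and Euclidean are example choices
knn = KNeighborsClassifier(n_neighbors=3, weights='distance',
                           metric='euclidean').fit(X, y)
print(knn.predict([[0.05, 0.05], [6.0, 6.0]]))  # [0 1]
```

With plain majority voting, a query in the sparse region can be outvoted by points from the dense cluster; distance weighting mitigates exactly that.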
Why is centroid initialization important in K-Means clustering?
- All of the Above
- It determines the final clusters
- It prevents overfitting
- It speeds up the convergence process
Centroid initialization is important in K-Means as it can significantly affect the final clusters. Poor initialization can lead to suboptimal clusters or slow convergence.
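In scikit-learn, both concerns are typically addressed with the `init='k-means++'` strategy (spreads the initial centroids apart) and multiple restarts via `n_init` (guards against one unlucky initialization). The data and parameter values below are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.RandomState(0)
# Three well-separated blobs of 50 points each
X = np.vstack([rng.randn(50, 2) + c for c in ([0, 0], [10, 0], [0, 10])])

# k-means++ picks spread-out starting centroids; n_init=10 reruns the
# algorithm and keeps the run with the lowest inertia
km = KMeans(n_clusters=3, init='k-means++', n_init=10, random_state=0).fit(X)
print(len(set(km.labels_)))  # 3
```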
Can you name a popular clustering algorithm used in Machine Learning?
- Decision Trees
- K-Means
- K-Nearest Neighbors
- Linear Regression
K-Means is a widely used clustering algorithm that partitions data into K distinct, non-overlapping clusters based on similarity.
Logistic Regression is commonly used for __________ problems where the outcome has two categories.
- Binary classification
- Clustering
- Multiclass classification
- Regression
Logistic Regression is primarily used for binary classification problems where the outcome has only two categories.
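A minimal binary-classification sketch with scikit-learn (the toy data is an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two-category outcome that flips once the single feature passes ~2.5
X = np.array([[0.], [1.], [2.], [3.], [4.], [5.]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict([[0.5], [4.5]]))  # [0 1]
```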
Centering variables in Multiple Linear Regression helps to reduce the ___________ and ease the interpretation of interaction effects.
- complexity
- mean
- multicollinearity
- variance
Centering variables (subtracting the mean) reduces multicollinearity between predictors and their interaction terms, which in turn eases the interpretation of the coefficients.
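A small numpy sketch of the effect (toy uniform features; the correlation thresholds are illustrative): the raw interaction term `x1*x2` is strongly correlated with `x1`, while after centering the interaction is nearly uncorrelated with it.

```python
import numpy as np

rng = np.random.RandomState(0)
x1 = rng.uniform(10, 20, 200)
x2 = rng.uniform(10, 20, 200)

# Correlation between x1 and the raw interaction term x1*x2
raw_corr = abs(np.corrcoef(x1, x1 * x2)[0, 1])

# After centering, the interaction term is far less correlated with x1
x1c, x2c = x1 - x1.mean(), x2 - x2.mean()
centered_corr = abs(np.corrcoef(x1c, x1c * x2c)[0, 1])

print(raw_corr > 0.5, centered_corr < 0.3)  # True True
```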
What role does model complexity play in overfitting?
- Has no effect on overfitting
- Increases the risk of overfitting
- Increases the risk of underfitting
- Reduces the risk of overfitting
Model complexity increases the risk of overfitting: a more complex model can capture noise in the training data, leading to poor generalization on unseen data.
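A sketch of this effect with decision trees in scikit-learn (the noisy sine data and the depth limit are illustrative assumptions): the unconstrained tree fits the noisy training set almost perfectly, while the simpler depth-limited tree typically scores better on held-out data.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X_train = rng.uniform(-3, 3, (200, 1))
y_train = np.sin(X_train).ravel() + rng.normal(0, 0.5, 200)
X_test = rng.uniform(-3, 3, (200, 1))
y_test = np.sin(X_test).ravel() + rng.normal(0, 0.5, 200)

# An unconstrained tree can memorize the training noise...
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# ...while a depth-limited (simpler) tree cannot, and generalizes better
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

print(deep.score(X_train, y_train))  # ~1.0: near-perfect fit to noisy training data
print(deep.score(X_test, y_test), shallow.score(X_test, y_test))
```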
Given a scenario where the feature correlation is very high, how would you choose between Ridge, Lasso, and ElasticNet?
- It doesn't matter
- Prefer ElasticNet
- Prefer Lasso
- Prefer Ridge
ElasticNet is preferred when there's multicollinearity, as it combines L1 and L2 penalties, balancing the properties of Ridge and Lasso.
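A small sketch with scikit-learn (toy data; the `alpha` and `l1_ratio` values are assumptions): with two nearly identical features, ElasticNet's L2 component tends to give them similar coefficients instead of arbitrarily dropping one, as pure Lasso often does.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.RandomState(0)
n = 200
x1 = rng.randn(n)
x2 = x1 + 0.01 * rng.randn(n)   # nearly identical to x1 (very high correlation)
x3 = rng.randn(n)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 3 * x2 + 2 * x3 + 0.1 * rng.randn(n)

# l1_ratio=0.5 mixes L1 and L2 penalties; the correlated pair x1/x2
# receives similar (grouped) coefficients rather than one being zeroed out
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(np.round(enet.coef_, 2))
```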
In the context of Polynomial Regression, using too low a degree may lead to _________, while too high a degree may lead to _________.
- accuracy, inaccuracy
- overfitting, underfitting
- stability, instability
- underfitting, overfitting
Using too low a degree may cause the model to be too simple and underfit the data, while too high a degree can lead to a complex model that overfits the data.
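A quick sketch with scikit-learn (cubic toy data assumed): a degree-1 fit underfits the cubic signal, while ever-higher degrees keep improving the *training* fit by chasing noise, which is the hallmark of overfitting.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-1, 1, (30, 1)), axis=0)
y = X.ravel() ** 3 + rng.normal(0, 0.05, 30)  # cubic signal plus noise

def fit_r2(degree):
    """Training R^2 of a polynomial fit of the given degree."""
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    return model.fit(X, y).score(X, y)

# Degree 1 underfits the cubic; degree 3 matches it; degree 9 can only
# improve the training fit further by fitting noise
print(fit_r2(1) < fit_r2(3))  # True
print(fit_r2(3), fit_r2(9))
```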
Can classification be used to predict continuous values?
- No
- Only with specific algorithms
- Sometimes
- Yes
Classification is used to predict discrete categories or classes, not continuous values. Regression techniques are used for predicting continuous values.
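The distinction shows up directly in scikit-learn's estimator types (toy data assumed): a classifier returns one of the discrete training labels, while a regressor returns a real-valued estimate.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = np.array([[1.], [2.], [3.], [4.]])
y_class = np.array([0, 0, 1, 1])          # discrete class labels
y_cont = np.array([1.2, 2.4, 3.1, 4.8])   # continuous targets

clf = DecisionTreeClassifier().fit(X, y_class)
reg = DecisionTreeRegressor().fit(X, y_cont)

print(clf.predict([[2.5]]))  # a label from {0, 1}
print(reg.predict([[2.5]]))  # a real-valued estimate
```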
In what situations would RMSE be a more appropriate metric than MAE?
- When larger errors are more critical to penalize
- When smaller errors are more critical to penalize
- When the model needs to be robust to outliers
- When the model requires a metric in squared units
RMSE is more appropriate than MAE when larger errors are more critical to penalize. Because RMSE squares the errors before averaging them, it weights large errors more heavily than MAE does, which suits applications where large deviations from the actual values are more detrimental than small ones.
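A quick numerical illustration: two sets of predictions with identical MAE, where RMSE flags the one that concentrates its error in a single large deviation.

```python
import numpy as np

y_true = np.array([0.0, 0.0, 0.0, 0.0])
pred_a = np.array([1.0, 1.0, 1.0, 1.0])   # four moderate errors
pred_b = np.array([0.0, 0.0, 0.0, 4.0])   # one large error, same total

def mae(y, p):
    return np.mean(np.abs(y - p))

def rmse(y, p):
    return np.sqrt(np.mean((y - p) ** 2))

# MAE treats both predictors the same; RMSE penalizes the outlier
print(mae(y_true, pred_a), mae(y_true, pred_b))    # 1.0 1.0
print(rmse(y_true, pred_a), rmse(y_true, pred_b))  # 1.0 2.0
```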