Increasing the regularization parameter in Ridge regression will ________ the coefficients but will not set them to zero.
- Decrease
- Increase
- Maintain
Increasing the regularization parameter in Ridge regression will shrink the coefficients towards zero but will not set them to zero, due to the L2 penalty.
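A minimal sketch of this effect (assuming scikit-learn and NumPy are available; the data here is synthetic): as alpha grows, the Ridge coefficients shrink towards zero but none of them is driven exactly to zero.

```python
# Hedged sketch: watch Ridge coefficients shrink as the penalty alpha grows.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.5, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

for alpha in [0.01, 1.0, 100.0]:
    model = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(model.coef_, 3))  # coefficients shrink, never exactly 0
```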
Balancing the _________ in a training dataset is vital to ensure that the model does not become biased towards one particular outcome.
- classes
- features
- models
- parameters
Balancing the "classes" in a training dataset ensures that the model does not become biased towards one class, leading to a more accurate and fair representation of the data. This is especially crucial in classification tasks.
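One common way to counter class imbalance is to reweight the minority class; a rough sketch (assuming scikit-learn; the imbalanced dataset is synthetic) using the class_weight option:

```python
# Hedged sketch: compare an unweighted classifier with one that rebalances classes.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score

# 95% of samples belong to class 0, 5% to class 1.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X, y)
weighted = LogisticRegression(max_iter=1000, class_weight="balanced").fit(X, y)

# Recall on the rare class typically improves once the classes are rebalanced.
print(recall_score(y, plain.predict(X)), recall_score(y, weighted.predict(X)))
```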
Overfitting in Polynomial Regression can be visualized by a graph where the polynomial curve fits even the _________ in the training data.
- accuracy
- linearity
- noise
- stability
A graph showing overfitting in Polynomial Regression will exhibit the polynomial curve fitting even the noise in the training data, not just the underlying trend.
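A hedged sketch of that picture in code (assuming scikit-learn and NumPy; the sine-plus-noise data is made up for illustration): the high-degree fit drives the training error down by chasing the noise.

```python
# Hedged sketch: a high-degree polynomial chases noise in a small training set.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=20)

for degree in [3, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
    # Training error keeps dropping with degree; the wiggly curve is fitting noise.
    print(degree, round(float(np.mean((model.predict(X) - y) ** 2)), 4))
```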
In a case where you have a dataset with numerous outliers, which clustering algorithm would you choose and why?
- DBSCAN due to robustness to outliers
- DBSCAN due to sensitivity to noise
- K-Means due to robustness to noise
- K-Means due to sensitivity to outliers
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) would be suitable since it is robust to outliers. It identifies dense regions as clusters and labels sparse points as noise rather than forcing them into a cluster, making it effective in such scenarios.
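An illustrative sketch (assuming scikit-learn and NumPy; the blob data and eps/min_samples values are arbitrary choices): DBSCAN assigns the label -1 to points it treats as noise.

```python
# Hedged sketch: DBSCAN flags sparse outliers with label -1 instead of clustering them.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
outliers = np.random.default_rng(0).uniform(-10, 10, size=(20, 2))
X = np.vstack([X, outliers])

labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
print("points flagged as noise:", int(np.sum(labels == -1)))
```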
How can you detect whether a model is overfitting or underfitting the data?
- By analyzing the training and validation errors
- By increasing model complexity
- By looking at the model's visualizations
- By reducing model complexity
Detecting overfitting or underfitting can be done "by analyzing the training and validation errors." Overfitting shows high training accuracy but low validation accuracy, while underfitting shows poor performance on both.
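A rough sketch of that check (assuming scikit-learn; the dataset and tree depths are just convenient examples): compare training and validation scores side by side.

```python
# Hedged sketch: a large train/validation gap suggests overfitting;
# two low scores suggest underfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)            # unconstrained
shallow = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_tr, y_tr)

print("deep   :", deep.score(X_tr, y_tr), deep.score(X_val, y_val))      # big gap -> overfitting
print("shallow:", shallow.score(X_tr, y_tr), shallow.score(X_val, y_val))  # both modest -> underfitting
```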
Explain the Bias-Variance tradeoff in the context of Cross-Validation.
- Increasing k decreases bias but may increase variance
- Increasing k decreases both bias and variance
- Increasing k increases bias but decreases variance
- Increasing k increases both bias and variance
The Bias-Variance tradeoff in k-fold Cross-Validation refers to the balance between bias (error due to overly simplistic assumptions) and variance (error due to sensitivity to the particular data split). Increasing k generally decreases bias, since each model is trained on nearly the full dataset, but it may increase the variance of the error estimate, because the training folds overlap heavily and the individual fold scores become highly correlated.
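A hedged sketch of running cross-validation with different k (assuming scikit-learn; the dataset and model are only for illustration):

```python
# Hedged sketch: larger k trains on more data per fold (lower bias),
# but the folds overlap more, so the per-fold scores are highly correlated.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
model = LogisticRegression(max_iter=5000)

for k in [3, 5, 10]:
    scores = cross_val_score(model, X, y, cv=k)
    print(k, round(scores.mean(), 3), round(scores.std(), 3))
```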
You're given a dataset with several features, some of which are highly correlated. How would you handle this using dimensionality reduction techniques?
- Applying K-Means Clustering
- Applying L1 Regularization
- Applying Principal Component Analysis (PCA)
- Applying Random Forest
Principal Component Analysis (PCA) would be used to handle high correlation among features. It reduces dimensionality by creating new uncorrelated variables that capture the variance present in the original features.
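A minimal sketch (assuming scikit-learn and NumPy; the correlated features are constructed artificially): PCA turns correlated columns into uncorrelated components ordered by explained variance.

```python
# Hedged sketch: two principal components capture almost all the variance
# of four highly correlated features.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
# Build a 4-feature matrix in which columns 2 and 3 are near-copies of columns 0 and 1.
X = np.column_stack([base, base + rng.normal(scale=0.05, size=(200, 2))])

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(pca.explained_variance_ratio_)  # close to 1.0 in total
```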
The addition of _________ in the loss function is a common technique to regularize the model and prevent overfitting.
- bias
- learning rate
- regularization terms
- weights
Regularization terms (like L1 or L2 penalties) in the loss function constrain the model, reducing the risk of overfitting by preventing large weights.
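A plain-NumPy sketch of what such a penalized loss looks like (the ridge_loss helper and lam value are hypothetical names chosen for illustration):

```python
# Hedged sketch: mean-squared-error loss plus an L2 regularization term.
import numpy as np

def ridge_loss(w, X, y, lam):
    residual = X @ w - y
    return np.mean(residual ** 2) + lam * np.sum(w ** 2)  # data term + L2 penalty

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5])
w = np.array([1.0, -2.0, 0.5])

# The same weights incur a higher loss once the penalty is switched on.
print(ridge_loss(w, X, y, lam=0.0), ridge_loss(w, X, y, lam=1.0))
```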
You have trained an SVM but the decision boundary is not fitting well to the data. How could adjusting the hyperplane parameters help?
- Change the kernel's color
- Increase the size of the hyperplane
- Modify the regularization parameter 'C'
- Reduce the number of support vectors
Adjusting the regularization parameter 'C' controls the trade-off between margin maximization and error minimization, helping to fit the decision boundary better.
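A rough sketch of the effect of 'C' (assuming scikit-learn; the synthetic data and C values are arbitrary): small C favors a wider, softer margin, while large C fits the training data more tightly.

```python
# Hedged sketch: vary C and watch training accuracy and the number of support vectors change.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, flip_y=0.1, random_state=0)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="rbf", C=C).fit(X, y)
    print(C, round(clf.score(X, y), 3), "support vectors:", len(clf.support_))
```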
Discuss the difference between Euclidean distance and Manhattan distance metrics in the context of KNN.
- Euclidean is faster, Manhattan is more accurate
- Euclidean is for 3D, Manhattan for 2D
- Euclidean is for continuous data, Manhattan for categorical
- Euclidean uses squares, Manhattan uses absolutes
Euclidean distance is the square root of the sum of squared differences, while Manhattan distance is the sum of the absolute differences. In KNN, the chosen metric determines which points count as the nearest neighbors, so switching metrics can change the predictions.
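A tiny NumPy sketch of both metrics on the same pair of points:

```python
# Hedged sketch: Euclidean vs Manhattan distance for two example points.
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # sqrt of summed squared differences
manhattan = np.sum(np.abs(a - b))           # sum of absolute differences
print(euclidean, manhattan)                 # approx. 3.606 and 5.0
```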