Overfitting in Polynomial Regression can be visualized by a graph where the polynomial curve fits even the _________ in the training data.

  • accuracy
  • linearity
  • noise
  • stability
A graph showing overfitting in Polynomial Regression will exhibit the polynomial curve fitting even the noise in the training data, not just the underlying trend.
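A minimal sketch of what that looks like numerically, assuming synthetic data (noisy samples of a sine curve) and arbitrarily chosen polynomial degrees:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 30)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=30)  # noisy training target

X_test = np.linspace(0, 1, 200).reshape(-1, 1)
y_true = np.sin(2 * np.pi * X_test).ravel()          # noise-free underlying trend

for degree in (3, 15):  # modest vs. excessive polynomial degree (illustrative choice)
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X, y)
    train_err = mean_squared_error(y, model.predict(X))
    test_err = mean_squared_error(y_true, model.predict(X_test))
    # The high-degree fit drives training error near zero but its curve
    # bends to follow the noise, so its error on the true trend is larger.
    print(f"degree={degree:2d}  train MSE={train_err:.4f}  test MSE={test_err:.4f}")
```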

In a case where you have a dataset with numerous outliers, which clustering algorithm would you choose and why?

  • DBSCAN due to robustness to outliers
  • DBSCAN due to sensitivity to noise
  • K-Means due to robustness to noise
  • K-Means due to sensitivity to outliers
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) would be suitable since it is robust to outliers: it identifies dense clusters and labels low-density points as noise (label -1) instead of forcing them into a cluster, making it effective in such scenarios.
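A small sketch of that behaviour, assuming toy blobs with a few hand-placed outliers and hand-picked `eps`/`min_samples` values:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import DBSCAN, KMeans

# Two dense blobs plus a handful of far-away outlier points.
X, _ = make_blobs(n_samples=200, centers=[[0, 0], [5, 5]],
                  cluster_std=0.5, random_state=42)
outliers = np.array([[12, 12], [-10, 10], [10, -10], [-12, -12]])
X = np.vstack([X, outliers])

db = DBSCAN(eps=0.5, min_samples=5).fit(X)
km = KMeans(n_clusters=2, n_init=10, random_state=42).fit(X)

# DBSCAN marks low-density points with the label -1 (noise);
# K-Means has no such notion and assigns every point to a centroid.
print("DBSCAN noise points:", np.sum(db.labels_ == -1))
print("K-Means labels for the four outliers:", km.labels_[-4:])
```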

How can you detect whether a model is overfitting or underfitting the data?

  • By analyzing the training and validation errors
  • By increasing model complexity
  • By looking at the model's visualizations
  • By reducing model complexity
You can detect overfitting or underfitting by analyzing the training and validation errors: overfitting shows high training accuracy but noticeably lower validation accuracy, while underfitting shows poor performance on both.
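A hedged sketch of that diagnostic, assuming synthetic classification data and a decision tree whose depth stands in for model complexity:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

for depth, label in [(1, "likely underfit"), (None, "likely overfit"), (5, "more balanced")]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    # Overfitting: high training score, noticeably lower validation score.
    # Underfitting: both scores are low.
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.3f} "
          f"val={tree.score(X_val, y_val):.3f}  ({label})")
```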

Describe a scenario where you would use the F1-Score as the main performance metric, and explain why it would be suitable.

  • In a balanced dataset, to ensure model fairness
  • In a scenario where only false negatives are important
  • In an imbalanced dataset, to balance both false positives and false negatives
The F1-Score is especially suitable for imbalanced datasets, as it balances both Precision and Recall, ensuring that the model is not biased towards the majority class. It gives equal weight to false positives and false negatives, providing a more holistic evaluation of the model's performance than accuracy alone.
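A minimal sketch of why accuracy can mislead here, assuming an artificially imbalanced synthetic dataset (class weights chosen purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score

# Roughly 95% negatives, 5% positives: accuracy alone can look deceptively good.
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           n_features=20, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=1)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = clf.predict(X_test)

# F1 balances precision (false positives) and recall (false negatives),
# so it is more informative than accuracy for the minority class.
print("accuracy:", round(accuracy_score(y_test, pred), 3))
print("F1-score:", round(f1_score(y_test, pred), 3))
```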

What is the branch of Machine Learning that involves neural networks with three or more layers, which work to analyze various factors of data?

  • Deep Learning
  • Reinforcement Learning
  • Supervised Learning
  • Unsupervised Learning
Deep Learning is a subset of Machine Learning that uses neural networks with three or more layers to analyze complex patterns in data.
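As a rough sketch of the "three or more layers" idea, here is a network with three hidden layers, using scikit-learn's MLPClassifier purely as a stand-in (layer sizes and data are arbitrary assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Three hidden layers -- the "three or more layers" that places a network
# in Deep Learning territory (the sizes here are arbitrary).
deep_net = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=500,
                         random_state=0).fit(X, y)
print("training accuracy:", round(deep_net.score(X, y), 3))
```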

Explain the Bias-Variance tradeoff in the context of Cross-Validation.

  • Increasing k decreases bias but may increase variance
  • Increasing k decreases both bias and variance
  • Increasing k increases bias but decreases variance
  • Increasing k increases both bias and variance
The Bias-Variance tradeoff in the context of k-fold Cross-Validation refers to the balance between bias (error due to overly simplistic assumptions) and variance (error due to sensitivity to the particular data split). Increasing k generally decreases bias, since each model is trained on a larger share of the data, but it may increase variance, because the validation folds shrink and the heavily overlapping training folds produce highly correlated score estimates.
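A hedged sketch comparing small and large k on synthetic data (the model, dataset, and k values are arbitrary; the point is how the fold scores spread):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000)

for k in (3, 10):
    scores = cross_val_score(model, X, y, cv=k)
    # Larger k: each fold trains on more data (lower bias of the estimate),
    # but the per-fold scores tend to scatter more (higher variance).
    print(f"k={k:2d}  mean={scores.mean():.3f}  std={scores.std():.3f}")
```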

You're given a dataset with several features, some of which are highly correlated. How would you handle this using dimensionality reduction techniques?

  • Applying K-Means Clustering
  • Applying L1 Regularization
  • Applying Principal Component Analysis (PCA)
  • Applying Random Forest
Principal Component Analysis (PCA) would be used to handle high correlation among features. It reduces dimensionality by creating new uncorrelated variables that capture the variance present in the original features.
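A minimal sketch of PCA collapsing correlated columns, assuming a small synthetic dataset built so that pairs of features are highly correlated:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
base = rng.normal(size=(200, 2))
# Four features derived from two underlying signals, so pairs are highly correlated.
X = np.column_stack([base[:, 0],
                     base[:, 0] * 0.9 + rng.normal(scale=0.1, size=200),
                     base[:, 1],
                     base[:, 1] * 0.8 + rng.normal(scale=0.1, size=200)])

X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95)          # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X_scaled)

# The four correlated columns collapse into ~2 uncorrelated components.
print("reduced shape:", X_reduced.shape)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
```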

The addition of _________ in the loss function is a common technique to regularize the model and prevent overfitting.

  • bias
  • learning rate
  • regularization terms
  • weights
Regularization terms (like L1 or L2 penalties) in the loss function constrain the model, reducing the risk of overfitting by preventing large weights.
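A quick sketch of the effect on the learned weights, assuming synthetic regression data and arbitrarily chosen penalty strengths:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso

X, y = make_regression(n_samples=100, n_features=30, noise=10.0, random_state=0)

ols   = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)     # L2 penalty on the weights
lasso = Lasso(alpha=1.0).fit(X, y)      # L1 penalty; drives some weights to zero

# The penalized models keep the coefficients much smaller,
# which is the regularizing effect that curbs overfitting.
for name, model in [("OLS", ols), ("Ridge", ridge), ("Lasso", lasso)]:
    print(f"{name:5s}  max |w| = {np.max(np.abs(model.coef_)):.2f}")
```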

You have trained an SVM but the decision boundary is not fitting well to the data. How could adjusting the hyperplane parameters help?

  • Change the kernel's color
  • Increase the size of the hyperplane
  • Modify the regularization parameter 'C'
  • Reduce the number of support vectors
Adjusting the regularization parameter 'C' controls the trade-off between margin maximization and error minimization, helping to fit the decision boundary better.
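A hedged sketch of sweeping 'C', assuming synthetic data with some label noise and arbitrary C values:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):
    svm = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    # Small C -> wider margin, more tolerance for misclassified training points;
    # large C -> narrower margin that tries to classify every training point.
    print(f"C={C:>6}: train={svm.score(X_train, y_train):.3f} "
          f"test={svm.score(X_test, y_test):.3f}")
```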

Discuss the difference between Euclidean distance and Manhattan distance metrics in the context of KNN.

  • Euclidean is faster, Manhattan is more accurate
  • Euclidean is for 3D, Manhattan for 2D
  • Euclidean is for continuous data, Manhattan for categorical
  • Euclidean uses squares, Manhattan uses absolutes
Euclidean distance is the square root of the sum of squared differences, while Manhattan distance is the sum of the absolute differences; in KNN, this choice determines which points count as the nearest neighbors and therefore shapes the neighborhood used for prediction.
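A small sketch of the two formulas on a toy vector pair, and of how scikit-learn's KNeighborsClassifier switches between them via the Minkowski parameter p (the vectors and neighbor count are arbitrary):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((a - b) ** 2))   # square root of the sum of squared differences
manhattan = np.sum(np.abs(a - b))           # sum of the absolute differences
print(euclidean, manhattan)                 # ~3.606 vs 5.0

# In KNN the metric is chosen via the Minkowski power parameter p:
# p=2 -> Euclidean (the default), p=1 -> Manhattan.
knn_euclidean = KNeighborsClassifier(n_neighbors=3, p=2)
knn_manhattan = KNeighborsClassifier(n_neighbors=3, p=1)
```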