The process of training a Machine Learning model involves using a dataset known as the _________ set, while evaluating it involves the _________ set.
- testing, validation
- training, testing
- validation, testing
- validation, training
In supervised learning, a "training" set is used to train the model, and a "testing" set is used to evaluate its predictive performance on unseen data.
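A minimal sketch of this workflow, assuming scikit-learn (the iris dataset and the 80/20 split are just illustrative choices):

```python
# Train on one portion of the data, evaluate on a held-out portion.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learn from the training set
print(model.score(X_test, y_test))   # evaluate on unseen testing data
```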
How can centering variables help in interpreting interaction effects in Multiple Linear Regression?
- By increasing model accuracy
- By increasing prediction speed
- By reducing multicollinearity between main effects and interaction terms
- By simplifying the model
Centering variables (subtracting the mean) can reduce multicollinearity between main effects and interaction terms, making it easier to interpret the individual and combined effects of the variables.
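To see the effect numerically, here is a small NumPy sketch with two hypothetical, independent predictors x and z whose means sit far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=5, size=1000)  # predictor centered far from 0
z = rng.normal(loc=30, scale=3, size=1000)

# The raw interaction term x*z is strongly correlated with x itself.
print(np.corrcoef(x, x * z)[0, 1])          # well away from zero

# Center both predictors, then rebuild the interaction term.
xc, zc = x - x.mean(), z - z.mean()
print(np.corrcoef(xc, xc * zc)[0, 1])       # near zero
```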
In a case where your regression model is suffering from high variance, what regularization technique might you apply, and why?
- Increase model complexity
- L1 regularization
- L2 regularization (Ridge)
- Reduce model complexity
High variance in a regression model often signals overfitting, where the model performs well on training data but poorly on unseen data. L2 regularization (Ridge regression) can help by penalizing large coefficients, reducing overfitting, and improving generalization.
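A minimal sketch, assuming scikit-learn; the synthetic data and the alpha value are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Many features relative to samples -> a high-variance setting.
X, y = make_regression(n_samples=100, n_features=80, noise=10.0,
                       random_state=0)

# Ridge penalizes large coefficients via its L2 term (scaled by alpha),
# so it usually generalizes better than plain OLS in this setup.
for model in (LinearRegression(), Ridge(alpha=10.0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, round(scores.mean(), 3))
```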
How does the objective function differ between Ridge, Lasso, and ElasticNet?
- No difference
- Ridge and Lasso have the same objective
- Ridge uses L1, Lasso uses L2, ElasticNet uses neither
- Ridge uses L2, Lasso uses L1, ElasticNet uses both
Ridge's objective function includes an L2 penalty, Lasso's includes an L1 penalty, and ElasticNet's includes both L1 and L2 penalties.
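Written out (with β the coefficient vector and λ, λ₁, λ₂ tuning parameters), the three objectives differ only in their penalty terms:

```latex
\begin{aligned}
\text{Ridge:}      &\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_2^2 \\
\text{Lasso:}      &\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1 \\
\text{ElasticNet:} &\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda_1\,\|\beta\|_1 + \lambda_2\,\|\beta\|_2^2
\end{aligned}
```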
How do Precision and Recall trade off in a classification problem, and when might you prioritize one over the other?
- Increasing Precision decreases Recall, prioritize Precision when false positives are costly
- Increasing Precision increases Recall, prioritize Recall when false positives are costly
- Precision and Recall are independent, no trade-off
Precision and Recall often trade off: increasing one can decrease the other. You might prioritize Precision when false positives are more costly (e.g., spam detection) and Recall when false negatives are more costly (e.g., fraud detection).
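A minimal sketch of the trade-off, assuming scikit-learn; the labels and predicted probabilities below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_proba = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6, 0.55, 0.45])

# Raising the decision threshold raises precision but lowers recall here.
for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_proba >= threshold).astype(int)
    print(threshold,
          precision_score(y_true, y_pred),
          recall_score(y_true, y_pred))
```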
The point in the ROC Curve where the True Positive Rate equals the False Positive Rate is known as the __________ point.
- Break-even
- Equilibrium
- Random
The Break-even point on the ROC Curve is where the True Positive Rate equals the False Positive Rate. This point represents a balance between sensitivity and specificity.
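For reference, a small sketch (assuming scikit-learn and matplotlib, again with made-up scores) that plots an ROC curve alongside the TPR = FPR diagonal:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6, 0.55, 0.45]

fpr, tpr, _ = roc_curve(y_true, y_proba)
plt.plot(fpr, tpr, label="ROC curve")
plt.plot([0, 1], [0, 1], "--", label="TPR = FPR")  # reference diagonal
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```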
You have a dataset with a clear elbow point, but the K-Means clustering is still not performing well. How could centroid initialization be contributing to this issue?
- Centroids initialized too far from the data
- Centroids initialized within one cluster
- Initializing centroids based on mean
- Poor centroid initialization causing slow convergence
Poor centroid initialization can cause slow convergence or convergence to a suboptimal local minimum, so even when the elbow plot clearly suggests the right number of clusters, the resulting K-Means clustering can perform worse than it should.
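A minimal sketch, assuming scikit-learn: with n_init=1 each estimator performs a single run, which makes the sensitivity to the initialization scheme visible (k-means++ spreads the initial centroids apart and usually avoids the problem):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
    # Lower inertia (within-cluster sum of squares) means a better fit;
    # a single random-init run is more likely to stall in a poor local
    # minimum than a k-means++ run.
    print(init, round(km.inertia_, 1))
```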
When using the Elbow Method in K-Means, the optimal number of clusters is typically found where the plot shows a(n) _________, indicating a point of diminishing returns.
- Elbow
- Foot
- Hand
- Knee
In the context of K-Means, the "elbow" refers to the point in the plot where adding more clusters no longer significantly reduces the within-cluster sum of squares. It marks a point of diminishing returns from increasing the number of clusters.
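A minimal sketch of the method, assuming scikit-learn and synthetic blob data with four true clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Inertia is the within-cluster sum of squares; plot (or print) it for a
# range of k and look for where the decrease flattens out.
for k in range(1, 9):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))  # the drop should flatten near k = 4
```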
Overfitting is a condition where a model learns the _________ of the training data, leading to poor generalization.
- features
- noise
- patterns
- variance
Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern, so it generalizes poorly to unseen data.
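A minimal sketch, assuming scikit-learn; the noisy sine data and the polynomial degrees are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)  # noisy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A high-degree polynomial chases the noise: the train score stays high
# while the test score deteriorates.
for degree in (3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree, round(model.score(X_tr, y_tr), 3),
          round(model.score(X_te, y_te), 3))
```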
While performing Cross-Validation, you notice a significant discrepancy between training and validation performance in each fold. What might be the reason, and how would you address it?
- All of the above
- Data leakage; ensure proper separation between training and validation
- Overfitting; reduce model complexity
- Underfitting; increase model complexity
A significant discrepancy between training and validation performance can result from overfitting, underfitting, or data leakage. Addressing it means identifying the underlying issue first: reduce model complexity for overfitting, increase it for underfitting, or enforce a strict separation between training and validation data to prevent leakage.
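A minimal sketch, assuming scikit-learn: cross_validate with return_train_score=True exposes the per-fold gap directly:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

cv = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                    cv=5, return_train_score=True)

# A large, consistent train/validation gap points to overfitting; a
# suspiciously high validation score instead suggests data leakage.
for train, val in zip(cv["train_score"], cv["test_score"]):
    print(f"train={train:.3f}  validation={val:.3f}")
```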