The process of training a Machine Learning model involves using a dataset known as the _________ set, while evaluating it involves the _________ set.
- testing, validation
- training, testing
- validation, testing
- validation, training
In supervised learning, a "training" set is used to train the model, and a "testing" set is used to evaluate its predictive performance on unseen data.
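A minimal sketch of this workflow, assuming scikit-learn (the iris dataset and the 80/20 split are just illustrative choices):

```python
# Train on one portion of the data, evaluate on a held-out portion.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Hold out 20% of the data as the testing set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)          # learn from the training set
print(model.score(X_test, y_test))   # evaluate on unseen testing data
```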
How can centering variables help in interpreting interaction effects in Multiple Linear Regression?
- By increasing model accuracy
- By increasing prediction speed
- By reducing multicollinearity between main effects and interaction terms
- By simplifying the model
Centering variables (subtracting the mean) can reduce multicollinearity between main effects and interaction terms, making it easier to interpret the individual and combined effects of the variables.
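To see the effect numerically, here is a small NumPy sketch with two hypothetical, independent predictors x and z whose means sit far from zero:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=5, size=1000)  # predictor centered far from 0
z = rng.normal(loc=30, scale=3, size=1000)

# The raw interaction term x*z is strongly correlated with x itself.
print(np.corrcoef(x, x * z)[0, 1])          # well away from zero

# Center both predictors, then rebuild the interaction term.
xc, zc = x - x.mean(), z - z.mean()
print(np.corrcoef(xc, xc * zc)[0, 1])       # near zero
```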
In a case where your regression model is suffering from high variance, what regularization technique might you apply, and why?
- Increase model complexity
- L1 regularization
- L2 regularization (Ridge)
- Reduce model complexity
High variance in a regression model often signals overfitting, where the model performs well on training data but poorly on unseen data. L2 regularization (Ridge regression) can help by penalizing large coefficients, reducing overfitting, and improving generalization.
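A minimal sketch, assuming scikit-learn; the synthetic data and the alpha value are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score

# Many features relative to samples -> a high-variance setting.
X, y = make_regression(n_samples=100, n_features=80, noise=10.0,
                       random_state=0)

# Ridge penalizes large coefficients via its L2 term (scaled by alpha),
# so it usually generalizes better than plain OLS in this setup.
for model in (LinearRegression(), Ridge(alpha=10.0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, round(scores.mean(), 3))
```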
How does the objective function differ between Ridge, Lasso, and ElasticNet?
- No difference
- Ridge and Lasso have the same objective
- Ridge uses L1, Lasso uses L2, ElasticNet uses neither
- Ridge uses L2, Lasso uses L1, ElasticNet uses both
Ridge's objective function includes an L2 penalty, Lasso's includes an L1 penalty, and ElasticNet's includes both L1 and L2 penalties.
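Written out (with β the coefficient vector and λ, λ₁, λ₂ tuning parameters), the three objectives differ only in their penalty terms:

```latex
\begin{aligned}
\text{Ridge:}      &\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_2^2 \\
\text{Lasso:}      &\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda\,\|\beta\|_1 \\
\text{ElasticNet:} &\quad \min_{\beta}\ \|y - X\beta\|_2^2 + \lambda_1\,\|\beta\|_1 + \lambda_2\,\|\beta\|_2^2
\end{aligned}
```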
How do Precision and Recall trade off in a classification problem, and when might you prioritize one over the other?
- Increasing Precision decreases Recall, prioritize Precision when false positives are costly
- Increasing Precision increases Recall, prioritize Recall when false positives are costly
- Precision and Recall are independent, no trade-off
Precision and Recall often trade off: increasing one can decrease the other. You might prioritize Precision when false positives are more costly (e.g., spam detection) and Recall when false negatives are more costly (e.g., fraud detection).
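A minimal sketch of the trade-off, assuming scikit-learn; the labels and predicted probabilities below are made up for illustration:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 1])
y_proba = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6, 0.55, 0.45])

# Raising the decision threshold raises precision but lowers recall here.
for threshold in (0.3, 0.5, 0.7):
    y_pred = (y_proba >= threshold).astype(int)
    print(threshold,
          precision_score(y_true, y_pred),
          recall_score(y_true, y_pred))
```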
The point in the ROC Curve where the True Positive Rate equals the False Positive Rate is known as the __________ point.
- Break-even
- Equilibrium
- Random
The Break-even point on the ROC Curve is where the True Positive Rate equals the False Positive Rate. This point represents a balance between sensitivity and specificity.
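For reference, a small sketch (assuming scikit-learn and matplotlib, again with made-up scores) that plots an ROC curve alongside the TPR = FPR diagonal:

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_true  = [0, 0, 1, 1, 1, 0, 1, 0, 1, 1]
y_proba = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.6, 0.55, 0.45]

fpr, tpr, _ = roc_curve(y_true, y_proba)
plt.plot(fpr, tpr, label="ROC curve")
plt.plot([0, 1], [0, 1], "--", label="TPR = FPR")  # reference diagonal
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```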
You have a dataset with a clear elbow point, but the K-Means clustering is still not performing well. How could centroid initialization be contributing to this issue?
- Centroids initialized too far from the data
- Centroids initialized within one cluster
- Initializing centroids based on mean
- Poor centroid initialization causing slow convergence
Poor centroid initialization can cause slow convergence or convergence to a suboptimal local minimum, so even when the elbow plot clearly suggests the right number of clusters, the resulting K-Means clustering can perform worse than it should.
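A minimal sketch, assuming scikit-learn: with n_init=1 each estimator performs a single run, which makes the sensitivity to the initialization scheme visible (k-means++ spreads the initial centroids apart and usually avoids the problem):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for init in ("random", "k-means++"):
    km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
    # Lower inertia (within-cluster sum of squares) means a better fit;
    # a single random-init run is more likely to stall in a poor local
    # minimum than a k-means++ run.
    print(init, round(km.inertia_, 1))
```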
When using the Elbow Method in K-Means, the optimal number of clusters is typically found where the plot shows a(n) _________, indicating a point of diminishing returns.
- Elbow
- Foot
- Hand
- Knee
In the context of K-Means, the "elbow" refers to the point in the plot where adding more clusters no longer significantly reduces the within-cluster sum of squares. It marks a point of diminishing returns from increasing the number of clusters.
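A minimal sketch of the method, assuming scikit-learn and synthetic blob data with four true clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Inertia is the within-cluster sum of squares; plot (or print) it for a
# range of k and look for where the decrease flattens out.
for k in range(1, 9):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    print(k, round(inertia, 1))  # the drop should flatten near k = 4
```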
Overfitting is a condition where a model learns the _________ of the training data, leading to poor generalization.
- features
- noise
- patterns
- variance
Overfitting occurs when a model learns the noise in the training data rather than the underlying pattern, so it generalizes poorly to unseen data.
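A minimal sketch, assuming scikit-learn; the noisy sine data and the polynomial degrees are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)  # noisy target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A high-degree polynomial chases the noise: the train score stays high
# while the test score deteriorates.
for degree in (3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree, round(model.score(X_tr, y_tr), 3),
          round(model.score(X_te, y_te), 3))
```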
While performing Cross-Validation, you notice a significant discrepancy between training and validation performance in each fold. What might be the reason, and how would you address it?
- All of the above
- Data leakage; ensure proper separation between training and validation
- Overfitting; reduce model complexity
- Underfitting; increase model complexity
A significant discrepancy between training and validation performance can result from overfitting, underfitting, or data leakage. Addressing it means identifying the underlying issue first: reduce model complexity for overfitting, increase it for underfitting, or enforce a strict separation between training and validation data to prevent leakage.
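A minimal sketch, assuming scikit-learn: cross_validate with return_train_score=True exposes the per-fold gap directly:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

cv = cross_validate(DecisionTreeClassifier(random_state=0), X, y,
                    cv=5, return_train_score=True)

# A large, consistent train/validation gap points to overfitting; a
# suspiciously high validation score instead suggests data leakage.
for train, val in zip(cv["train_score"], cv["test_score"]):
    print(f"train={train:.3f}  validation={val:.3f}")
```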