How do Precision and Recall trade off in a classification problem, and when might you prioritize one over the other?
- Increasing Precision decreases Recall; prioritize Precision when false positives are costly
- Increasing Precision increases Recall; prioritize Recall when false positives are costly
- Precision and Recall are independent, no trade-off
Precision and Recall often trade off; increasing one can decrease the other. You might prioritize Precision when false positives are more costly (e.g., spam detection) and Recall when false negatives are more costly (e.g., fraud detection).
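As a rough illustration of this trade-off, the sketch below (the scores and labels are made up for the example) computes Precision and Recall from raw counts and shows how moving the decision threshold shifts the balance: a strict threshold yields fewer positive predictions (higher Precision, lower Recall), a lenient one yields more (lower Precision, higher Recall).

```python
def precision_recall(y_true, y_pred):
    # Precision = TP / (TP + FP); Recall = TP / (TP + FN)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical classifier scores and true labels:
scores = [0.95, 0.85, 0.60, 0.40, 0.20]
y_true = [1, 1, 0, 1, 0]

def predict(threshold):
    return [1 if s >= threshold else 0 for s in scores]

# Strict threshold: fewer positives -> higher Precision, lower Recall
p_hi, r_hi = precision_recall(y_true, predict(0.9))   # (1.0, 0.33...)
# Lenient threshold: more positives -> lower Precision, higher Recall
p_lo, r_lo = precision_recall(y_true, predict(0.3))   # (0.75, 1.0)
```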
Can you detail how to prevent overfitting in Polynomial Regression?
- By ignoring the test set
- By increasing the degree
- By using all features
- By using regularization techniques like Ridge and Lasso
Overfitting in Polynomial Regression can be prevented by using regularization techniques like Ridge and Lasso. These techniques add a penalty term to the loss function, constraining the coefficients and reducing the complexity of the model.
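A minimal sketch of the idea, using the closed-form ridge solution on synthetic data (the data-generating function here is invented for the example, and for simplicity the penalty is applied to all coefficients including the intercept, unlike most library implementations): the penalty term shrinks the coefficients of a high-degree polynomial fit relative to unregularized least squares.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 30)   # nearly linear data

# Degree-9 polynomial features: prone to overfitting without regularization.
X = np.vander(x, N=10, increasing=True)

def ridge_fit(X, y, alpha):
    # Closed-form ridge: w = (X^T X + alpha * I)^{-1} X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

w_ols = ridge_fit(X, y, alpha=0.0)    # plain least squares
w_ridge = ridge_fit(X, y, alpha=1.0)  # penalized fit

# The penalty shrinks coefficient magnitudes, taming the wiggly
# high-degree terms that cause overfitting.
```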
What are the implications of using R-Squared vs. Adjusted R-Squared in a multiple regression model with many predictors?
- R-Squared favors complex models; Adjusted R-Squared is more sensitive to noise
- R-Squared favors more predictors without penalty; Adjusted R-Squared penalizes unnecessary predictors
- R-Squared is better for small datasets; Adjusted R-Squared is only applicable to linear models
- R-Squared provides better interpretability; Adjusted R-Squared favors simple models
In multiple regression models with many predictors, using R-Squared may favor the inclusion of more predictors without penalizing for their irrelevance, leading to potentially overfitted models. In contrast, Adjusted R-Squared includes a penalty term for unnecessary predictors, providing a more balanced assessment of the model's performance. It helps in avoiding the trap of increasing complexity without meaningful gains in explanatory power.
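The penalty can be seen directly in the Adjusted R-Squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1). The numbers below are invented for illustration: with the same raw R² on the same sample size, a model using many more predictors receives a noticeably lower adjusted score.

```python
def adjusted_r_squared(r2, n, p):
    # n = sample size, p = number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same raw fit quality (R^2 = 0.90) on n = 50 samples:
n, r2 = 50, 0.90
r2_adj_few = adjusted_r_squared(r2, n, p=3)    # ~0.893
r2_adj_many = adjusted_r_squared(r2, n, p=20)  # ~0.831

# Adding predictors that do not improve R^2 lowers Adjusted R-Squared.
```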
Which Machine Learning approach allows the system to learn and make decisions from experience?
- Reinforcement Learning
- Semi-Supervised Learning
- Supervised Learning
- Unsupervised Learning
Reinforcement Learning allows the system to learn and make decisions through trial and error, receiving rewards or penalties, and learning from experience to achieve a specific goal.
The process of training a Machine Learning model involves using a dataset known as the _________ set, while evaluating it involves the _________ set.
- testing, validation
- training, testing
- validation, testing
- validation, training
In supervised learning, a "training" set is used to train the model, and a "testing" set is used to evaluate its predictive performance on unseen data.
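A minimal sketch of such a split, written from scratch for illustration (libraries like scikit-learn provide their own version): shuffle the data, then hold out a fraction for testing so the model is evaluated on examples it never saw during training.

```python
import random

def train_test_split(data, test_fraction=0.2, seed=42):
    # Shuffle a copy, then slice into training and testing sets.
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))
train, test = train_test_split(data)
# 80 samples to fit the model, 20 held out to evaluate it.
```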
Overfitting is a condition where a model learns the _________ of the training data, leading to poor generalization.
- features
- noise
- patterns
- variance
Overfitting occurs when a model learns the noise of the training data, which doesn't generalize well to unseen data.
While performing Cross-Validation, you notice a significant discrepancy between training and validation performance in each fold. What might be the reason, and how would you address it?
- All of the above
- Data leakage; ensure proper separation between training and validation
- Overfitting; reduce model complexity
- Underfitting; increase model complexity
A significant discrepancy between training and validation performance could result from overfitting, underfitting, or data leakage. Addressing it requires identifying the underlying issue and taking appropriate action, such as reducing/increasing model complexity for overfitting/underfitting or ensuring proper separation between training and validation to prevent leakage.
What is the primary purpose of using ensemble methods in machine learning?
- To combine multiple weak models to form a strong model
- To focus on a single algorithm
- To reduce computational complexity
- To use only the best model
Ensemble methods combine the predictions from multiple weak models to form a more robust and accurate model. By leveraging the strength of multiple models, they typically achieve better generalization and performance than using a single model.
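A toy illustration of the principle via majority voting (the predictions below are invented): three weak models each make one mistake, but because they err on different samples, the combined vote is correct everywhere.

```python
from collections import Counter

def majority_vote(predictions):
    # predictions: one list of labels per model; vote per sample.
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*predictions)]

# True labels: [1, 0, 1, 1, 0]. Each weak model errs on one sample:
model_a = [1, 0, 1, 1, 1]
model_b = [1, 1, 1, 1, 0]
model_c = [0, 0, 1, 1, 0]

combined = majority_vote([model_a, model_b, model_c])
# combined -> [1, 0, 1, 1, 0]: the ensemble corrects every individual error.
```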
Cross-validation, such as _______-fold cross-validation, can help in detecting and preventing overfitting.
- 10
- 3
- 5
- any number
Any number of folds can be used in cross-validation, although 5 and 10 are the most common choices. By evaluating the model on multiple held-out folds, cross-validation gives a more reliable estimate of generalization performance and helps detect overfitting.
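A minimal sketch of how k-fold splitting works (libraries provide this with shuffling and stratification; this from-scratch version keeps the indices contiguous for clarity): the data is partitioned into k folds, and each fold serves once as the validation set while the rest form the training set.

```python
def k_fold_indices(n_samples, k):
    # Partition indices into k near-equal folds; each fold is the
    # validation set exactly once, the remainder the training set.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits

splits = k_fold_indices(10, 5)
# 5 splits; each validation fold has 2 samples, each training set 8.
```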
Can you explain the main concept behind boosting algorithms?
- Boosting always uses Random Forest
- Boosting combines models sequentially, giving more weight to misclassified instances
- Boosting focuses on the strongest predictions
- Boosting involves reducing model complexity
Boosting is an ensemble method where models are combined sequentially, with each model focusing more on the instances that were misclassified by the previous models. This iterative process helps in correcting the mistakes of earlier models, leading to improved performance.
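The reweighting idea can be sketched with a minimal AdaBoost-style loop over decision stumps on 1-D data (the dataset is invented, and real implementations add many refinements): after each round, misclassified points get larger weights, so the next stump concentrates on them.

```python
import math

def stump_predict(x, threshold, polarity):
    return polarity if x >= threshold else -polarity

def fit_stump(X, y, weights):
    # Pick the threshold/polarity pair minimizing weighted error.
    best = None
    for threshold in X:
        for polarity in (1, -1):
            err = sum(w for xi, yi, w in zip(X, y, weights)
                      if stump_predict(xi, threshold, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(X, y, rounds=5):
    n = len(X)
    weights = [1 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(X, y, weights)
        err = max(err, 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Up-weight misclassified points so the next stump focuses on them.
        weights = [w * math.exp(-alpha * yi * stump_predict(xi, threshold, polarity))
                   for xi, yi, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    # Weighted vote of all stumps, each weighted by its alpha.
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

X = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y)
```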