You're comparing two Polynomial Regression models: one with a low degree and one with a high degree. The higher degree model fits the training data perfectly but has poor test performance. How do you interpret this, and what actions would you take?
- Choose the high degree model
- Choose the low degree model or consider regularization
- Ignore test performance
- Increase the degree further
The high degree model is likely overfitting the training data, leading to poor test performance. Choosing the low degree model or applying regularization to the high degree model can improve generalization.
What is dimensionality reduction, and why is it used in machine learning?
- All of the above
- Increasing model accuracy
- Reducing computational complexity
- Reducing number of dimensions
Dimensionality reduction is the process of reducing the number of input variables (dimensions) in a dataset. It reduces computational complexity, can improve accuracy by discarding noisy or redundant features, and often makes the model easier to interpret.
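A minimal sketch with scikit-learn's PCA (the shapes, noise level, and the choice of 2 latent directions are assumptions made for illustration):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples in 10 dimensions, but only 2 latent directions carry real variance.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 10))

# Project the 10-dimensional data down to its 2 dominant directions.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
print(X.shape, "->", X_reduced.shape)
# The 2 retained components explain nearly all of the variance here.
print(round(float(pca.explained_variance_ratio_.sum()), 3))
```

Downstream models then train on 2 columns instead of 10, which is the computational saving the explanation refers to.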
If the relationship between variables in a dataset is best fit by a curve rather than a line, you might use _________ regression.
- Linear
- Logistic
- Polynomial
- Ridge
When the relationship between variables is best fit by a curve rather than a line, Polynomial regression is the appropriate choice. It models nonlinear relationships by adding polynomial terms (x², x³, …) of the predictors to the equation.
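One common way to do this in practice is to expand the feature and fit an ordinary linear model on the expanded terms; a sketch using scikit-learn (the data-generating curve is a hypothetical example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 100).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - x.ravel() + rng.normal(scale=0.2, size=100)

# A straight line underfits this curved relationship...
linear_r2 = LinearRegression().fit(x, y).score(x, y)
# ...but expanding x into [x, x^2] lets a linear model trace the parabola.
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly_r2 = LinearRegression().fit(X_poly, y).score(X_poly, y)
print(round(linear_r2, 2), round(poly_r2, 2))
```

Note that the model is still linear in its coefficients; only the features are nonlinear in x, which is why it is fit with plain least squares.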
You have two models with similar Accuracy but different Precision and Recall values. How would you decide which model is better for a given application?
- Choose based on the specific application's needs and tolerance for false positives/negatives
- Choose the one with higher Precision
- Choose the one with higher Recall
When models have similar Accuracy but different Precision and Recall, the choice between them should be based on the specific application's needs. If false positives are more costly, prioritize Precision; if false negatives are more costly, prioritize Recall.
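A tiny worked example makes the trade-off concrete (the two sets of predictions are hypothetical, constructed so both models reach the same accuracy):

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
# Two hypothetical models with identical accuracy but different error profiles.
model_a = [1, 1, 1, 1, 1, 1, 0, 0]  # finds every positive, but raises 2 false alarms
model_b = [1, 1, 0, 0, 0, 0, 0, 0]  # no false alarms, but misses 2 positives

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name,
          "acc:", accuracy_score(y_true, pred),
          "prec:", round(precision_score(y_true, pred), 2),
          "rec:", recall_score(y_true, pred))
# Both models score 0.75 accuracy. Model A (recall 1.0) suits settings where
# missing a positive is costly; model B (precision 1.0) suits settings where
# a false alarm is costly.
```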
In what situations would it be appropriate to use Logistic Regression with the Logit link function?
- All regression problems
- Binary classification with a nonlinear relationship between predictors
- Binary classification with linear relationship between predictors
- Multi-class classification
Logistic Regression with the Logit link function is particularly suited for binary classification problems where there is a linear relationship between the predictors and the log-odds of the response.
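A short sketch of what "linear in the log-odds" means, using scikit-learn on synthetic labels generated so the true log-odds really are linear in the predictor (the coefficients 2 and -0.5 are arbitrary choices for the example):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
# Generate labels whose true log-odds are linear in x: logit(p) = 2x - 0.5
p = 1 / (1 + np.exp(-(2 * X.ravel() - 0.5)))
y = (rng.random(200) < p).astype(int)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]
# The fitted log-odds, logit(p_hat) = log(p_hat / (1 - p_hat)), are by
# construction a linear function of x, so their correlation with x is 1.
logit_hat = np.log(proba / (1 - proba))
corr = np.corrcoef(X.ravel(), logit_hat)[0, 1]
print(round(corr, 4))
```

The sigmoid output is nonlinear in x, but the logit link maps it back to a straight line in the predictors, which is exactly the setting the question describes.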
One method to mitigate multicollinearity is to apply ___________ regression, which adds a penalty term to the loss function.
- Lasso
- Logistic
- Polynomial
- Ridge
Ridge regression is a technique that can mitigate multicollinearity by adding a penalty term to the loss function. The penalty term helps in reducing the effect of correlated variables, leading to more stable coefficients.
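The stabilizing effect can be sketched with two nearly duplicate features (the dataset is synthetic and the penalty strength `alpha=1.0` is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + 0.01 * rng.normal(size=100)  # nearly a copy of x1 -> multicollinearity
X = np.column_stack([x1, x2])
y = 3 * x1 + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# The L2 penalty pulls the two coefficients toward a stable, near-equal split,
# while unpenalized OLS spreads weight arbitrarily between the twin columns.
print("OLS coef spread:  ", round(abs(ols.coef_[0] - ols.coef_[1]), 3))
print("Ridge coef spread:", round(abs(ridge.coef_[0] - ridge.coef_[1]), 3))
```

Both models predict similarly, but the ridge coefficients are the ones you could interpret or reuse without them flipping sign on a resampled dataset.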
Suppose you're working on a dataset with both linear and nonlinear features predicting the target variable. What regression approach might you take?
- Combine Linear and Polynomial Regression
- Linear Regression only
- Logistic Regression
- Polynomial Regression only
When a dataset contains both linear and nonlinear relationships with the target, combining Linear and Polynomial Regression is an effective approach: keep the linearly related features as they are and add polynomial terms only for the nonlinear ones, so the model captures both kinds of structure in the data.
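One way to realize this in scikit-learn is a `ColumnTransformer` that expands only the nonlinear column (the synthetic data, column indices, and degree are assumptions for the sketch):

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x_lin = rng.normal(size=200)          # enters the target linearly
x_non = rng.uniform(-2, 2, size=200)  # enters the target quadratically
X = np.column_stack([x_lin, x_non])
y = 2 * x_lin + x_non**2 + rng.normal(scale=0.1, size=200)

# Expand only column 1 with polynomial terms; pass column 0 through untouched.
pre = ColumnTransformer(
    [("poly", PolynomialFeatures(degree=2, include_bias=False), [1])],
    remainder="passthrough",
)
model = make_pipeline(pre, LinearRegression()).fit(X, y)
score = model.score(X, y)
print(round(score, 2))
```

Expanding only the nonlinear column keeps the feature count down, which limits the overfitting risk a blanket polynomial expansion would introduce.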
Can you explain the main types of clustering in Unsupervised Learning?
- Divisive, K-Means, Gaussian Mixture
- Hierarchical, Divisive
- Hierarchical, K-Means, Gaussian Mixture
- K-Means, Hierarchical, Neural Network
Clustering in Unsupervised Learning refers to grouping data points that are similar to each other. The main types include Hierarchical (building nested clusters), K-Means (partitioning data into 'K' clusters), and Gaussian Mixture (using probability distributions to form clusters).
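All three can be run side by side in scikit-learn; on well-separated synthetic blobs they largely agree (the blob dataset and adjusted Rand index comparison are illustrative choices, not part of the question):

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

# Three well-separated synthetic clusters with known ground-truth labels.
X, y_true = make_blobs(n_samples=300, centers=3, random_state=0)

labels = {
    "k-means": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "hierarchical": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "gaussian mixture": GaussianMixture(n_components=3, random_state=0).fit_predict(X),
}
# Adjusted Rand index: 1.0 means the recovered clusters match the true ones.
scores = {name: adjusted_rand_score(y_true, lab) for name, lab in labels.items()}
for name, s in scores.items():
    print(f"{name}: ARI = {s:.2f}")
```

The methods differ in their assumptions (centroids, nested merges, probability densities), which matters more on overlapping or non-spherical clusters than on clean blobs like these.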
You have a Multiple Linear Regression model that is performing poorly, and you suspect multicollinearity is the issue. How would you confirm this suspicion and rectify the problem?
- Add more features
- Check the VIF and apply regularization
- Guess the correlated variables
- Increase the number of observations
You can confirm multicollinearity by checking the Variance Inflation Factor (VIF) for the variables. If high VIF values are found, applying regularization methods like Ridge regression or feature selection techniques can help rectify the problem by penalizing or removing correlated variables.
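The VIF check itself is a one-liner per feature, since VIF_j = 1 / (1 - R²) where R² comes from regressing feature j on all the others; a sketch using scikit-learn (statsmodels' `variance_inflation_factor` does the same thing; the dataset here is synthetic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)  # highly correlated with x1
x3 = rng.normal(size=200)             # independent of the others
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2) from regressing feature j on the rest."""
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

for j in range(3):
    print(f"VIF feature {j}: {vif(X, j):.1f}")
# Features 0 and 1 come out far above the common rule-of-thumb threshold of
# 5-10, flagging the collinear pair; feature 2 stays near 1.
```

Once flagged, either drop one of the offending features or switch to a penalized fit such as Ridge, as the explanation suggests.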
In the context of Machine Learning, the term _________ refers to the algorithm's ability to generalize from the training data to unseen data.
- Generalization
- Optimization
- Overfitting
- Regularization
Generalization refers to the model's ability to make accurate predictions on new, unseen data, as opposed to fitting only to the training data.