Explain how Ridge and Lasso handle multicollinearity among the features.
- Both eliminate correlated features
- Both keep correlated features
- Ridge eliminates correlated features; Lasso keeps them
- Ridge keeps correlated features; Lasso eliminates them
Ridge regularization keeps correlated features but shrinks their coefficients toward zero; Lasso can eliminate some of them by setting their coefficients exactly to zero.
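This difference can be seen on a synthetic dataset. A minimal sketch, assuming scikit-learn is available, with two perfectly correlated columns (the duplicated column, seed, and alpha values are illustrative choices, not from the question):

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
X = np.hstack([x, x])                              # two perfectly correlated features
y = x[:, 0] + rng.normal(scale=0.1, size=200)      # target depends on the shared signal

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

# Ridge spreads the weight across both correlated columns (both coefficients nonzero);
# Lasso tends to put the weight on one column and zero out the other.
print("ridge:", ridge.coef_)
print("lasso:", lasso.coef_)
```

With this setup, Ridge assigns roughly half the weight to each duplicate column, while Lasso's coordinate descent typically drives one coefficient to exactly zero.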
What are some common applications for each of the four types of Machine Learning: Supervised, Unsupervised, Semi-Supervised, and Reinforcement?
- Specific to finance
- Specific to healthcare
- Specific to manufacturing
- Varies based on the problem domain
The applications of these Machine Learning types depend on the problem at hand; they can be tailored to many domains rather than being confined to a single industry such as finance, healthcare, or manufacturing.
What is the difference between simple linear regression and multiple linear regression?
- Number of dependent variables
- Number of equations
- Number of independent variables
- Number of observations
Simple linear regression involves one independent variable to predict a dependent variable, whereas multiple linear regression uses two or more independent variables for prediction. The inclusion of more variables in multiple linear regression allows for more complex models and can lead to a better understanding of the relationships between variables.
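The distinction can be shown by fitting both forms with least squares. A small sketch using numpy on made-up exact data (the values are chosen so the true coefficients are recovered):

```python
import numpy as np

# Simple linear regression: one independent variable, y = 1 + 2*x1.
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_simple = 1.0 + 2.0 * x1
A = np.column_stack([np.ones_like(x1), x1])        # intercept column + x1
coef_simple, *_ = np.linalg.lstsq(A, y_simple, rcond=None)

# Multiple linear regression: two independent variables, y = 1 + 2*x1 + 3*x2.
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y_multi = 1.0 + 2.0 * x1 + 3.0 * x2
B = np.column_stack([np.ones_like(x1), x1, x2])    # intercept + two predictors
coef_multi, *_ = np.linalg.lstsq(B, y_multi, rcond=None)
```

The simple fit recovers [intercept, slope] = [1, 2]; the multiple fit recovers [1, 2, 3], one coefficient per predictor plus the intercept.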
The performance of an LDA model can be evaluated using ___________, which considers both within-class and between-class variances.
- accuracy metrics
- error rate
- feature selection
- principal components
"Accuracy metrics" that consider both within-class and between-class variances can be used to evaluate the performance of an LDA model. It gives a comprehensive view of how well the model has separated the classes.
In K-Means clustering, the algorithm iteratively assigns each data point to the nearest _______, recalculating the centroids until convergence.
- Centroid
- Cluster
- Data Point
- Distance Metric
In K-Means, the algorithm assigns each data point to the nearest centroid and recalculates the centroids until convergence.
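The assign-then-recompute loop can be sketched in a few lines of numpy (the two-cluster data and the starting centroids are made up for illustration; real implementations add convergence checks and smarter initialization such as k-means++):

```python
import numpy as np

def kmeans(X, centroids, n_iter=20):
    """Minimal K-Means: assign each point to the nearest centroid, then recompute."""
    for _ in range(n_iter):
        # Distance of every point to every centroid.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)                  # nearest-centroid assignment
        centroids = np.array([X[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    return centroids, labels

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),      # cluster near (0, 0)
               rng.normal(5.0, 0.5, (50, 2))])     # cluster near (5, 5)
init = np.array([[0.0, 0.0], [1.0, 1.0]])          # deliberate starting centroids
centroids, labels = kmeans(X, init)
```

After a few iterations the centroids settle near (0, 0) and (5, 5), the true cluster centers.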
You are working on a project where you have an abundance of features. How do you decide which features to include in your model and why?
- Apply feature selection techniques
- Randomly pick features
- Use all features
- Use only numerical features
Applying feature selection techniques like mutual information, correlation-based methods, or tree-based methods helps in removing irrelevant or redundant features. This enhances the model's performance by reducing overfitting and improving interpretability.
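As one simple example of a filter-style technique, features can be ranked by absolute correlation with the target. A numpy sketch on synthetic data (the informative/noise split is an assumption for illustration; correlation filters catch irrelevant features, while redundancy is better handled by methods like mutual information or tree-based importance):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
informative = rng.normal(size=n)
noise_1 = rng.normal(size=n)
noise_2 = rng.normal(size=n)
X = np.column_stack([informative, noise_1, noise_2])
y = 2.0 * informative + rng.normal(scale=0.1, size=n)   # target driven by column 0

# Score each feature by its absolute correlation with the target.
scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])])
ranking = np.argsort(scores)[::-1]   # best feature first
```

The informative column scores near 1 and tops the ranking, while the noise columns score near 0 and would be dropped.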
In Supervised Learning, _________ and ___________ are the two main types of problems.
- Classification; Clustering
- Classification; Regression
- Regression; Clustering
- Regression; Ensemble Learning
In Supervised Learning, the two main types of problems are Classification and Regression. Classification is about categorizing data into predefined classes, while Regression is predicting a continuous outcome.
You've built a multiple linear regression model and found that two or more predictors are highly correlated. What problems might this cause, and how can you solve them?
- High bias, Address by increasing the model complexity
- High variance, Address by using Lasso regression
- Overfitting, Address by removing correlated features or using Ridge regression
- Underfitting, Address by adding more features
Multicollinearity, where predictors are highly correlated, inflates the variance of the coefficient estimates, making them unstable and contributing to overfitting. It can be addressed by removing correlated features or by using Ridge regression, which penalizes large coefficients and reduces the impact of multicollinearity.
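One standard way to detect the problem is the variance inflation factor (VIF): regress each predictor on the others and compute 1/(1 - R²). A numpy sketch on synthetic data (the nearly collinear pair and the common VIF > 10 rule of thumb are illustrative):

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor for column j: regress it on the other columns."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])     # intercept + other predictors
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + rng.normal(scale=0.05, size=200)   # nearly collinear with a
c = rng.normal(size=200)                   # independent predictor
X = np.column_stack([a, b, c])
```

Here `vif(X, 0)` and `vif(X, 1)` come out very large (the pair is nearly collinear), while `vif(X, 2)` stays close to 1.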
What method is commonly used to estimate the coefficients in Simple Linear Regression?
- Maximum Likelihood Estimation
- Minimizing the Sum of Absolute Errors
- Minimizing the Sum of the Squares of the Residuals
- Neural Networks Training
In Simple Linear Regression, the coefficients are estimated by minimizing the sum of the squares of the residuals, a method known as Ordinary Least Squares (OLS).
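For a single predictor, OLS has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the sample means. A numpy sketch on made-up data that lies roughly on y = 1 + 2x:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])   # roughly y = 1 + 2x

# OLS closed form: slope = cov(x, y) / var(x); intercept from the means.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
```

This yields a slope of 1.97 and an intercept of 1.09, the line that minimizes the sum of squared residuals for these points.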
In the context of building a model, the _________ are carefully selected and processed to improve the model's performance.
- features
- parameters
- testing set
- training set
"Features" are the input variables that are carefully selected and processed (e.g., through feature engineering or scaling) to enhance the model's predictive performance.