What is the difference between simple linear regression and multiple linear regression?
- Number of dependent variables
- Number of equations
- Number of independent variables
- Number of observations
Simple linear regression uses a single independent variable to predict the dependent variable, whereas multiple linear regression uses two or more independent variables. Including more variables allows for more complex models and can lead to a better understanding of the relationships between variables.
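As a quick illustration, here is a minimal sketch using scikit-learn's LinearRegression on synthetic data (the arrays and column choices below are purely illustrative): the simple model is fit on one predictor, the multiple model on two.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # three candidate predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Simple linear regression: one independent variable.
simple = LinearRegression().fit(X[:, [0]], y)

# Multiple linear regression: two or more independent variables.
multiple = LinearRegression().fit(X[:, [0, 1]], y)

print(simple.coef_, multiple.coef_)            # one coefficient vs. two
```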
The performance of an LDA model can be evaluated using ___________, which considers both within-class and between-class variances.
- accuracy metrics
- error rate
- feature selection
- principal components
"Accuracy metrics" that consider both within-class and between-class variances can be used to evaluate the performance of an LDA model. It gives a comprehensive view of how well the model has separated the classes.
In K-Means clustering, the algorithm iteratively assigns each data point to the nearest _______, recalculating the centroids until convergence.
- Centroid
- Cluster
- Data Point
- Distance Metric
In K-Means, the algorithm assigns each data point to the nearest centroid, then recalculates the centroids, repeating both steps until convergence.
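A minimal sketch of this assign-then-update loop, using scikit-learn's KMeans on synthetic blobs (k = 3 is an illustrative choice):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids after convergence
print(km.labels_[:10])       # nearest-centroid assignment per data point
```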
You are working on a project where you have an abundance of features. How do you decide which features to include in your model and why?
- Apply feature selection techniques
- Randomly pick features
- Use all features
- Use only numerical features
Applying feature selection techniques such as mutual information, correlation-based methods, or tree-based methods helps remove irrelevant or redundant features. This enhances the model's performance by reducing overfitting and improving interpretability.
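For instance, here is a minimal sketch of filter-style selection with mutual information, using scikit-learn's SelectKBest (the cutoff k = 5 is an illustrative assumption):

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

selector = SelectKBest(mutual_info_classif, k=5).fit(X, y)
X_reduced = selector.transform(X)          # keep only the 5 highest-scoring features
print(selector.get_support(indices=True))  # indices of the retained features
```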
You've been asked to optimize the features for a given model. What strategies might you use, and why?
- Both feature engineering and scaling
- Feature engineering
- Feature scaling
- Random feature selection
Feature engineering involves creating new features or transforming existing ones to better represent the underlying patterns. Feature scaling, such as normalization or standardization, helps to standardize the range of features, enhancing the model's ability to learn. Both strategies together contribute to optimizing the model by improving convergence and interpretability.
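A minimal sketch combining both strategies in a single scikit-learn Pipeline, where PolynomialFeatures stands in for feature engineering and StandardScaler for feature scaling (both choices are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

X, y = make_classification(n_samples=300, n_features=4, random_state=0)

model = Pipeline([
    ("engineer", PolynomialFeatures(degree=2, include_bias=False)),  # create new features
    ("scale", StandardScaler()),                                     # standardize their range
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)
print(model.score(X, y))
```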
What is overfitting, and why is it a problem in Machine Learning models?
- Fitting a model too loosely to training data
- Fitting a model too well to training data, ignoring generalization
- Ignoring irrelevant features
- Including too many variables
Overfitting occurs when a model fits the training data too well, capturing noise rather than the underlying pattern. This leads to poor generalization to new data, resulting in suboptimal predictions on unseen data.
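The symptom is easy to reproduce. In this minimal sketch (synthetic data with deliberately noisy labels), an unconstrained decision tree scores near-perfectly on its training set but noticeably worse on held-out data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y injects label noise so there is something to memorize.
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print(tree.score(X_train, y_train))  # ~1.0: the noise has been memorized
print(tree.score(X_test, y_test))    # noticeably lower: poor generalization
```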
In the context of building a model, the _________ are carefully selected and processed to improve the model's performance.
- features
- parameters
- testing set
- training set
"Features" are the input variables that are carefully selected and processed (e.g., through feature engineering or scaling) to enhance the model's predictive performance.
You built a model using Lasso regularization but some important features were wrongly set to zero. How would you modify your approach to keep these features?
- Combine with ElasticNet
- Decrease L1 penalty
- Increase L1 penalty
- Switch to Ridge
Combining L1 and L2 penalties via ElasticNet balances sparsity against stability, so important features are shrunk rather than eliminated outright by the L1 penalty.
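A minimal sketch of the idea, assuming scikit-learn: ElasticNet's l1_ratio parameter blends the L1 and L2 penalties, so coefficients tend to be shrunk rather than zeroed as aggressively as under pure Lasso.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso

X, y = make_regression(n_samples=100, n_features=10, n_informative=6,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.3).fit(X, y)   # 30% L1, 70% L2

print(np.sum(lasso.coef_ == 0))  # features eliminated by pure L1
print(np.sum(enet.coef_ == 0))   # typically fewer zeros with the L2 blend
```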
If multicollinearity is a concern, ________ regularization can provide a solution by shrinking the coefficients.
- ElasticNet
- Lasso
- Ridge
Ridge regularization provides a solution to multicollinearity by shrinking the coefficients through the L2 penalty, which helps to stabilize the estimates.
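As a small illustration, here is a minimal sketch (synthetic data with two nearly collinear predictors) comparing ordinary least squares with Ridge; the true signal comes entirely from x1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)       # almost a copy of x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.5, size=200)

print(LinearRegression().fit(X, y).coef_)        # inflated, unstable estimates
print(Ridge(alpha=1.0).fit(X, y).coef_)          # L2 penalty shrinks and stabilizes them
```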
What are some common methods to detect multicollinearity in a dataset?
- Adding more data
- Feature scaling
- Regularization techniques
- VIF, Correlation Matrix
Common methods to detect multicollinearity include calculating the Variance Inflation Factor (VIF) and examining the correlation matrix among variables.
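A minimal sketch of both checks, assuming pandas and statsmodels (whose variance_inflation_factor lives in statsmodels.stats.outliers_influence):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
df = pd.DataFrame({"x1": rng.normal(size=200)})
df["x2"] = df["x1"] * 0.9 + rng.normal(scale=0.1, size=200)  # strongly correlated with x1
df["x3"] = rng.normal(size=200)                              # independent

print(df.corr())  # correlation matrix: |r| near 1 flags collinear pairs

vifs = [variance_inflation_factor(df.values, i) for i in range(df.shape[1])]
print(dict(zip(df.columns, vifs)))  # VIF above ~5-10 is a common warning threshold
```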