Explain the concept of regularization in Machine Learning. What are some common techniques?
- Increasing complexity, Gradient Boosting
- Increasing complexity, L1/L2
- Reducing complexity, Gradient Descent
- Reducing complexity, L1/L2
Regularization is a technique to reduce overfitting by adding a penalty term to the loss function. Common techniques include L1 (lasso) and L2 (ridge) regularization, which penalize large coefficients in a model.
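To make the penalty term concrete, here is a minimal NumPy sketch of L2 (ridge) regression using its closed-form solution; the data and λ values are illustrative, not from the question.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X'X + lam*I)^(-1) X'y.
    lam = 0 recovers ordinary least squares; larger lam shrinks w."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

w_ols = ridge_fit(X, y, lam=0.0)     # unregularized fit
w_ridge = ridge_fit(X, y, lam=10.0)  # penalized: coefficients shrink toward 0
```

The penalty makes `np.linalg.norm(w_ridge)` smaller than `np.linalg.norm(w_ols)`, which is exactly the "penalize large coefficients" behavior described above. (L1/lasso has no closed form and is typically solved iteratively.)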
In a dataset with fluctuating values, you've applied Polynomial Regression, and the model seems to fit even the noise. What are the potential risks, and how could you mitigate them?
- Add more noise
- Ignore the noise
- Reduce model complexity through lower degree or regularization
- Use a linear model
The risk is overfitting the noise, which will harm the model's generalization ability. Reducing the polynomial degree or using regularization techniques can mitigate this by constraining the model's complexity.
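The noise-chasing behavior is easy to demonstrate with NumPy's `polyfit` on made-up data (the degrees and noise level below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 20)
y = 2 * x + rng.normal(scale=0.2, size=x.size)  # true relationship is linear

def train_sse(deg):
    """Sum of squared residuals on the *training* points."""
    coeffs = np.polyfit(x, y, deg)
    return float(np.sum((y - np.polyval(coeffs, x)) ** 2))

sse_low, sse_high = train_sse(1), train_sse(6)
# The degree-6 fit drives training error below the degree-1 fit by
# bending through the noise -- an apparent gain that does not generalize.
```

Lowering the degree (or penalizing the coefficients, as in ridge regression) is exactly the mitigation described above.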
How does Deep Learning model complexity typically compare to traditional Machine Learning models, and what are the implications of this?
- Less complex and easier to train
- Less complex and requires less data
- More complex and easier to interpret
- More complex and requires more data and computation
Deep Learning models are typically more complex, requiring more data and computational resources, which can make training and tuning more challenging.
The risk of overfitting can be increased if the same data is used for both _________ and _________ of the Machine Learning model.
- evaluation, processing
- training, testing
- training, validation
- validation, training
If the same data is used for both "training" and "testing," the evaluation merely reflects how well the model has memorized that data: scores look strong there but drop on unseen data, so overfitting is both encouraged and left undetected.
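A 1-nearest-neighbour classifier makes the point vividly, since it memorizes its training set; this NumPy sketch uses purely random labels (all values illustrative):

```python
import numpy as np

def nn_predict(X_train, y_train, X):
    """1-NN: label each point by its closest training point."""
    d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)  # labels are pure noise: nothing to learn

X_tr, X_te, y_tr, y_te = X[:70], X[70:], y[:70], y[70:]

train_acc = np.mean(nn_predict(X_tr, y_tr, X_tr) == y_tr)  # 1.0: memorized
test_acc = np.mean(nn_predict(X_tr, y_tr, X_te) == y_te)   # ~chance level
```

Evaluating on the training set reports perfect accuracy even though the labels were random; only the held-out split reveals that nothing was learned.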
To detect multicollinearity in a dataset, one common method is to calculate the ___________ Inflation Factor (VIF).
- Validation
- Variable
- Variance
- Vector
The Variance Inflation Factor (VIF) is a measure used to detect multicollinearity. It quantifies how much the variance of a regression coefficient is inflated by that variable's correlation with the other predictors. A high VIF (commonly taken as above 5 or 10) indicates multicollinearity.
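The computation itself is simple: VIF_j = 1 / (1 − R_j²), where R_j² comes from regressing column j on the remaining columns. A NumPy sketch on fabricated data:

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j
    (with intercept) on all other columns."""
    n, p = X.shape
    out = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(3)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + rng.normal(scale=0.05, size=200)  # almost a copy of `a`
v = vif(np.column_stack([a, b, c]))
# v[0] and v[2] are large (collinear pair); v[1] stays near 1.
```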
_________ clustering builds a tree-like diagram called a dendrogram, allowing you to visualize the relationships between clusters.
- DBSCAN
- Hierarchical
- K-Means
- Spectral
Hierarchical clustering builds a dendrogram, a tree diagram that records the order and distance at which clusters merge, making the relationships between clusters easy to visualize.
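Assuming SciPy is available, `scipy.cluster.hierarchy` exposes exactly this tree: `linkage` builds it, and `dendrogram(Z)` would plot it. A toy example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(4)
# Two well-separated blobs of 10 points each.
X = np.vstack([rng.normal(0.0, 0.3, size=(10, 2)),
               rng.normal(5.0, 0.3, size=(10, 2))])

# Each row of Z records one merge: which clusters joined, at what
# distance, and the size of the result -- the dendrogram's raw data.
Z = linkage(X, method="ward")

# Cutting the tree into 2 clusters recovers the two blobs.
labels = fcluster(Z, t=2, criterion="maxclust")
```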
How does LDA maximize the separation between different classes in a dataset?
- By maximizing between-class variance and minimizing within-class variance
- By maximizing both within-class and between-class variance
- By minimizing between-class variance and maximizing within-class variance
- By minimizing both within-class and between-class variance
LDA maximizes the separation between different classes by "maximizing between-class variance and minimizing within-class variance." This process ensures that different classes are far apart, while data points within the same class are close together, resulting in better class separation.
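For the two-class case this is Fisher's criterion, and the direction maximizing the between/within variance ratio has a closed form, w = S_w⁻¹(μ₁ − μ₀). A NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(5)
X0 = rng.normal(loc=[0.0, 0.0], size=(100, 2))  # class 0
X1 = rng.normal(loc=[4.0, 4.0], size=(100, 2))  # class 1

mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0.T) + np.cov(X1.T)     # within-class scatter (pooled)
w = np.linalg.solve(Sw, mu1 - mu0)   # Fisher's discriminant direction

p0, p1 = X0 @ w, X1 @ w  # 1-D projections of each class
# Along w, the class means sit far apart relative to each class's
# spread -- large between-class, small within-class variance.
```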
You reduced the complexity of your model to prevent overfitting, but it led to underfitting. How would you find a balance between complexity and fit?
- Add regularization
- All of the above
- Increase dataset size
- Try cross-validation
Finding a balance typically involves cross-validation to systematically identify the level of complexity that fits the training data well while still generalizing to the validation folds. This tunes hyperparameters without leaking information from the held-out test set.
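A minimal k-fold cross-validation loop, written with NumPy and using polynomial degree as a stand-in for "complexity" (synthetic quadratic data, illustrative values):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.linspace(-1, 1, 60)
y = 1 - 3 * x + 2 * x**2 + rng.normal(scale=0.1, size=x.size)  # quadratic truth

def cv_error(deg, k=5):
    """Mean held-out MSE over k folds for a degree-`deg` polynomial."""
    idx = rng.permutation(x.size)
    errs = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        coeffs = np.polyfit(x[train], y[train], deg)
        errs.append(np.mean((y[fold] - np.polyval(coeffs, x[fold])) ** 2))
    return float(np.mean(errs))

err_underfit, err_matched = cv_error(1), cv_error(2)
# The linear model's held-out error exposes its missing curvature,
# all without ever touching a final test set.
```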
The ___________ regression technique can be used when the relationship between the independent and dependent variables is not linear.
- L1 Regularization
- Logistic
- Polynomial
- Simple Linear
Polynomial Regression can model non-linear relationships between independent and dependent variables by transforming the predictors into a polynomial form, allowing for more complex fits.
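The "transforming the predictors" step can be made explicit: expand x into [1, x, x², x³] and solve an ordinary linear least-squares problem, since the model stays linear in its coefficients. A NumPy sketch on toy sine-shaped data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(0, 2, 40)
y = np.sin(2 * x) + rng.normal(scale=0.05, size=x.size)  # non-linear target

# Polynomial "feature transform": columns [1, x, x^2, x^3].
X_poly = np.vander(x, N=4, increasing=True)
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)

mse_poly = float(np.mean((y - X_poly @ coeffs) ** 2))
mse_line = float(np.mean((y - np.polyval(np.polyfit(x, y, 1), x)) ** 2))
# The cubic features track the curve; a straight line cannot.
```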
How does adding regularization help in avoiding overfitting?
- By adding noise to the training data
- By fitting the model closely to the training data
- By increasing model complexity
- By reducing model complexity
Regularization helps in avoiding overfitting by "reducing model complexity." It adds a penalty to the loss function, constraining the weights and preventing the model from fitting too closely to the training data.