What are the advantages and limitations of using Ridge regression over ordinary linear regression?
- Increases bias, Reduces variance, Reduces multicollinearity, Can cause overfitting
- Increases bias, Reduces variance, Tackles multicollinearity, Can cause underfitting
- Reduces overfitting, Increases variance, Lower bias, Lower variance
- Reduces overfitting, Tackles multicollinearity, Lower bias, Lower variance
Ridge regression helps in reducing overfitting by penalizing large coefficients through L2 regularization. It can handle multicollinearity but increases bias, potentially leading to underfitting. Ordinary linear regression lacks these regularization properties.
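To make this concrete, here is a minimal sketch (assuming scikit-learn and NumPy, which the question does not mention) contrasting Ridge with ordinary least squares on two nearly collinear features; the `alpha` penalty and the synthetic data are illustrative choices only.

```python
# Minimal sketch: Ridge vs. ordinary linear regression on collinear features.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 1))
# Two almost identical (multicollinear) features plus a noisy target.
X = np.hstack([x, x + rng.normal(scale=0.01, size=(100, 1))])
y = 3 * x.ravel() + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)   # alpha controls the L2 penalty strength

print("OLS coefficients:  ", ols.coef_)    # typically large and unstable
print("Ridge coefficients:", ridge.coef_)  # shrunk and more stable
```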
What is the Elbow Method in the context of K-Means clustering?
- A centroid initialization technique
- A clustering visualization tool
- A method to determine the number of clusters
- A way to calculate distance between points
The Elbow Method in K-Means clustering is used to find the optimal number of clusters by plotting the within-cluster sum of squares (inertia) as a function of the number of clusters and locating the "elbow" point where adding more clusters yields diminishing returns.
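A minimal sketch of the Elbow Method, assuming scikit-learn and matplotlib (neither is specified in the question); the blob dataset and range of k values are illustrative.

```python
# Minimal sketch: plot inertia against k and look for the "elbow".
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

ks = range(1, 10)
inertias = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
            for k in ks]

plt.plot(ks, inertias, marker="o")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia (within-cluster SSE)")
plt.show()  # the k where the curve flattens is the "elbow"
```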
The technique called ___________ can be used for nonlinear dimensionality reduction, providing a way to reduce dimensions while preserving the relationships between instances.
- PCA
- clustering
- normalization
- t-SNE
t-SNE (t-distributed Stochastic Neighbor Embedding) is a technique used for nonlinear dimensionality reduction. It's effective at preserving the relationships between instances in the reduced space, making it suitable for complex datasets where linear methods like PCA might fail.
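A minimal sketch of t-SNE, assuming scikit-learn (an assumption, not part of the question); the digits dataset and the perplexity value are arbitrary illustrative choices.

```python
# Minimal sketch: reduce 64-dimensional digits to 2 dimensions with t-SNE.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)  # (1797, 2) -- ready for a 2-D scatter plot colored by y
```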
Suppose you have hierarchical data and need to understand the relationships between different parts. How would you approach clustering in this context?
- Use DBSCAN
- Use Hierarchical Clustering
- Use K-Means
- Use Mean Shift
Hierarchical Clustering is well-suited for understanding relationships within hierarchical data, as it creates a tree-like structure representing data hierarchies.
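A minimal sketch of agglomerative hierarchical clustering, assuming SciPy, scikit-learn, and matplotlib (assumptions, not stated in the question); the dendrogram visualizes the tree-like merge structure described above.

```python
# Minimal sketch: build a linkage matrix and plot the resulting dendrogram.
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=30, centers=3, random_state=1)
Z = linkage(X, method="ward")  # records the full merge history (the hierarchy)
dendrogram(Z)
plt.ylabel("Merge distance")
plt.show()
```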
In Gradient Boosting, the learning rate, also known as the __________ rate, controls the contribution of each tree to the final prediction.
- Boosting
- Growing
- Shrinkage
The learning rate in Gradient Boosting is often referred to as the shrinkage rate, controlling the contribution of each tree to the final prediction. A smaller learning rate means each tree has a smaller influence, leading to a more robust model.
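A minimal sketch (assuming scikit-learn, which the question does not name) showing how the `learning_rate` (shrinkage) parameter scales each tree's contribution; the dataset and parameter values are illustrative.

```python
# Minimal sketch: compare two shrinkage (learning rate) settings.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=500, noise=10, random_state=0)

for lr in (1.0, 0.1):
    model = GradientBoostingRegressor(learning_rate=lr, n_estimators=200,
                                      random_state=0).fit(X, y)
    print(f"learning_rate={lr}: train R^2 = {model.score(X, y):.3f}")
```

A smaller learning rate usually requires more trees, which is the usual trade-off between shrinkage and ensemble size.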
You've noticed that changing the Epsilon value drastically changes the clustering results in your DBSCAN model. What strategies could you employ to select an optimal value?
- Choose Epsilon randomly
- Set Epsilon to a fixed value across all datasets
- Use the 'k-distance graph'
- Use trial and error
The 'k-distance graph' is a common method for selecting the Epsilon value in DBSCAN. By plotting each point's distance to its k-th nearest neighbor in sorted order, you can identify the knee (inflection point) of the curve, which balances cluster density against granularity and suggests an appropriate Epsilon value.
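A minimal sketch of a k-distance graph, assuming scikit-learn, NumPy, and matplotlib (assumptions not stated in the question); the choice of k and the blob dataset are illustrative.

```python
# Minimal sketch: sort each point's k-th nearest-neighbor distance
# and look for the knee of the curve to pick eps for DBSCAN.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=300, centers=3, random_state=7)

k = 4  # often matched to DBSCAN's min_samples
# +1 because each point is returned as its own nearest neighbor.
distances, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
k_distances = np.sort(distances[:, -1])  # distance to each point's k-th neighbor

plt.plot(k_distances)
plt.xlabel("Points sorted by distance")
plt.ylabel(f"Distance to {k}-th nearest neighbor")
plt.show()  # the knee of this curve suggests a reasonable eps
```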
What is the primary challenge in implementing unsupervised learning as compared to supervised learning?
- Difficulty in validation
- Lack of rewards
- Requires more data
- Uses only labeled data
The primary challenge in unsupervised learning is the difficulty in validation since there are no predefined labels to assess the model's accuracy.
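In practice, internal metrics are often used as a proxy when labels are unavailable. A minimal sketch, assuming scikit-learn (an assumption), using the silhouette score to assess a clustering without ground truth:

```python
# Minimal sketch: evaluate a clustering with an internal (label-free) metric.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print("Silhouette score:", silhouette_score(X, labels))  # closer to 1 is better
```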
What is the effect of increasing the regularization parameter in Ridge and Lasso regression?
- Decrease in bias and increase in variance
- Increase in bias and decrease in variance
- Increase in both bias and variance
- No change in bias and variance
Increasing the regularization parameter strengthens the penalty on large coefficients, which increases bias and decreases variance, thereby constraining the model's complexity.
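A minimal sketch (assuming scikit-learn and NumPy, which the question does not mention) showing how larger values of the regularization parameter `alpha` shrink Ridge coefficients; the synthetic data and alpha values are illustrative.

```python
# Minimal sketch: coefficients shrink as the regularization parameter grows.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.5, size=50)

for alpha in (0.01, 1.0, 100.0):
    coefs = Ridge(alpha=alpha).fit(X, y).coef_
    print(f"alpha={alpha:>6}: max |coef| = {np.abs(coefs).max():.3f}")
```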
How does dimensionality reduction help in reducing the risk of overfitting?
- All of the above
- By reducing noise
- By removing irrelevant features
- By simplifying the model
Dimensionality reduction helps reduce the risk of overfitting by removing irrelevant features (lowering complexity), reducing noise (so the model does not fit to noise), and simplifying the model (helping it generalize better).
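A minimal sketch, assuming scikit-learn (an assumption): projecting onto a small number of principal components before fitting a classifier discards noisy, low-variance directions and simplifies the downstream model. The dataset, component count, and classifier are illustrative choices.

```python
# Minimal sketch: PCA as a preprocessing step inside a pipeline.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
pipeline = make_pipeline(PCA(n_components=20), LogisticRegression(max_iter=1000))
print("CV accuracy:", cross_val_score(pipeline, X, y, cv=5).mean())
```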
You are dealing with a dataset having many irrelevant features. How would you apply Lasso regression to deal with this scenario?
- By increasing the degree of the polynomial
- By using L1 regularization
- By using L2 regularization
- By using both L1 and L2 regularization
Lasso regression applies L1 regularization, which can shrink the coefficients of irrelevant features to exactly zero. This effectively performs feature selection, removing the irrelevant features from the model and simplifying it.
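A minimal sketch (assuming scikit-learn and NumPy, neither named in the question) showing Lasso's L1 penalty driving the coefficients of uninformative features to exactly zero; the synthetic dataset and alpha value are illustrative.

```python
# Minimal sketch: Lasso performs implicit feature selection via L1 shrinkage.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# 20 features, only 5 of which actually influence the target.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
kept = np.flatnonzero(lasso.coef_)
print(f"Non-zero coefficients: {len(kept)} of {lasso.coef_.size}", kept)
```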