What is bagging, and how is it related to Random Forest?
- Bagging involves combining predictions from multiple models, and Random Forest is an example
- Bagging involves using a single strong model
- Bagging is a type of boosting
- Bagging is unrelated to Random Forest
Bagging (Bootstrap Aggregating) is a method that involves combining predictions from multiple models, each trained on a random subset of the data. Random Forest is a specific example of a bagging algorithm that uses decision trees as the base models.
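As a concrete illustration, here is a minimal sketch (assuming scikit-learn and a synthetic dataset) that bags plain decision trees and compares them with a Random Forest, which additionally randomizes the features considered at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Plain bagging: each base model (a decision tree by default) sees a bootstrap sample of the rows.
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Random Forest: bagging of trees plus a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("Bagging of trees:", cross_val_score(bagging, X, y, cv=5).mean())
print("Random Forest   :", cross_val_score(forest, X, y, cv=5).mean())
```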
In Simple Linear Regression, what is the relationship between the dependent and independent variable?
- Cubic
- Exponential
- Linear
- Quadratic
In Simple Linear Regression, the relationship between the dependent and independent variable is linear. The model tries to fit a straight line that best describes the relationship.
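A minimal sketch, assuming scikit-learn, that fits a straight line y ≈ b0 + b1·x to synthetic data generated from a true linear relationship:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)
y = 2.0 + 3.0 * x.ravel() + rng.normal(0, 1, size=100)  # true line: y = 2 + 3x, plus noise

model = LinearRegression().fit(x, y)
print("intercept:", model.intercept_)  # should be close to 2
print("slope    :", model.coef_[0])    # should be close to 3
```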
How are the coefficients of Simple Linear Regression estimated?
- By Maximizing the Variance
- By Minimizing the Sum of the Squares of the Residuals
- Through Classification
- Through Clustering
The coefficients in Simple Linear Regression are estimated by minimizing the sum of the squares of the residuals. This method ensures that the line fits as closely as possible to the observed data.
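For simple linear regression, minimizing the sum of squared residuals has a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) − slope · mean(x). A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 5.0 - 1.5 * x + rng.normal(0, 2, size=200)

# Least-squares estimates (population covariance/variance, so the ratio is consistent)
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print("intercept:", b0, "slope:", b1, "sum of squared residuals:", np.sum(residuals**2))
```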
Ensemble methods like Random Forest and Gradient Boosting are considered powerful tools, but they can lead to __________ if not tuned properly.
- Both Underfitting and Overfitting
- Overfitting
- Underfitting
Ensemble methods like Random Forest and Gradient Boosting can lead to overfitting if not tuned properly, as they may become too complex and fit the noise in the training data instead of the underlying pattern.
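A minimal sketch, assuming scikit-learn, of how an overly deep Gradient Boosting model can fit noise: the gap between training and validation accuracy widens as tree depth grows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# flip_y injects label noise so that a very flexible model can overfit it
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for depth in (2, 10):
    gb = GradientBoostingClassifier(max_depth=depth, n_estimators=200, random_state=0)
    gb.fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={gb.score(X_tr, y_tr):.2f}  validation={gb.score(X_va, y_va):.2f}")
```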
You are given a dataset without clear instructions on what the targets are. How would you proceed to build a predictive model?
- Build a regression model directly
- Consult with domain experts or analyze the data for insights
- Guess the targets
- Ignore the data
Consulting with domain experts or analyzing the data through exploratory data analysis (EDA) can help identify potential targets and correlations within the data. This collaborative and investigative approach keeps the predictive model aligned with the data's underlying patterns and the relevant subject matter.
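A minimal first-pass EDA sketch with pandas; `data.csv` is a hypothetical file name standing in for whatever dataset you are given.

```python
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical path to the unlabeled dataset

print(df.info())                    # column types and missing values
print(df.describe())                # summary statistics for numeric columns
print(df.corr(numeric_only=True))   # correlations may hint at a sensible target
```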
In a scenario where your model is consistently achieving mediocre performance on both training and validation data, what might be the underlying problem, and what would be your approach to fix it?
- Increase complexity
- Overfitting, reduce complexity
- Reduce complexity
- Underfitting, add complexity
The underlying problem is likely underfitting: the model is too simple to capture the underlying patterns, which is why it performs poorly on both the training and validation data. Adding complexity, for example by using a more flexible model, engineering more informative features, or relaxing regularization, would likely improve performance on both sets.
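A minimal sketch, assuming scikit-learn, where a straight-line model underfits a quadratic relationship and adding complexity (polynomial features) lifts both training and validation scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=300).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 0.5, size=300)  # true relationship is quadratic
X_tr, X_va, y_tr, y_va = train_test_split(x, y, random_state=0)

for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(f"degree={degree}: train R^2={model.score(X_tr, y_tr):.2f}  validation R^2={model.score(X_va, y_va):.2f}")
```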
In the context of PCA, the ________ are unit vectors that define the directions of maximum variance, whereas the ________ represent the magnitude of variance in those directions.
- Eigenvalues, Eigenvectors
- Eigenvectors, Eigenvalues
- principal components, Eigenvectors
- principal directions, magnitudes
In PCA, the eigenvectors are unit vectors that define the directions of maximum variance in the data, whereas the eigenvalues represent the magnitude of variance along those directions. Together, they form the core mathematical components of PCA.
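A minimal sketch, assuming scikit-learn's PCA: `components_` holds the eigenvectors (unit-length directions) and `explained_variance_` the corresponding eigenvalues.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stretch a 3-D Gaussian so that the variances along the axes differ
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])

pca = PCA(n_components=3).fit(X)
print("eigenvectors (one per row):\n", pca.components_)
print("eigenvalues:", pca.explained_variance_)
print("each eigenvector has unit length:", np.linalg.norm(pca.components_, axis=1))
```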
What does the assumption of linearity imply in Simple Linear Regression?
- Both Variables are Categorized
- Dependent Variable is Linear
- Independent Variable is Linear
- Relationship between Dependent and Independent Variables is Linear
The assumption of linearity implies that the relationship between the dependent and independent variables is linear. A non-linear relationship may lead to biased or inefficient estimates.
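A minimal sketch, assuming scikit-learn, of one way to probe the linearity assumption: fit a line and look for curvature left in the residuals (a residuals-vs-fitted plot is the usual visual check).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200).reshape(-1, 1)
y_linear = 1.0 + 0.5 * x.ravel() + rng.normal(0, 0.3, size=200)
y_curved = 1.0 + 0.5 * x.ravel() ** 2 + rng.normal(0, 0.3, size=200)

for name, y in (("truly linear", y_linear), ("truly quadratic", y_curved)):
    residuals = y - LinearRegression().fit(x, y).predict(x)
    # A large correlation with x^2 indicates curvature the straight line missed
    print(name, "-> corr(residuals, x^2):", round(np.corrcoef(residuals, x.ravel() ** 2)[0, 1], 3))
```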
You have a dataset with clusters of varying densities. How would you configure the Epsilon and MinPts in DBSCAN to handle this?
- Increase Epsilon; Decrease MinPts
- Increase both Epsilon and MinPts
- Reduce both Epsilon and MinPts
- Use a different clustering algorithm
DBSCAN's Epsilon and MinPts are global parameters that apply to all clusters. If clusters have varying densities, tuning these parameters to fit one density might not suit others, leading to misclustering. In such a scenario, a different clustering algorithm that can handle varying densities might be more appropriate.
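As one possible alternative (an assumption, since the explanation only says a different clustering algorithm), here is a minimal sketch using scikit-learn's OPTICS, a density-based method that does not rely on a single global Epsilon, applied to two blobs of very different densities.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

dense, _ = make_blobs(n_samples=300, centers=[[0.0, 0.0]], cluster_std=0.3, random_state=0)
sparse, _ = make_blobs(n_samples=300, centers=[[5.0, 5.0]], cluster_std=2.0, random_state=0)
X = np.vstack([dense, sparse])

labels = OPTICS(min_samples=10).fit_predict(X)  # -1 marks points labeled as noise
print("clusters found:", sorted(set(labels) - {-1}), " noise points:", int((labels == -1).sum()))
```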
What is the main difference between Ridge and Lasso regularization?
- Both use L1 penalty
- Both use L2 penalty
- Ridge uses L1 penalty, Lasso uses L2 penalty
- Ridge uses L2 penalty, Lasso uses L1 penalty
Ridge regularization uses an L2 penalty, which shrinks coefficients but keeps them non-zero, while Lasso uses an L1 penalty, leading to some coefficients being exactly zero.
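A minimal sketch, assuming scikit-learn, on data with many irrelevant features: Lasso (L1) drives some coefficients exactly to zero, while Ridge (L2) only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 5 of the 20 features actually matter
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("Ridge coefficients that are exactly zero:", int(np.sum(ridge.coef_ == 0)))
print("Lasso coefficients that are exactly zero:", int(np.sum(lasso.coef_ == 0)))
```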