In the context of PCA, the ________ are unit vectors that define the directions of maximum variance, whereas the ________ represent the magnitude of variance in those directions.

  • Eigenvalues, eigenvectors
  • Eigenvectors, eigenvalues
  • Principal components, eigenvectors
  • Principal directions, magnitudes
In PCA, the eigenvectors are unit vectors that define the directions of maximum variance in the data, whereas the eigenvalues represent the magnitude of variance along those directions. Together, they form the core mathematical components of PCA.
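The eigenvector/eigenvalue split can be seen directly by eigendecomposing a covariance matrix. A minimal NumPy sketch on synthetic 2-D data (the data and variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 2-D data, deliberately stretched along the first axis
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
Xc = X - X.mean(axis=0)                 # center the data

cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)  # columns of eigvecs are unit vectors

order = np.argsort(eigvals)[::-1]       # sort by explained variance, descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Eigenvectors: unit-length directions of maximum variance
print(np.linalg.norm(eigvecs, axis=0))  # each norm is 1.0
# Eigenvalues: the variance captured along each direction
print(eigvals)
```

The first eigenvalue is much larger than the second, reflecting the stretched direction of the data, while every eigenvector has unit length.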

What does the assumption of linearity imply in Simple Linear Regression?

  • Both Variables are Categorized
  • Dependent Variable is Linear
  • Independent Variable is Linear
  • Relationship between Dependent and Independent Variables is Linear
The assumption of linearity implies that the relationship between the dependent and independent variables is linear. A non-linear relationship may lead to biased or inefficient estimates.

How can Ridge Regression be used to mitigate multicollinearity in Multiple Linear Regression?

  • By adding a penalty term to the coefficients
  • By increasing model complexity
  • By reducing the number of samples
  • By removing correlated variables
Ridge Regression adds a penalty term to the coefficients, shrinking them and mitigating the impact of multicollinearity. This regularization technique helps stabilize the estimates.
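The shrinkage effect can be sketched with the closed-form ridge solution on deliberately collinear synthetic data (all names and data here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])
y = x1 + rng.normal(scale=0.1, size=n)

def ridge(X, y, alpha):
    # Closed-form ridge solution: (X'X + alpha*I)^(-1) X'y
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)     # alpha=0 is plain OLS: unstable under collinearity
b_ridge = ridge(X, y, 1.0)   # the penalty shrinks and stabilizes the estimates
print(b_ols, b_ridge)
```

With near-duplicate predictors, the OLS coefficients can take large offsetting values, while the ridge penalty pulls them toward smaller, more stable estimates.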

What is bagging, and how is it related to Random Forest?

  • Bagging involves combining predictions from multiple models, and Random Forest is an example
  • Bagging involves using a single strong model
  • Bagging is a type of boosting
  • Bagging is unrelated to Random Forest
Bagging (Bootstrap Aggregating) is a method that involves combining predictions from multiple models, each trained on a random subset of the data. Random Forest is a specific example of a bagging algorithm that uses decision trees as the base models.
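A minimal from-scratch sketch of bagging, using one-split decision stumps as the weak base models (Random Forest uses full decision trees plus random feature selection; the stump and helper names below are illustrative, not from any library):

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_stump(X, y):
    # A one-split regression stump: a deliberately weak base learner
    best = None
    for t in np.unique(X):
        left, right = y[X <= t], y[X > t]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    _, t, lo, hi = best
    return lambda x: np.where(x <= t, lo, hi)

def bagging(X, y, n_models=50):
    models = []
    n = len(X)
    for _ in range(n_models):
        idx = rng.integers(0, n, n)          # bootstrap sample (with replacement)
        models.append(fit_stump(X[idx], y[idx]))
    # Aggregate: average the predictions of all base models
    return lambda x: np.mean([m(x) for m in models], axis=0)

X = rng.uniform(-3, 3, 200)
y = np.sin(X) + rng.normal(scale=0.2, size=200)
model = bagging(X, y)
mse = np.mean((model(X) - y) ** 2)
print(mse)
```

Each stump sees a different bootstrap sample and picks a different split, so averaging them yields a smoother, lower-variance predictor than any single stump.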

In Simple Linear Regression, what is the relationship between the dependent variable and the independent variable?

  • Cubic
  • Exponential
  • Linear
  • Quadratic
In Simple Linear Regression, the relationship between the dependent variable and the independent variable is linear. The model fits a straight line that best describes this relationship.

How are the coefficients of Simple Linear Regression estimated?

  • By Maximizing the Variance
  • By Minimizing the Sum of the Squares of the Residuals
  • Through Classification
  • Through Clustering
The coefficients in Simple Linear Regression are estimated by minimizing the sum of the squares of the residuals. This method ensures that the line fits as closely as possible to the observed data.
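Minimizing the sum of squared residuals has a well-known closed-form solution for the simple (one-predictor) case, sketched here on synthetic data with known true coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=50)  # true intercept 2, slope 3

# Least-squares solution for simple linear regression:
#   slope = cov(x, y) / var(x),  intercept = mean(y) - slope * mean(x)
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()
print(intercept, slope)  # close to the true values 2 and 3
```

The recovered coefficients land near the true values because least squares is unbiased under these (linear, independent-noise) conditions.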

Ensemble methods like Random Forest and Gradient Boosting are considered powerful tools, but they can lead to __________ if not tuned properly.

  • Both Underfitting and Overfitting
  • Overfitting
  • Underfitting
Ensemble methods like Random Forest and Gradient Boosting can lead to overfitting if not tuned properly, as they may become too complex and fit the noise in the training data instead of the underlying pattern.

You are given a dataset without clear instructions on what the targets are. How would you proceed to build a predictive model?

  • Build a regression model directly
  • Consult with domain experts or analyze the data for insights
  • Guess the targets
  • Ignore the data
Consulting with domain experts or analyzing the data through exploratory data analysis (EDA) can help identify potential targets and correlations within the data. This collaborative, investigative approach helps ensure that the predictive model is aligned with the underlying patterns and the relevant subject matter.

You are using Simple Linear Regression for a time-series dataset, and the residuals show a pattern. What does this imply, and what might be the remedy?

  • Autocorrelation Present, Use Time-Series Model
  • Model is Perfect
  • Multicollinearity, Remove Variables
  • Normal Distribution, No Remedy Needed
If residuals show a pattern in a time-series dataset, autocorrelation might be present, violating the independence assumption. A time-series model like ARIMA may be a more suitable approach.
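Autocorrelated residuals can be detected by checking the lag-1 correlation of the residuals. A sketch on synthetic trend data with AR(1) errors (the data-generating choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
t = np.arange(n, dtype=float)

# AR(1) errors: each residual depends on the previous one (autocorrelation)
e = np.zeros(n)
for i in range(1, n):
    e[i] = 0.8 * e[i - 1] + rng.normal(scale=0.5)
y = 1.0 + 0.05 * t + e

# Fit simple linear regression of y on t
slope = np.cov(t, y, bias=True)[0, 1] / np.var(t)
intercept = y.mean() - slope * t.mean()
resid = y - (intercept + slope * t)

# Lag-1 autocorrelation of the residuals: near 0 means independent errors;
# a clearly positive value signals the independence assumption is violated
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
print(lag1)
```

A strongly positive lag-1 correlation like this is the kind of pattern that motivates switching to a time-series model such as ARIMA.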

Imagine you are working with a dataset where the classes are highly overlapped. How would LDA handle this situation, and what might be the challenges?

  • LDA would easily separate the classes; no challenges
  • LDA would ignore the overlap and classify randomly
  • LDA would require additional data for proper classification
  • LDA would struggle to separate the classes; potential misclassification
LDA would struggle to separate the classes when there is high overlap, because it relies on maximizing between-class variance relative to within-class variance. The challenges include potential misclassification and decreased accuracy.
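The effect can be seen with Fisher's LDA implemented directly on two heavily overlapping Gaussian classes (the class means and spreads below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
# Two classes whose means are close relative to their spread: heavy overlap
X0 = rng.normal(loc=0.0, scale=1.0, size=(n, 2))
X1 = rng.normal(loc=0.5, scale=1.0, size=(n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

# Fisher LDA direction: w = Sw^{-1} (mu1 - mu0),
# where Sw is the (pooled) within-class scatter
mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
Sw = np.cov(X0, rowvar=False) + np.cov(X1, rowvar=False)
w = np.linalg.solve(Sw, mu1 - mu0)

# Classify by projecting onto w and thresholding at the midpoint of the means
threshold = ((X0 @ w).mean() + (X1 @ w).mean()) / 2
pred = (X @ w > threshold).astype(int)
accuracy = (pred == y).mean()
print(accuracy)  # well below 1.0: the overlap forces misclassifications
```

Even with the optimal linear direction, the overlapping distributions put a ceiling on achievable accuracy, which is exactly the limitation described above.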