In the context of regression, the relationship between the independent variable and the dependent variable is represented by a mathematical equation called a _________.

  • Linear Equation
  • Model
  • Polynomial Equation
  • Regression Equation
The relationship between the independent variable and the dependent variable in regression is represented by a regression equation, which describes how the dependent variable changes as the independent variable changes.

The _________ is a crucial aspect of a Machine Learning model that quantifies how well the model's predictions match the actual targets.

  • Activation function
  • Learning rate
  • Loss function
  • Optimization algorithm
The loss function quantifies the difference between the predicted values and the actual targets, guiding the learning process.
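
As a minimal sketch of what a loss function computes, the snippet below implements mean squared error, one common choice for regression. The function name and the sample numbers are illustrative assumptions, not part of the question.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical targets and two sets of predictions of differing quality.
targets = [3.0, 5.0, 7.0]
good_predictions = [3.0, 5.0, 8.0]
poor_predictions = [0.0, 0.0, 0.0]

good_loss = mean_squared_error(targets, good_predictions)
poor_loss = mean_squared_error(targets, poor_predictions)
print(good_loss, poor_loss)
```

Predictions closer to the targets give a smaller loss, which is the signal an optimizer uses to adjust the model.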

A company wants to classify emails as either spam or not spam. What would be your approach to create a classification model for this problem?

  • Ignore the email content; focus on sender details
  • Use only email metadata
  • Use text mining techniques to extract features; use suitable classification algorithm
  • Use unsupervised learning
Extracting relevant features from the email content using text mining techniques and applying a suitable classification algorithm (e.g., Naive Bayes, SVM) would be an effective approach for spam email classification.
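
The approach can be sketched with a tiny bag-of-words Naive Bayes classifier. This is an illustration under simplifying assumptions: whitespace tokenization, Laplace smoothing, and a four-email training set made up for the example. A real system would need far more data and better feature extraction.

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple feature extraction: lowercase words split on whitespace."""
    return text.lower().split()


def train_naive_bayes(examples):
    """Count words per label. `examples` is a list of (text, label) pairs."""
    word_counts = {}
    doc_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, doc_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical labelled emails ("ham" means not spam).
training_emails = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
]
model = train_naive_bayes(training_emails)
print(predict("claim your free prize", model))
print(predict("project meeting on monday", model))
```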

What does the assumption of linearity imply in Simple Linear Regression?

  • Both Variables are Categorized
  • Dependent Variable is Linear
  • Independent Variable is Linear
  • Relationship between Dependent and Independent Variables is Linear
The assumption of linearity implies that the relationship between the dependent and independent variables is linear. If the true relationship is non-linear, a straight-line fit is misspecified and tends to produce biased estimates and systematically patterned residuals.

In the context of PCA, the ________ are unit vectors that define the directions of maximum variance, whereas the ________ represent the magnitude of variance in those directions.

  • Eigenvalues, Eigenvectors
  • Eigenvectors, Eigenvalues
  • principal components, Eigenvectors
  • principal directions, magnitudes
In PCA, the "Eigenvectors" of the data's covariance matrix are unit vectors that define the directions of maximum variance in the data, whereas the "Eigenvalues" represent the magnitude of variance in those directions. The eigenvector with the largest eigenvalue gives the direction of the first principal component.
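
For two-dimensional data the eigen-decomposition can be written out by hand. The sketch below assumes the covariance-matrix formulation of PCA and uses the closed-form eigenvalues of a symmetric 2x2 matrix. The function names and the sample points are illustrative only.

```python
import math


def covariance_matrix_2d(points):
    """Return (var_x, cov_xy, var_y) using the sample (n - 1) normalization."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    return var_x, cov_xy, var_y


def pca_2d(points):
    """Return [(eigenvalue, unit eigenvector), ...], largest eigenvalue first."""
    a, b, c = covariance_matrix_2d(points)
    if b == 0:
        # Covariance is already diagonal: the coordinate axes are eigenvectors.
        axes = [(a, (1.0, 0.0)), (c, (0.0, 1.0))]
        return sorted(axes, key=lambda pair: pair[0], reverse=True)
    center = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    components = []
    for eigenvalue in (center + radius, center - radius):
        # (b, eigenvalue - a) solves (A - eigenvalue * I) v = 0 when b != 0.
        vx, vy = b, eigenvalue - a
        norm = math.hypot(vx, vy)
        components.append((eigenvalue, (vx / norm, vy / norm)))
    return components


# Hypothetical points lying exactly on the line y = x.
points = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]
for eigenvalue, eigenvector in pca_2d(points):
    print(eigenvalue, eigenvector)
```

For these points all of the variance lies along the diagonal, so the first eigenvector points along y = x and the second eigenvalue is zero.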

In a scenario where your model is consistently achieving mediocre performance on both training and validation data, what might be the underlying problem, and what would be your approach to fix it?

  • Increase complexity
  • Overfitting, reduce complexity
  • Reduce complexity
  • Underfitting, add complexity
The underlying problem might be underfitting, where the model is too simple to capture the underlying patterns. Increasing the model's complexity would likely improve performance on both training and validation data.
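
The effect can be illustrated with a deliberately too-simple model. In the sketch below, a constant predictor (always the training mean) underfits roughly linear data, and adding one parameter (a slope) lowers the error on both the training and validation sets. The data values and function names are made up for the example.

```python
def mse(ys, predictions):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


def fit_constant(xs, ys):
    """Too-simple model: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Slightly more complex model: least-squares straight line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Hypothetical data that roughly follows y = 2x + 1.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 2.9, 5.2, 6.8, 9.1, 11.0]
val_x = [0.5, 1.5, 2.5, 3.5, 4.5]
val_y = [2.0, 4.1, 5.9, 8.1, 9.9]

scores = {}
for name, fit in [("constant", fit_constant), ("line", fit_line)]:
    model = fit(train_x, train_y)
    scores[name] = (
        mse(train_y, [model(x) for x in train_x]),  # training error
        mse(val_y, [model(x) for x in val_x]),      # validation error
    )
print(scores)
```

High error on both sets for the constant model is the signature of underfitting. An overfit model would instead show low training error and high validation error.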

You are given a dataset without clear instructions on what the targets are. How would you proceed to build a predictive model?

  • Build a regression model directly
  • Consult with domain experts or analyze the data for insights
  • Guess the targets
  • Ignore the data
Consulting with domain experts or analyzing the data through exploratory data analysis (EDA) can help identify potential targets and correlations within the data. This collaborative, investigative approach helps keep the predictive model aligned with the patterns in the data and with the relevant subject matter.

Ensemble methods like Random Forest and Gradient Boosting are considered powerful tools, but they can lead to __________ if not tuned properly.

  • Both Underfitting and Overfitting
  • Overfitting
  • Underfitting
Ensemble methods like Random Forest and Gradient Boosting can lead to overfitting if not tuned properly, as they may become too complex and fit the noise in the training data instead of the underlying pattern.

How are the coefficients of Simple Linear Regression estimated?

  • By Maximizing the Variance
  • By Minimizing the Sum of the Squares of the Residuals
  • Through Classification
  • Through Clustering
The coefficients in Simple Linear Regression are estimated by minimizing the sum of the squares of the residuals, a method known as ordinary least squares (OLS). The resulting line is the one with the smallest total squared vertical distance to the observed data.
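
For a single predictor the minimization has a closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared x-deviations, and the intercept follows from the means. The sketch below uses those standard formulas. The function names and data points are illustrative assumptions.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared differences between observed and fitted values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations that roughly follow y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_linear_regression(xs, ys)
best = sum_squared_residuals(xs, ys, intercept, slope)
nudged = sum_squared_residuals(xs, ys, intercept, slope + 0.1)
print(intercept, slope, best, nudged)
```

Nudging the slope away from the fitted value increases the sum of squared residuals, which is what it means for the fitted line to be the least-squares solution.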

In Simple Linear Regression, what is the relationship between the dependent and independent variable?

  • Cubic
  • Exponential
  • Linear
  • Quadratic
In Simple Linear Regression, the relationship between the dependent and independent variable is linear. The model tries to fit a straight line that best describes the relationship.

What is bagging, and how is it related to Random Forest?

  • Bagging involves combining predictions from multiple models, and Random Forest is an example
  • Bagging involves using a single strong model
  • Bagging is a type of boosting
  • Bagging is unrelated to Random Forest
Bagging (Bootstrap Aggregating) combines predictions from multiple models, each trained on a bootstrap sample of the data (a random sample drawn with replacement). Random Forest is a specific example of bagging that uses decision trees as the base models and additionally considers only a random subset of features at each split.
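
The bagging idea can be sketched with a very small base model. The example below uses one-split decision stumps on one-dimensional data as a stand-in for full decision trees, so it shows bootstrap sampling and majority voting but not the per-split feature randomization that Random Forest adds. All names and data are hypothetical.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-split stump: predict 1 when x > threshold.

    Returns the candidate threshold with the fewest errors on the sample.
    """
    xs = sorted({x for x, _ in sample})
    candidates = [xs[0] - 1] + xs  # include a threshold below every point

    def errors(threshold):
        return sum((x > threshold) != bool(label) for x, label in sample)

    return min(candidates, key=errors)


def bagging_fit(data, n_models, rng):
    """Train each stump on its own bootstrap sample of the data."""
    models = []
    for _ in range(n_models):
        # Bootstrap sample: draw len(data) points with replacement.
        bootstrap = [rng.choice(data) for _ in data]
        models.append(fit_stump(bootstrap))
    return models


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Hypothetical data: label is 1 when x > 5.
data = [(x, int(x > 5)) for x in range(1, 11)]
models = bagging_fit(data, n_models=25, rng=random.Random(0))
print(bagging_predict(models, 1), bagging_predict(models, 10))
```

Each stump sees a slightly different sample and so may pick a different threshold. Voting averages out those differences.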

How can Ridge Regression be used to mitigate multicollinearity in Multiple Linear Regression?

  • By adding a penalty term to the coefficients
  • By increasing model complexity
  • By reducing the number of samples
  • By removing correlated variables
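Ridge Regression adds an L2 penalty, proportional to the sum of the squared coefficients, to the least-squares objective. The penalty shrinks the coefficients toward zero, which stabilizes the estimates when predictors are highly correlated and so mitigates the impact of multicollinearity.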
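
The stabilizing effect can be seen with two nearly identical predictors. The sketch below solves the ridge normal equations (X^T X + penalty * I) w = X^T y directly for two features, assuming the data are already centered so that no intercept is needed. The function name, the data, and the penalty value of 1.0 are illustrative choices.

```python
def ridge_two_features(x1, x2, y, penalty):
    """Solve (X^T X + penalty * I) w = X^T y for two features, no intercept."""
    s11 = sum(a * a for a in x1) + penalty
    s22 = sum(b * b for b in x2) + penalty
    s12 = sum(a * b for a, b in zip(x1, x2))
    b1 = sum(a * t for a, t in zip(x1, y))
    b2 = sum(b * t for b, t in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("system is singular; increase the penalty")
    # Cramer's rule for the 2x2 system.
    return (s22 * b1 - s12 * b2) / det, (s11 * b2 - s12 * b1) / det


# Hypothetical centered data: x2 is almost a copy of x1.
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.01, -0.98, 0.02, 0.99, 1.98]
y = [-4.1, -1.9, 0.1, 2.0, 3.9]

ols_w = ridge_two_features(x1, x2, y, penalty=0.0)    # plain least squares
ridge_w = ridge_two_features(x1, x2, y, penalty=1.0)  # with L2 penalty
print(ols_w, ridge_w)
```

With no penalty, the near-duplicate predictors receive large coefficients of opposite sign. With the penalty, the weight is shared almost equally between them and both coefficients are much smaller in magnitude.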