What is bagging, and how is it related to Random Forest?
- Bagging involves combining predictions from multiple models, and Random Forest is an example
- Bagging involves using a single strong model
- Bagging is a type of boosting
- Bagging is unrelated to Random Forest
Bagging (Bootstrap Aggregating) is a method that involves combining predictions from multiple models, each trained on a random subset of the data. Random Forest is a specific example of a bagging algorithm that uses decision trees as the base models.
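As a concrete illustration, here is a minimal sketch (assuming scikit-learn and a synthetic dataset) that bags plain decision trees and compares them with a Random Forest, which additionally randomizes the features considered at each split.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic classification data
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Plain bagging: each base model (a decision tree by default) sees a bootstrap sample of the rows.
bagging = BaggingClassifier(n_estimators=100, random_state=0)

# Random Forest: bagging of trees plus a random subset of features at each split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print("Bagging of trees:", cross_val_score(bagging, X, y, cv=5).mean())
print("Random Forest   :", cross_val_score(forest, X, y, cv=5).mean())
```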
In Simple Linear Regression, what is the relationship between the dependent and independent variable?
- Cubic
- Exponential
- Linear
- Quadratic
In Simple Linear Regression, the relationship between the dependent and independent variable is linear. The model tries to fit a straight line that best describes the relationship.
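A minimal sketch, assuming scikit-learn, that fits a straight line y ≈ b0 + b1·x to synthetic data generated from a true linear relationship:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100).reshape(-1, 1)
y = 2.0 + 3.0 * x.ravel() + rng.normal(0, 1, size=100)  # true line: y = 2 + 3x, plus noise

model = LinearRegression().fit(x, y)
print("intercept:", model.intercept_)  # should be close to 2
print("slope    :", model.coef_[0])    # should be close to 3
```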
How are the coefficients of Simple Linear Regression estimated?
- By Maximizing the Variance
- By Minimizing the Sum of the Squares of the Residuals
- Through Classification
- Through Clustering
The coefficients in Simple Linear Regression are estimated by minimizing the sum of the squares of the residuals. This method ensures that the line fits as closely as possible to the observed data.
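For simple linear regression, minimizing the sum of squared residuals has a closed form: the slope is cov(x, y) / var(x) and the intercept is mean(y) − slope · mean(x). A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=200)
y = 5.0 - 1.5 * x + rng.normal(0, 2, size=200)

# Least-squares estimates (population covariance/variance, so the ratio is consistent)
b1 = np.cov(x, y, bias=True)[0, 1] / np.var(x)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)
print("intercept:", b0, "slope:", b1, "sum of squared residuals:", np.sum(residuals**2))
```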
Ensemble methods like Random Forest and Gradient Boosting are considered powerful tools, but they can lead to __________ if not tuned properly.
- Both Underfitting and Overfitting
- Overfitting
- Underfitting
Ensemble methods like Random Forest and Gradient Boosting can lead to overfitting if not tuned properly, as they may become too complex and fit the noise in the training data instead of the underlying pattern.
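A minimal sketch, assuming scikit-learn, of how an overly deep Gradient Boosting model can fit noise: the gap between training and validation accuracy widens as tree depth grows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# flip_y injects label noise so that a very flexible model can overfit it
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)

for depth in (2, 10):
    gb = GradientBoostingClassifier(max_depth=depth, n_estimators=200, random_state=0)
    gb.fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={gb.score(X_tr, y_tr):.2f}  validation={gb.score(X_va, y_va):.2f}")
```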
You are given a dataset without clear instructions on what the targets are. How would you proceed to build a predictive model?
- Build a regression model directly
- Consult with domain experts or analyze the data for insights
- Guess the targets
- Ignore the data
Consulting with domain experts or analyzing the data through exploratory data analysis (EDA) can help identify potential targets and correlations within the data. This collaborative and investigative approach keeps the predictive model aligned with the data's underlying patterns and the relevant subject matter.
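A minimal first-pass EDA sketch with pandas; `data.csv` is a hypothetical file name standing in for whatever dataset you are given.

```python
import pandas as pd

df = pd.read_csv("data.csv")  # hypothetical path to the unlabeled dataset

print(df.info())                    # column types and missing values
print(df.describe())                # summary statistics for numeric columns
print(df.corr(numeric_only=True))   # correlations may hint at a sensible target
```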
In a scenario where your model is consistently achieving mediocre performance on both training and validation data, what might be the underlying problem, and what would be your approach to fix it?
- Increase complexity
- Overfitting, reduce complexity
- Reduce complexity
- Underfitting, add complexity
The underlying problem is likely underfitting: the model is too simple to capture the underlying patterns, which is why it performs poorly on both the training and validation data. Adding complexity, for example by using a more flexible model, engineering more informative features, or relaxing regularization, would likely improve performance on both sets.
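A minimal sketch, assuming scikit-learn, where a straight-line model underfits a quadratic relationship and adding complexity (polynomial features) lifts both training and validation scores.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=300).reshape(-1, 1)
y = x.ravel() ** 2 + rng.normal(0, 0.5, size=300)  # true relationship is quadratic
X_tr, X_va, y_tr, y_va = train_test_split(x, y, random_state=0)

for degree in (1, 2):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_tr, y_tr)
    print(f"degree={degree}: train R^2={model.score(X_tr, y_tr):.2f}  validation R^2={model.score(X_va, y_va):.2f}")
```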
In the context of PCA, the ________ are unit vectors that define the directions of maximum variance, whereas the ________ represent the magnitude of variance in those directions.
- Eigenvalues, Eigenvectors
- Eigenvectors, Eigenvalues
- principal components, Eigenvectors
- principal directions, magnitudes
In PCA, the eigenvectors are unit vectors that define the directions of maximum variance in the data, whereas the eigenvalues represent the magnitude of variance along those directions. Together, they form the core mathematical components of PCA.
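A minimal sketch, assuming scikit-learn's PCA: `components_` holds the eigenvectors (unit-length directions) and `explained_variance_` the corresponding eigenvalues.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Stretch a 3-D Gaussian so that the variances along the axes differ
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])

pca = PCA(n_components=3).fit(X)
print("eigenvectors (one per row):\n", pca.components_)
print("eigenvalues:", pca.explained_variance_)
print("each eigenvector has unit length:", np.linalg.norm(pca.components_, axis=1))
```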
What does the assumption of linearity imply in Simple Linear Regression?
- Both Variables are Categorized
- Dependent Variable is Linear
- Independent Variable is Linear
- Relationship between Dependent and Independent Variables is Linear
The assumption of linearity implies that the relationship between the dependent and independent variables is linear. A non-linear relationship may lead to biased or inefficient estimates.
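A minimal sketch, assuming scikit-learn, of one way to probe the linearity assumption: fit a line and look for curvature left in the residuals (a residuals-vs-fitted plot is the usual visual check).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=200).reshape(-1, 1)
y_linear = 1.0 + 0.5 * x.ravel() + rng.normal(0, 0.3, size=200)
y_curved = 1.0 + 0.5 * x.ravel() ** 2 + rng.normal(0, 0.3, size=200)

for name, y in (("truly linear", y_linear), ("truly quadratic", y_curved)):
    residuals = y - LinearRegression().fit(x, y).predict(x)
    # A large correlation with x^2 indicates curvature the straight line missed
    print(name, "-> corr(residuals, x^2):", round(np.corrcoef(residuals, x.ravel() ** 2)[0, 1], 3))
```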
You have a dataset with clusters of varying densities. How would you configure the Epsilon and MinPts in DBSCAN to handle this?
- Increase Epsilon; Decrease MinPts
- Increase both Epsilon and MinPts
- Reduce both Epsilon and MinPts
- Use a different clustering algorithm
DBSCAN's Epsilon and MinPts are global parameters that apply to all clusters. If clusters have varying densities, tuning these parameters to fit one density might not suit others, leading to misclustering. In such a scenario, a different clustering algorithm that can handle varying densities might be more appropriate.
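As one possible alternative (an assumption, since the explanation only says a different clustering algorithm), here is a minimal sketch using scikit-learn's OPTICS, a density-based method that does not rely on a single global Epsilon, applied to two blobs of very different densities.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_blobs

dense, _ = make_blobs(n_samples=300, centers=[[0.0, 0.0]], cluster_std=0.3, random_state=0)
sparse, _ = make_blobs(n_samples=300, centers=[[5.0, 5.0]], cluster_std=2.0, random_state=0)
X = np.vstack([dense, sparse])

labels = OPTICS(min_samples=10).fit_predict(X)  # -1 marks points labeled as noise
print("clusters found:", sorted(set(labels) - {-1}), " noise points:", int((labels == -1).sum()))
```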
What is the main difference between Ridge and Lasso regularization?
- Both use L1 penalty
- Both use L2 penalty
- Ridge uses L1 penalty, Lasso uses L2 penalty
- Ridge uses L2 penalty, Lasso uses L1 penalty
Ridge regularization uses an L2 penalty, which shrinks coefficients but keeps them non-zero, while Lasso uses an L1 penalty, leading to some coefficients being exactly zero.
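A minimal sketch, assuming scikit-learn, on data with many irrelevant features: Lasso (L1) drives some coefficients exactly to zero, while Ridge (L2) only shrinks them.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Only 5 of the 20 features actually matter
X, y = make_regression(n_samples=200, n_features=20, n_informative=5, noise=5.0, random_state=0)

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=1.0).fit(X, y)

print("Ridge coefficients that are exactly zero:", int(np.sum(ridge.coef_ == 0)))
print("Lasso coefficients that are exactly zero:", int(np.sum(lasso.coef_ == 0)))
```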