What is encryption?
- The process of compressing data for storage
- The process of converting plaintext into ciphertext using algorithms
- The process of indexing data for faster retrieval
- The process of validating data integrity
Encryption is the process of converting plaintext (ordinary, readable data) into ciphertext (encoded, unreadable data) using cryptographic algorithms. It ensures that unauthorized users cannot access or understand the information without the appropriate decryption key, thereby maintaining data confidentiality and security. Encryption is crucial for safeguarding sensitive information during transmission and storage.
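The plaintext-to-ciphertext transform can be sketched with a toy XOR cipher. This is an illustration of the concept only, not a secure algorithm; real systems use vetted ciphers such as AES through audited libraries. All names here are invented for the example.

```python
# Toy symmetric cipher: XOR each byte with a repeating key.
# Applying the same transform twice with the same key restores the plaintext,
# mirroring how symmetric encryption and decryption share one key.

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the repeating `key`."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

plaintext = b"sensitive record"
key = b"secret-key"

ciphertext = xor_cipher(plaintext, key)   # unreadable without the key
recovered = xor_cipher(ciphertext, key)   # same operation reverses it

assert ciphertext != plaintext
assert recovered == plaintext
```

Without the key, the ciphertext carries no readable structure; with it, decryption is exact, which is the confidentiality property the answer describes.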
What is a cardinality constraint in an ERD?

- It defines the data type of attributes
- It determines the relationship strength between entities
- It indicates the primary key of an entity
- It specifies the number of instances in a relationship
A cardinality constraint in an ERD specifies how many instances of one entity can be associated with instances of another entity (for example, one-to-one, one-to-many, or many-to-many), indicating the relationship's multiplicity.
What are the key components of a robust data lineage solution in metadata management?
- Data capture mechanisms
- Impact analysis capabilities
- Lineage visualization tools
- Metadata repository
A robust data lineage solution in metadata management comprises several key components. Data capture mechanisms are essential for capturing metadata at various stages of the data lifecycle, including data ingestion, transformation, and consumption. A metadata repository serves as a centralized storage system for storing lineage information, metadata attributes, and relationships between data assets. Lineage visualization tools enable stakeholders to visualize and understand complex data flows, dependencies, and transformations effectively. Impact analysis capabilities allow organizations to assess the downstream effects of changes to data sources, schemas, or business rules, helping mitigate risks and ensure data integrity. Together, these components form the foundation of an effective data lineage solution that supports data governance, compliance, and decision-making processes.
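Two of these components, the metadata repository and impact analysis, can be sketched as a graph of data assets and a downstream traversal. The asset names and the dictionary-based store are assumptions made for the example, not a real tool's API.

```python
from collections import deque

# Minimal "metadata repository": each key is a data asset, each value lists
# the downstream assets derived from it (the captured lineage edges).
lineage = {
    "raw_orders":    ["clean_orders"],
    "clean_orders":  ["orders_fact", "daily_revenue"],
    "orders_fact":   ["exec_dashboard"],
    "daily_revenue": ["exec_dashboard"],
}

def downstream_impact(asset: str) -> set[str]:
    """Breadth-first walk: every asset affected by a change to `asset`."""
    seen: set[str] = set()
    queue = deque([asset])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

print(sorted(downstream_impact("raw_orders")))
# → ['clean_orders', 'daily_revenue', 'exec_dashboard', 'orders_fact']
```

A change to `raw_orders` is flagged as touching every downstream table and dashboard, which is exactly the risk-assessment question impact analysis answers.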
Which data cleansing method involves correcting misspellings, typos, and grammatical errors in textual data?
- Data deduplication
- Data imputation
- Data standardization
- Text normalization
Text normalization is a data cleansing method that involves correcting misspellings, typos, and grammatical errors in textual data to ensure consistency and accuracy. It may include tasks like converting text to lowercase, removing punctuation, and expanding abbreviations to their full forms, making the data more suitable for analysis and processing.
In data transformation, what is the purpose of data cleansing?
- To compress data for storage
- To convert data into a readable format
- To encrypt sensitive information
- To remove redundant or inaccurate data
The purpose of data cleansing in data transformation is to identify and remove redundant, inaccurate, or inconsistent data from the dataset. This ensures that the data is accurate, reliable, and suitable for analysis or other downstream processes.
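A minimal cleansing pass might drop duplicate and incomplete records like this; the field names (`id`, `email`) are assumptions chosen for the sketch.

```python
def cleanse(records: list[dict]) -> list[dict]:
    """Remove rows with missing required fields and exact duplicates."""
    seen, cleaned = set(), []
    for row in records:
        if row.get("id") is None or row.get("email") is None:
            continue                              # inaccurate/incomplete row
        key = (row["id"], row["email"])
        if key in seen:
            continue                              # redundant duplicate
        seen.add(key)
        cleaned.append(row)
    return cleaned

raw = [
    {"id": 1, "email": "a@x.com"},
    {"id": 1, "email": "a@x.com"},   # duplicate of the first row
    {"id": 2, "email": None},        # missing value
    {"id": 3, "email": "c@x.com"},
]
print(cleanse(raw))   # keeps only the rows for ids 1 and 3
```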
How does denormalization differ from normalization in data modeling?
- Combines multiple tables into one for simplicity
- Increases redundancy but ensures data consistency
- Reduces redundancy but may lead to data inconsistency
- Splits data into multiple tables for better storage
Denormalization increases redundancy by adding redundant data to improve query performance, while normalization reduces redundancy by organizing data into multiple related tables to ensure data consistency.
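The trade-off can be illustrated with two layouts of the same data; the table and field names are invented for the example.

```python
# Normalized: the customer name is stored once and joined at query time.
customers = {1: {"name": "Ada"}}
orders = [{"order_id": 10, "customer_id": 1, "total": 99.0}]

def order_report(order: dict) -> dict:
    customer = customers[order["customer_id"]]    # join on demand
    return {"order_id": order["order_id"], "customer": customer["name"]}

# Denormalized: the name is copied into every order row. Reads skip the
# join (faster queries), but each copy must be updated if the name changes,
# which is the consistency risk denormalization introduces.
orders_denorm = [{"order_id": 10, "customer": "Ada", "total": 99.0}]

assert order_report(orders[0]) == {"order_id": 10, "customer": "Ada"}
assert orders_denorm[0]["customer"] == "Ada"
```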
Scenario: A data anomaly is detected in the production environment, impacting critical business operations. How would you utilize data lineage and metadata management to identify the root cause of the issue and implement corrective measures swiftly?
- Conduct ad-hoc analysis without utilizing data lineage, experiment with random solutions, overlook metadata management
- Escalate the issue without investigating data lineage, blame individual teams for the anomaly, delay corrective actions
- Ignore data lineage and metadata, rely on manual troubleshooting, implement temporary fixes without root cause analysis
- Trace data lineage to pinpoint the source of anomaly, analyze metadata to understand data transformations, collaborate with relevant teams to investigate and resolve the issue promptly
Utilizing data lineage and metadata management involves tracing data lineage to identify the root cause of the anomaly, analyzing metadata to understand data transformations, and collaborating with relevant teams for swift resolution. This approach ensures that corrective measures are implemented effectively, addressing the issue's underlying cause and minimizing the impact on critical business operations.
How does version control contribute to effective data modeling?
- Automates data validation
- Enhances data visualization
- Facilitates collaboration among team members
- Improves query performance
Version control in data modeling enables multiple team members to collaborate efficiently, track changes, revert to previous versions, and maintain a history of modifications, thereby enhancing productivity and quality.
What are the challenges associated with real-time data processing?
- Data storage, data integrity, and security
- Network bandwidth, data duplication, and data archival
- Scalability, latency, and data consistency
- User interface design, query optimization, and data modeling
Challenges associated with real-time data processing include scalability, as systems need to handle increasing data volumes without sacrificing performance; latency, as there's a need for quick data processing to meet real-time requirements; and data consistency, ensuring that data remains accurate and coherent across distributed systems despite concurrent updates. Addressing these challenges is crucial for maintaining the reliability and effectiveness of real-time processing systems.
Apache NiFi offers ________ for data provenance, allowing users to trace the origin and transformation history of data.
- auditing
- lineage
- monitoring
- visualization
Apache NiFi offers lineage for data provenance, which enables users to track the origin and transformation history of data, crucial for data governance and troubleshooting purposes.
What does the term "vertical scaling" refer to in the context of database systems?
- Adding more servers to a cluster
- Distributing data across multiple nodes
- Increasing the capacity of a single server
- Partitioning data based on geographic location
In the context of database systems, "vertical scaling" refers to increasing the capacity of a single server to handle more workload and data. This typically involves upgrading the server's hardware components, such as CPU, RAM, and storage, to accommodate growing demands. Vertical scaling offers simplicity in management as it involves managing a single server but may have limitations in terms of scalability compared to horizontal scaling, where additional servers are added to distribute the workload.
How does normalization affect data integrity compared to denormalization?
- Decreases data integrity by introducing redundancy
- Increases data integrity by reducing redundancy
- Maintains data integrity equally in both normalization and denormalization
- Normalization and denormalization have no impact on data integrity
Normalization increases data integrity by reducing redundancy and ensuring that each piece of data is stored in only one place, reducing the risk of inconsistencies. Denormalization may introduce redundancy, leading to potential data integrity issues.