Can you explain the assumptions underlying linear regression?

  • Independence of features, Normality of target variable, Linearity of relationship, Constant variance
  • Normal distribution of errors, Linearity of relationship, Independence of residuals, Constant variance
  • Normality of residuals, Constant variance, Independence of residuals, Linearity of relationship
  • Normality of residuals, Linearity of relationship, Multicollinearity, Independence of features
Linear regression assumes that the relationship between the dependent and independent variables is linear, that errors are normally distributed, that residuals are independent of one another, and that the variance of the residuals is constant (homoscedasticity) across all levels of the independent variables. Violations of these assumptions can undermine the reliability of the model's estimates and of the inferences drawn from them.
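A minimal sketch of checking three of these assumptions, assuming statsmodels and scipy are available and using synthetic data (all names and values below are illustrative):
```python
# Minimal sketch: check normality, constant variance, and independence of
# residuals with statsmodels/scipy on synthetic data (illustrative only).
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson
from scipy.stats import shapiro

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                      # two independent variables
y = 3 + 2 * X[:, 0] - X[:, 1] + rng.normal(size=200)

X_const = sm.add_constant(X)                       # add the intercept term
model = sm.OLS(y, X_const).fit()
resid = model.resid

# Normality of residuals (Shapiro-Wilk test)
stat, pvalue = shapiro(resid)
print("Shapiro-Wilk p-value:", pvalue)

# Constant variance of residuals (Breusch-Pagan test)
bp_stat, bp_pvalue, _, _ = het_breuschpagan(resid, X_const)
print("Breusch-Pagan p-value:", bp_pvalue)

# Independence of residuals (Durbin-Watson statistic; values near 2 suggest
# little autocorrelation)
print("Durbin-Watson:", durbin_watson(resid))
```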

What are the limitations of Deep Learning as compared to other Machine Learning techniques?

  • Easier interpretability and requires more data
  • More interpretable and less efficient
  • Requires less data and is more complex
  • Requires more data and is often less interpretable
Deep Learning typically requires more data for effective training and often results in models that are less interpretable compared to traditional Machine Learning models.

How does clustering differ from classification?

  • Clustering and Classification are the same
  • Clustering is supervised; Classification is unsupervised
  • Clustering is unsupervised; Classification is supervised
  • Clustering uses regression
Clustering is an unsupervised learning technique that groups similar data points, whereas Classification is a supervised learning technique that assigns predefined labels to instances.
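A minimal sketch of the contrast, assuming scikit-learn and a synthetic dataset (both illustrative choices): the clusterer sees only the features, while the classifier is trained on features together with labels.
```python
# Minimal sketch: unsupervised clustering vs supervised classification with
# scikit-learn on synthetic data (dataset and models are illustrative choices).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Clustering: only the features X are used; no labels are given to the model
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster assignments:", kmeans.labels_[:10])

# Classification: the model is trained on features X together with labels y
clf = LogisticRegression().fit(X, y)
print("Predicted labels:   ", clf.predict(X[:10]))
```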

You are working with a medical dataset to predict a particular disease. What ethical considerations must be taken into account when building and deploying this model?

  • Consider fairness, transparency, privacy, and informed consent
  • Focus only on achieving high accuracy
  • Ignore privacy and consent
  • Ignore the potential biases in the data
Ethical considerations in medical predictions include ensuring fairness (avoiding biases), transparency (explainability), privacy (protecting sensitive information), and obtaining informed consent from the patients whose data is used.

A business stakeholder asks you to explain the interaction effect found in a Multiple Linear Regression model built for sales prediction. How would you explain this in non-technical terms?

  • Explain that one variable's effect depends on another variable
  • Ignore the question
  • Provide raw data
  • Use technical jargon
You could explain the interaction effect by stating that the effect of one variable on sales depends on the level of another variable. For example, the effect of advertising on sales might depend on the season, and the interaction term captures this dependency in the model.
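A minimal sketch of fitting such an interaction term, assuming statsmodels formulas and synthetic data whose column names (sales, advertising, season) simply follow the example above:
```python
# Minimal sketch: an advertising-by-season interaction in an OLS model, using
# statsmodels formulas on synthetic data (columns and values are illustrative).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "advertising": rng.uniform(0, 10, n),
    "season": rng.choice(["summer", "winter"], n),
})
# In this toy data, advertising lifts sales more strongly in summer
slope = np.where(df["season"] == "summer", 3.0, 1.0)
df["sales"] = 50 + slope * df["advertising"] + rng.normal(scale=2.0, size=n)

# 'advertising * season' expands to both main effects plus their interaction
model = smf.ols("sales ~ advertising * season", data=df).fit()
print(model.params)   # the interaction coefficient captures the season-dependent slope
```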

How do you assess the fit of a Logistic Regression model?

  • Accuracy only
  • Precision and recall only
  • R-squared only
  • Using metrics such as AUC-ROC, confusion matrix, log-likelihood, etc.
The fit of a Logistic Regression model can be assessed using various metrics, including the AUC-ROC curve, confusion matrix, log-likelihood, and other classification metrics that consider both the positive and negative classes.
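A minimal sketch of computing these metrics, assuming scikit-learn and a synthetic dataset with a simple train/test split (all illustrative choices):
```python
# Minimal sketch: assessing a logistic regression with scikit-learn metrics
# on synthetic data (the dataset and split are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, log_loss, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = clf.predict_proba(X_test)[:, 1]   # predicted probability of class 1
pred = clf.predict(X_test)

print("AUC-ROC:", roc_auc_score(y_test, proba))
print("Confusion matrix:\n", confusion_matrix(y_test, pred))
print("Log loss (negative average log-likelihood):", log_loss(y_test, proba))
```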

How is the R-Squared value used in assessing the performance of a regression model?

  • Measures the error variance
  • Measures the explained variance ratio
  • Measures the model's complexity
  • Measures the total sum of squares
The R-Squared value, also known as the coefficient of determination, measures the ratio of the explained variance to the total variance. It provides a statistical measure of how well the regression line approximates the observed data points, typically ranging from 0 to 1. A higher R-Squared value indicates that the model captures more of the variance in the target.
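A minimal sketch of the calculation, using illustrative toy values and scikit-learn's r2_score for comparison:
```python
# Minimal sketch: R-squared computed by hand and via scikit-learn
# (the toy values below are illustrative).
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.2])

ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
print("R-squared (manual): ", 1 - ss_res / ss_tot)
print("R-squared (sklearn):", r2_score(y_true, y_pred))
```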

How does linear regression differ from nonlinear regression?

  • They differ in the accuracy of predictions
  • They differ in the complexity of the model
  • They differ in the number of outputs
  • They differ in the number of variables used
Linear regression assumes a linear relationship between the dependent and independent variables, while nonlinear regression can model more complex relationships that are not strictly linear.
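A minimal sketch of the contrast, assuming NumPy/SciPy and an illustrative exponential relationship: a straight-line fit versus a nonlinear curve fit on the same synthetic data.
```python
# Minimal sketch: a straight-line fit vs a nonlinear (exponential) fit on the
# same synthetic data (the functional forms are illustrative assumptions).
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(2)
x = np.linspace(0, 4, 50)
y = 2.0 * np.exp(0.8 * x) + rng.normal(scale=1.0, size=x.size)

# Linear regression: model y as a straight line a*x + b
a, b = np.polyfit(x, y, deg=1)

# Nonlinear regression: model y as c * exp(k * x)
popt, _ = curve_fit(lambda x, c, k: c * np.exp(k * x), x, y, p0=(1.0, 0.5))

print("Linear fit (slope, intercept):", a, b)
print("Nonlinear fit (c, k):", popt)
```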

Explain the Variance Inflation Factor (VIF) and its role in detecting multicollinearity.

  • Measure of how much the variance of an estimated coefficient increases when predictors are correlated
  • Measure of model complexity
  • Measure of model's fit
  • Measure of residual errors
VIF quantifies how much the variance of an estimated regression coefficient increases when predictors are correlated. A high VIF indicates multicollinearity, potentially affecting the model's stability.
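A minimal sketch of computing VIF, assuming statsmodels and synthetic predictors that are deliberately correlated (an illustrative setup):
```python
# Minimal sketch: VIF with statsmodels on synthetic predictors that are
# deliberately correlated (the data is an illustrative assumption).
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = 0.9 * x1 + rng.normal(scale=0.1, size=200)   # strongly correlated with x1
x3 = rng.normal(size=200)                         # roughly independent
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# VIF per predictor; values well above roughly 5-10 usually signal multicollinearity
for i, col in enumerate(X.columns):
    if col != "const":                            # skip the intercept column
        print(col, variance_inflation_factor(X.values, i))
```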

Explain how the ElasticNet regression combines the properties of Ridge and Lasso regression.

  • By alternating between L1 and L2 regularization
  • By using a weighted average of L1 and L2
  • By using both L1 and L2 regularization
  • By using neither L1 nor L2 regularization
ElasticNet regression combines the properties of Ridge and Lasso by using both L1 and L2 regularization. This hybrid approach combines Lasso's ability to perform feature selection with Ridge's ability to handle multicollinearity, providing a balance that can be fine-tuned using hyperparameters.
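A minimal sketch, assuming scikit-learn's ElasticNet with illustrative values for alpha (overall penalty strength) and l1_ratio (the L1/L2 mix):
```python
# Minimal sketch: ElasticNet in scikit-learn; alpha sets the overall penalty
# strength and l1_ratio the L1/L2 mix (values and data are illustrative).
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=5.0, random_state=0)

# l1_ratio=1.0 would be pure Lasso; values near 0 behave like Ridge
model = ElasticNet(alpha=0.5, l1_ratio=0.5, max_iter=10000).fit(X, y)
print("Non-zero coefficients:", (model.coef_ != 0).sum(), "of", X.shape[1])
```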

How is Deep Learning different from traditional Machine Learning techniques?

  • Deep Learning focuses on neural networks with multiple layers
  • Deep Learning requires less data
  • Deep Learning uses shallower models
  • Deep Learning uses simpler algorithms
Deep Learning differs from traditional Machine Learning by using neural networks with multiple layers, enabling the analysis of more complex patterns.
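A minimal sketch of the contrast, assuming scikit-learn: a small two-hidden-layer network versus a single linear model on a synthetic nonlinear dataset (dataset and layer sizes are illustrative choices).
```python
# Minimal sketch: a two-hidden-layer neural network vs a single linear model
# with scikit-learn (dataset and layer sizes are illustrative assumptions).
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

linear = LogisticRegression().fit(X, y)
deep = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                     random_state=0).fit(X, y)

print("Linear model training accuracy:        ", linear.score(X, y))
print("Two-hidden-layer net training accuracy:", deep.score(X, y))
```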

How does hyperparameter tuning influence the performance of a classification model?

  • Enhances model performance by fine-tuning algorithm parameters
  • Increases computational time but doesn't affect performance
  • Makes the model simpler
  • No influence
Hyperparameter tuning involves finding the optimal hyperparameters (e.g., learning rate, regularization strength) for a given model and data. This fine-tuning process helps in enhancing the model's performance by finding the best configuration for the learning algorithm.
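A minimal sketch of tuning, assuming scikit-learn's GridSearchCV with an illustrative parameter grid for logistic regression:
```python
# Minimal sketch: hyperparameter tuning with GridSearchCV in scikit-learn
# (model, parameter grid, and data are illustrative assumptions).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}   # inverse regularization strength
search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", search.best_score_)
```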