Your linear regression model has a high bias. What could be the reasons behind this, and how would you try to fix it?

High variance in data, Address by using more data
Irrelevant features, Address by using Lasso regression
Oversimplified model, Address by increasing model complexity
Too complex model, Address by reducing model complexity

High bias often stems from an oversimplified model that fails to capture the underlying patterns in the data. Increasing model complexity by adding polynomial terms, interaction terms, or more features can reduce bias and help the model better fit the data.
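
To see how extra model complexity reduces bias, here is a minimal sketch that fits a straight line and a quadratic to data generated from a quadratic curve. The synthetic data and the helper name training_mse are assumptions for illustration. numpy.polyfit does the least-squares fit, and the degree-2 fit plays the role of "adding a polynomial term".

```python
import numpy as np

# Synthetic data with a quadratic relationship (an assumption for illustration).
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.5 * x**2 - 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

def training_mse(degree: int) -> float:
    """Fit a polynomial of the given degree by least squares and return its training MSE."""
    coeffs = np.polyfit(x, y, deg=degree)
    y_hat = np.polyval(coeffs, x)
    return float(np.mean((y - y_hat) ** 2))

mse_linear = training_mse(1)     # straight line: too simple, so bias is high
mse_quadratic = training_mse(2)  # adds an x**2 term that matches the true shape

print(f"linear MSE:    {mse_linear:.3f}")
print(f"quadratic MSE: {mse_quadratic:.3f}")
```

The linear fit's error stays large no matter how much data is added, because a straight line cannot represent the curvature. That is the signature of bias rather than variance.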


Imagine you've built a spam email classifier. How would you utilize the Confusion Matrix to understand the model's performance?

Analyze TP, FP, TN, FN to understand the type and frequency of errors
Focus on FP and FN to understand only the mistakes made
Focus only on TP and TN as they represent correct classifications

In spam email classification, a Confusion Matrix shows True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). This reveals not only how often the model is right but also which kinds of errors it makes and how often. For example, a false positive sends a legitimate email to the spam folder, while a false negative lets spam reach the inbox.
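
Here is a minimal sketch of tallying the four cells by hand. It assumes binary labels with 1 = spam (the positive class) and 0 = not spam. The function name and the eight example labels are hypothetical.

```python
from typing import Dict, List

def confusion_counts(y_true: List[int], y_pred: List[int]) -> Dict[str, int]:
    """Count TP, FP, TN, FN for binary labels, treating 1 = spam (positive) and 0 = not spam."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == 1 and actual == 1:
            counts["TP"] += 1  # spam correctly flagged
        elif predicted == 1 and actual == 0:
            counts["FP"] += 1  # legitimate email sent to the spam folder
        elif predicted == 0 and actual == 0:
            counts["TN"] += 1  # legitimate email delivered
        else:
            counts["FN"] += 1  # predicted 0 but actually 1: spam reached the inbox
    return counts

# Hypothetical labels for eight emails.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)  # {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}
```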


When classifying text data, the ________ method can be used to convert text into numerical format for analysis.

Bag-of-Words
Clustering
Normalization
Principal Component Analysis

The Bag-of-Words (BoW) method represents text as a numerical vector where each element corresponds to the frequency or presence of a word in the document. It is commonly used in text classification tasks.
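
A minimal Bag-of-Words sketch follows. It assumes naive tokenization: lowercase the text and split on whitespace, with no punctuation handling or stop-word removal. The function name and sample documents are hypothetical. Each vector position holds the count of one vocabulary word, so word order is discarded.

```python
from collections import Counter
from typing import List, Tuple

def bag_of_words(docs: List[str]) -> Tuple[List[str], List[List[int]]]:
    """Return a sorted vocabulary and one word-count vector per document."""
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Position i holds how often vocab[i] appears in this document.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

docs = ["win money now", "meeting now", "win win prize"]
vocab, vectors = bag_of_words(docs)
print(vocab)    # ['meeting', 'money', 'now', 'prize', 'win']
print(vectors)  # [[0, 1, 1, 0, 1], [1, 0, 1, 0, 0], [0, 0, 0, 1, 2]]
```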


Deep Learning models often require substantial computational resources, such as __________, due to their complexity.

All of the above
CPUs
GPUs
RAM

GPUs (Graphics Processing Units) are especially common in Deep Learning because they can perform many arithmetic operations in parallel. This suits the large matrix multiplications that dominate neural network training and inference.


How can one effectively determine the optimal value of K in the KNN algorithm for a given dataset?

Always choosing K=5
Cross-validation
Guessing
Only using an odd value

The optimal value of K can be determined with cross-validation: evaluate several candidate values on held-out folds and select the one with the best average validation score (for example, accuracy).
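
Here is a minimal sketch of choosing K by 5-fold cross-validation, written with NumPy only so the mechanics are visible. The two-blob dataset, the candidate K values, and the helper names knn_predict and cv_accuracy are assumptions for illustration. In practice a library routine such as scikit-learn's cross_val_score would typically replace the hand-written loop.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Predict labels by majority vote among the k nearest training points (Euclidean)."""
    # Pairwise distances, shape (n_test, n_train).
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    votes = y_train[nearest]  # labels of the k neighbours, shape (n_test, k)
    # Majority vote; with np.bincount, ties resolve to the smallest label.
    return np.array([np.bincount(row).argmax() for row in votes])

def cv_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean validation accuracy of k-NN over n_folds shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    scores = []
    for i in range(n_folds):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        preds = knn_predict(X[train_idx], y[train_idx], X[val_idx], k)
        scores.append(np.mean(preds == y[val_idx]))
    return float(np.mean(scores))

# Synthetic two-class data (an assumption for illustration): two Gaussian blobs.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, size=(60, 2)), rng.normal(2.5, 1.0, size=(60, 2))])
y = np.array([0] * 60 + [1] * 60)

candidate_ks = [1, 3, 5, 7, 9, 11, 15]
scores = {k: cv_accuracy(X, y, k) for k in candidate_ks}
best_k = max(scores, key=scores.get)
print(scores)
print("best K:", best_k)
```

Every candidate is scored on the same folds (fixed seed), so the comparison between K values is fair.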


You've built a classification model, but it's highly sensitive to changes in the test data. What could be the issue and how would you fix it?

Overfitting; Cross-validation
Overfitting; Increase regularization
Underfitting; Add more features
Underfitting; Use different model

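The issue is likely overfitting: the model fits the training data closely, including its noise, so it performs well on training data but poorly on unseen data. Cross-validation addresses this by evaluating the model on several held-out folds. That shows whether it generalizes and lets you tune choices such as model complexity or regularization strength before using the test set.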
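
As an illustration, here is a minimal sketch of using k-fold cross-validation to pick a regularization strength for closed-form ridge regression. The setup is a deliberately overfit-prone model: degree-12 polynomial features on a noisy sine curve. The data, the candidate lambda values, and the helper names ridge_fit and cv_mse are assumptions. For simplicity, the intercept column is penalized along with the other coefficients.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: solve (X^T X + lam * I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def cv_mse(X, y, lam, n_folds=5, seed=0):
    """Mean validation MSE of ridge regression over shuffled k-fold splits."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), n_folds)
    errors = []
    for i in range(n_folds):
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        w = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((X[val] @ w - y[val]) ** 2))
    return float(np.mean(errors))

# Synthetic data (assumption): a noisy sine curve expanded into degree-12
# polynomial features, which overfits easily without regularization.
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=x.size)
X = np.vander(x, N=13, increasing=True)  # columns 1, x, x^2, ..., x^12

lambdas = [1e-6, 1e-3, 1e-2, 1e-1, 1.0, 10.0]
results = {lam: cv_mse(X, y, lam) for lam in lambdas}
best_lam = min(results, key=results.get)
print(results)
print("best lambda:", best_lam)
```

The cross-validated error, not the training error, drives the choice. Training error alone would always favor the smallest lambda.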


In a medical diagnosis scenario, how would you evaluate a model using Precision, Recall, and the ROC Curve? Explain the considerations you would take into account.

Focus equally on Precision and Recall, use ROC for sensitivity
Focus on Precision to minimize false positives, use ROC for specificity
Focus on Recall to minimize false negatives, use ROC for overall trade-off

In medical diagnosis, minimizing false negatives (missing a true condition) is often crucial, so Recall is highly valued. The ROC Curve is used to understand the trade-off between sensitivity and specificity, providing a comprehensive view of the model's performance.
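
Here is a minimal sketch of these metrics computed from hypothetical model scores for eight patients (1 = has the condition). It shows how lowering the decision threshold raises Recall, and how the ROC curve and its area summarize performance across all thresholds. The scores, thresholds, and function names are assumptions. The AUC is computed with the trapezoid rule over the ROC points.

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN). Labels: 1 = condition present."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return float(precision), float(recall)

def roc_curve_auc(y_true, scores):
    """ROC points (FPR, TPR) at every distinct score threshold and the trapezoidal AUC."""
    positives = np.sum(y_true == 1)
    negatives = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for t in np.unique(scores)[::-1]:  # sweep thresholds from high to low
        pred = scores >= t
        tpr.append(np.sum(pred & (y_true == 1)) / positives)
        fpr.append(np.sum(pred & (y_true == 0)) / negatives)
    fpr, tpr = np.array(fpr), np.array(tpr)
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
    return fpr, tpr, auc

# Hypothetical model scores for eight patients (1 = has the condition).
y_true = np.array([1, 1, 1, 0, 1, 0, 0, 0])
scores = np.array([0.95, 0.85, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10])

p_default, r_default = precision_recall(y_true, (scores >= 0.5).astype(int))
p_lower, r_lower = precision_recall(y_true, (scores >= 0.35).astype(int))
fpr, tpr, auc = roc_curve_auc(y_true, scores)
print(p_default, r_default)  # 0.75 0.75
print(p_lower, r_lower)      # 0.8 1.0
print(auc)                   # 0.9375
```

At the lower threshold, no patient with the condition is missed (Recall = 1.0). In a diagnostic setting, that is usually worth some extra false positives.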


Underfitting occurs when a model is too _________ and fails to capture the underlying trend of the data.

complex
noisy
regularized
simple

Underfitting happens when a model is too simple to capture the underlying patterns in the data, leading to poor predictions.


How is the within-class scatter matrix computed in LDA?

By multiplying the covariances of each class
By multiplying the means of each class
By summing the covariances of each class
By summing the means of each class

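The within-class scatter matrix in LDA is computed by summing the scatter (unnormalized covariance) matrices of each class: S_W = Σ_c Σ_{x in class c} (x − μ_c)(x − μ_c)^T, where μ_c is the mean of class c. This matrix captures the spread of data within each class. LDA seeks projections that keep this within-class spread small relative to the spread between classes.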
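
Here is a minimal NumPy sketch of that sum. The function name and the six 2-D sample points are hypothetical. Each class's contribution equals (n_c − 1) times its sample covariance, which is why the matrix is often described as a sum of class covariances.

```python
import numpy as np

def within_class_scatter(X, y):
    """S_W = sum over classes c of sum over x in c of (x - mu_c)(x - mu_c)^T."""
    n_features = X.shape[1]
    S_W = np.zeros((n_features, n_features))
    for c in np.unique(y):
        X_c = X[y == c]
        centered = X_c - X_c.mean(axis=0)  # subtract the class mean mu_c
        S_W += centered.T @ centered       # scatter matrix of class c
    return S_W

X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 7.0]])
y = np.array([0, 0, 0, 1, 1, 1])
S_W = within_class_scatter(X, y)
print(S_W)
```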


The _________ in Simple Linear Regression represents the value of the dependent variable when the independent variable is zero.

Coefficient
Intercept
Residual
Slope

The intercept in Simple Linear Regression represents the value of the dependent variable when the independent variable is zero. It's the point where the regression line crosses the Y-axis.
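
The intercept falls out of the ordinary least-squares formulas directly. Here is a minimal sketch using hypothetical data that lie exactly on y = 3 + 2x. The function name is illustrative. The slope is Sxy / Sxx, and the intercept is chosen so that the line passes through the point of means.

```python
from typing import List, Tuple

def fit_simple_linear_regression(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = b0 + b1 * x; returns (intercept b0, slope b1)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean  # line passes through (x_mean, y_mean)
    return intercept, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [5.0, 7.0, 9.0, 11.0, 13.0]  # exactly y = 3 + 2x
intercept, slope = fit_simple_linear_regression(x, y)
print(intercept, slope)  # 3.0 2.0
```

The fitted intercept of 3.0 is the predicted value of y at x = 0.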
