You are given a dataset that is not linearly separable. How would you use SVM with the Kernel Trick to classify the data?

  • Apply a linear kernel only
  • Apply a non-linear kernel to transform the feature space
  • Increase data size
  • Reduce data size
The Kernel Trick with a non-linear kernel (such as RBF) implicitly maps the data into a higher-dimensional feature space where it can become linearly separable, allowing the SVM to classify non-linear data.
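A minimal sketch (assuming scikit-learn is available; the dataset and the gamma/C values are illustrative choices, not prescribed ones) comparing a linear and an RBF kernel on data that no straight line can separate:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by any straight line.
X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The RBF kernel implicitly maps points into a higher-dimensional space
# where a linear separator exists; the linear kernel cannot do this.
linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))
```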

What is the primary goal of the K-Means Clustering algorithm?

  • All of the Above
  • Maximizing inter-cluster distance
  • Minimizing intra-cluster distance
  • Predicting new data points
The primary goal of K-Means is to minimize the intra-cluster distance, i.e., the distance between each point and its cluster centroid, so that the resulting clusters are as compact as possible.
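A minimal sketch of this objective (assuming scikit-learn; the blob parameters are illustrative): inertia_ is scikit-learn's name for the quantity K-Means minimizes, the sum of squared distances from each point to its cluster center.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with three natural groupings.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("cluster centers:\n", kmeans.cluster_centers_)
print("within-cluster sum of squares (inertia):", kmeans.inertia_)
```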

In a text classification task, why might you choose a Naive Bayes classifier over a more complex model like a deep learning algorithm?

  • Deep learning is not suitable for text classification
  • Deep learning requires less preprocessing
  • Naive Bayes always outperforms deep learning
  • Naive Bayes might be preferred for its simplicity and efficiency, especially with limited data
Naive Bayes is a probabilistic classifier that is simple and computationally efficient, which makes it a strong choice for small or medium-sized datasets. Deep learning models, in contrast, typically require far more data and computational resources to perform well.
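A minimal sketch (assuming scikit-learn; the tiny corpus and labels are made up for illustration) showing how little code and data a Naive Bayes text classifier needs:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# A toy corpus; real tasks would use far more documents.
texts = ["free prize click now", "meeting at noon",
         "win money fast", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

# Word counts feed a multinomial Naive Bayes model; training is one
# fast pass over the counts, with no iterative optimization.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["claim your free money"]))  # -> ['spam']
```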

What do the ROC Curve and AUC represent in classification problems?

  • Curve of false positive rate vs. true positive rate
  • Curve of precision vs. recall
  • Curve of true negatives vs. false negatives
The ROC (Receiver Operating Characteristic) Curve plots the true positive rate (y-axis) against the false positive rate (x-axis) as the classification threshold varies. The AUC (Area Under the Curve) is a single value summarizing the model's overall ability to discriminate between positive and negative instances.
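A minimal sketch (assuming scikit-learn; the synthetic data and logistic regression model are illustrative) of computing both the curve and its AUC:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# roc_curve traces (FPR, TPR) pairs as the decision threshold varies;
# roc_auc_score reduces the whole curve to a single number.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC:", roc_auc_score(y_test, scores))
```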

If the data is linearly separable, using a _________ kernel in SVM will create a linear decision boundary.

  • Linear
  • Polynomial
  • RBF
  • Sigmoid
Using a linear kernel in SVM will create a linear decision boundary when the data is linearly separable.
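A minimal sketch (assuming scikit-learn; the blobs are illustrative linearly separable data). With kernel="linear" the decision boundary is the hyperplane w·x + b = 0, readable from coef_ and intercept_:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs: linearly separable by construction.
X, y = make_blobs(n_samples=200, centers=2, random_state=42)

svm = SVC(kernel="linear").fit(X, y)
print("w =", svm.coef_[0], "b =", svm.intercept_[0])
```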

What is overfitting in the context of machine learning?

  • Enhancing generalization
  • Fitting the model too closely to the training data
  • Fitting the model too loosely to the data
  • Reducing model complexity
Overfitting occurs when a model fits the training data too closely, capturing noise and outliers rather than the underlying pattern, so it performs poorly on unseen data.
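A minimal sketch of overfitting in action (assuming scikit-learn; the noisy synthetic data is illustrative). An unconstrained decision tree memorizes the training set, noise included, and the train/test gap exposes it:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 injects label noise for the tree to (wrongly) memorize.
X, y = make_classification(n_samples=300, flip_y=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
print("train accuracy:", tree.score(X_train, y_train))  # near 1.0
print("test accuracy: ", tree.score(X_test, y_test))    # noticeably lower
```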

How can you determine the degree of the polynomial in Polynomial Regression?

  • By cross-validation or visual inspection
  • By the number of features
  • By the number of observations
  • By the type of problem
The degree of the polynomial in Polynomial Regression can be determined by techniques such as cross-validation or visual inspection of the fit. Choosing the right degree helps balance the bias-variance trade-off.
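A minimal sketch of the cross-validation approach (assuming scikit-learn and NumPy; the synthetic cubic data and candidate degrees are illustrative). The degree with the best mean cross-validation score is the one balancing under- and overfitting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy samples from a cubic function.
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(scale=2.0, size=200)

# Score each candidate degree by 5-fold cross-validated R^2.
for degree in range(1, 7):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"degree {degree}: mean CV R^2 = {score:.3f}")
```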

An Odds Ratio greater than 1 in Logistic Regression indicates that the __________ of the event increases for each unit increase in the predictor variable.

  • Likelihood
  • Margin
  • Odds
  • Probability
An Odds Ratio greater than 1 indicates that the odds of the event occurring increase for each unit increase in the predictor variable.
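A minimal sketch (assuming scikit-learn and NumPy; the data is illustrative). Logistic regression fits log-odds, so exponentiating a coefficient gives the odds ratio, the multiplicative change in the odds per unit increase in that predictor:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# exp(coefficient) = odds ratio; values above 1 mean the odds of the
# event increase for each unit increase in that feature.
odds_ratios = np.exp(clf.coef_[0])
print("odds ratios per feature:", odds_ratios)
```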

Why is it problematic for a model to fit too closely to the training data?

  • It improves generalization
  • It increases model simplicity
  • It leads to poor performance on unseen data
  • It reduces model bias
Fitting too closely to the training data leads to overfitting and poor performance on unseen data, as the model captures noise and fails to generalize well.
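A minimal sketch of why the close fit hurts (assuming scikit-learn; the data and the max_depth=3 constraint are illustrative choices). Capping the tree's depth trades a lower training score for better performance on unseen data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# None grows the tree until it memorizes the training set; 3 constrains it.
for depth in (None, 3):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(f"max_depth={depth}: train={tree.score(X_train, y_train):.2f}, "
          f"test={tree.score(X_test, y_test):.2f}")
```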

Choosing too small a value for K in KNN can lead to a __________ model, while choosing too large a value can lead to a __________ model.

  • fast, slow
  • noisy, smooth
  • slow, fast
  • smooth, noisy
A small K yields a noisy model because each prediction depends on only a few nearby points, including outliers; a large K yields a smooth model because each prediction averages over many neighbors.
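A minimal sketch (assuming scikit-learn; the dataset and K values are illustrative). Very small K chases individual noisy neighbors, very large K over-smooths, and cross-validation shows the sweet spot in between:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Noisy two-class data where both extremes of K hurt.
X, y = make_moons(n_samples=400, noise=0.3, random_state=42)

for k in (1, 5, 25, 101):
    knn = KNeighborsClassifier(n_neighbors=k)
    score = cross_val_score(knn, X, y, cv=5).mean()
    print(f"K={k}: mean CV accuracy = {score:.3f}")
```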