The addition of _________ in the loss function is a common technique to regularize the model and prevent overfitting.

  • bias
  • learning rate
  • regularization terms
  • weights
Regularization terms (like L1 or L2 penalties) in the loss function constrain the model, reducing the risk of overfitting by preventing large weights.
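The effect of an L2 penalty can be sketched in a few lines of plain Python. The function name, data, and the `lam` (lambda) strength value below are illustrative, not from any particular library:

```python
def mse_with_l2(y_true, y_pred, weights, lam):
    """Mean squared error plus an L2 penalty on the weights.

    lam is an illustrative regularization-strength hyperparameter;
    larger values penalize large weights more heavily.
    """
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    l2_penalty = lam * sum(w ** 2 for w in weights)
    return mse + l2_penalty

# Identical predictions, but the model with larger weights pays a
# larger total loss, so optimization is pushed toward small weights.
small_w = mse_with_l2([1.0, 2.0], [1.1, 1.9], weights=[0.1, 0.2], lam=0.5)
large_w = mse_with_l2([1.0, 2.0], [1.1, 1.9], weights=[3.0, 4.0], lam=0.5)
```

Swapping the squared-weight sum for a sum of absolute values would give the corresponding L1 (lasso) penalty.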

You have trained an SVM but the decision boundary is not fitting well to the data. How could adjusting the hyperplane parameters help?

  • Change the kernel's color
  • Increase the size of the hyperplane
  • Modify the regularization parameter 'C'
  • Reduce the number of support vectors
Adjusting the regularization parameter 'C' controls the trade-off between margin maximization and error minimization, helping to fit the decision boundary better.
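The role of 'C' can be seen directly in the soft-margin SVM objective, which adds C times the hinge losses to the margin term. Below is a hypothetical evaluation of that objective for a fixed weight vector (no actual training, and no SVM library assumed):

```python
def svm_objective(weights, data, C):
    """Soft-margin SVM objective: 0.5*||w||^2 + C * sum of hinge losses.

    data is a list of (features, label) pairs with labels in {-1, +1}.
    Small C favors a wide margin (stronger regularization); large C
    penalizes margin violations more, fitting the data more tightly.
    """
    margin_term = 0.5 * sum(w ** 2 for w in weights)
    hinge = sum(max(0.0, 1.0 - y * sum(w * x for w, x in zip(weights, xs)))
                for xs, y in data)
    return margin_term + C * hinge

# The last point sits inside the margin, so it contributes hinge loss.
data = [([2.0], 1), ([-1.0], -1), ([0.2], 1)]
w = [1.0]
loose = svm_objective(w, data, C=0.1)   # violation barely matters
tight = svm_objective(w, data, C=10.0)  # violation dominates the loss
```

With a large C the optimizer would bend the boundary to reduce that violation; with a small C it would keep the wider margin instead.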

Discuss the difference between Euclidean distance and Manhattan distance metrics in the context of KNN.

  • Euclidean is faster, Manhattan is more accurate
  • Euclidean is for 3D, Manhattan for 2D
  • Euclidean is for continuous data, Manhattan for categorical
  • Euclidean uses squares, Manhattan uses absolutes
Euclidean distance is the square root of the sum of squared differences, while Manhattan distance is the sum of the absolute differences.
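Both metrics are one-liners, which makes the "squares vs. absolutes" distinction concrete:

```python
import math

def euclidean(a, b):
    """Square root of the sum of squared differences (L2 norm)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of the absolute differences (L1 norm)."""
    return sum(abs(x - y) for x, y in zip(a, b))

p, q = (0, 0), (3, 4)
print(euclidean(p, q))  # 5.0  (straight-line distance)
print(manhattan(p, q))  # 7    (grid/taxicab distance: 3 + 4)
```

In KNN, the choice of metric changes which points count as "nearest," so it can change the predicted class for borderline queries.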

A dataset with very high between-class variance but low within-class variance is given. How would the LDA approach be beneficial here?

  • LDA would be the same as PCA
  • LDA would perform optimally due to the variance characteristics
  • LDA would perform poorly
  • LDA would require transformation of the dataset
LDA would perform optimally in this scenario: high between-class variance and low within-class variance align exactly with its objective of maximizing the ratio of between-class to within-class variance.
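The quantity LDA maximizes can be illustrated with the Fisher criterion on 1-D data. This is a hypothetical, simplified sketch (a ratio of variances on raw values, not a full LDA fit with projection directions):

```python
def fisher_criterion(class_a, class_b):
    """Ratio of between-class to within-class variance for two 1-D classes.

    LDA seeks a projection that makes this ratio as large as possible.
    """
    def mean(xs):
        return sum(xs) / len(xs)

    def var(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    between = (mean(class_a) - mean(class_b)) ** 2
    within = var(class_a) + var(class_b)
    return between / within

# Tight, well-separated classes -> large criterion (the easy case for LDA).
separated = fisher_criterion([0.0, 0.1, -0.1], [10.0, 10.1, 9.9])
# Spread-out, overlapping classes -> small criterion.
overlapping = fisher_criterion([0.0, 3.0, -3.0], [1.0, 4.0, -2.0])
```

The dataset described in the question corresponds to the first case, which is exactly where this ratio, and hence LDA's class separation, is largest.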

What is the risk of using the same data for both training and testing in a Machine Learning model?

  • Increase in accuracy; Reduction in bias
  • Increase in complexity; Reduction in training time
  • Reduction in training time; Increase in bias
  • Risk of overfitting; Unrealistic performance estimates
Using the same data for training and testing leads to the risk of overfitting and provides unrealistic performance estimates. The model will have seen all the data during training, so it might not generalize well to new, unseen instances.
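The standard guard against this is holding out a disjoint test set the model never sees during training. A minimal sketch with only the standard library (the function name and 25% split are illustrative choices):

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them into disjoint train and test sets.

    Evaluating only on the held-out test set gives a more realistic
    estimate of how the model generalizes to unseen data.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = train_test_split(list(range(100)))
```

Because the two sets share no rows, test accuracy cannot be inflated by memorization of training examples.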

Can you explain the complete linkage method in Hierarchical Clustering?

  • Using maximum distance between any two points in clusters
  • Using mean distance between all pairs in clusters
  • Using minimum distance between any two points in clusters
  • Using total distance between all points in clusters
The complete linkage method in Hierarchical Clustering uses the maximum distance between any two points in the clusters to determine the linkage. By focusing on the farthest points, it tends to produce compact, roughly equal-sized clusters; chain-like clusters are instead a hallmark of single linkage, which merges based on the closest points.

What are the potential drawbacks of using PCA for dimensionality reduction?

  • It always improves model performance
  • It can lead to information loss and doesn't consider class labels
  • It normalizes the variance of the data
  • It removes all noise and outliers
The potential drawbacks of using PCA include the risk of information loss since it only considers variance, not class labels, and might remove meaningful information that doesn't align with the directions of maximum variance.

In a real-world customer segmentation problem, how might you apply clustering to optimize marketing strategies?

  • All of the Above
  • By Clustering Based on Behavior
  • By Clustering Based on Geography
  • By Clustering Based on Product Preference
Clustering can be applied in various ways to optimize marketing strategies, including grouping customers based on product preference, geography, behavior, or a combination of these factors.

Clustering is a common task in __________ learning, where data is grouped based on inherent similarities without the use of labels.

  • reinforcement
  • semi-supervised
  • supervised
  • unsupervised
Unsupervised learning commonly involves clustering, where data is grouped based on similarities without using labels.

What is the primary difference between the Gini Index and entropy when used in Decision Trees?

  • Calculation Method
  • Complexity
  • Scale
  • Units
Gini Index and entropy both measure node impurity, but their calculation methods differ: entropy uses logarithms of the class probabilities, while the Gini Index uses sums of squared probabilities.
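Both measures are simple functions of the class probabilities at a node, which also shows the scale difference between them:

```python
import math

def gini(p):
    """Gini index: 1 - sum(p_i^2). Ranges over [0, 0.5] for two classes."""
    return 1.0 - sum(pi ** 2 for pi in p)

def entropy(p):
    """Entropy in bits: -sum(p_i * log2(p_i)). Ranges over [0, 1] for two classes."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

pure, mixed = [1.0, 0.0], [0.5, 0.5]
print(gini(pure), entropy(pure))    # 0.0 0.0  (no impurity)
print(gini(mixed), entropy(mixed))  # 0.5 1.0  (maximum impurity)
```

Both are zero for a pure node and maximal for a 50/50 split, so in practice they usually select similar splits; the Gini Index is slightly cheaper to compute because it avoids the logarithm.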