The term ___________ is used to describe a model that performs well on the training data but poorly on unseen data.

  • Bootstrap
  • Cross-validation
  • Overfitting
  • Underfitting
Overfitting refers to a situation where a model fits the training data too closely, learning its noise and idiosyncratic patterns rather than the underlying trend, and therefore performs poorly on unseen data.

In what scenarios might a custom distance metric be needed in KNN, and how would you go about implementing it?

  • When K is very large
  • When data has specific characteristics
  • When data is uniform
  • When using standardized data
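A custom distance metric might be needed when the data has specific characteristics that standard metrics such as Euclidean distance do not capture, for example geographic coordinates, mixed categorical and numeric features, or features on very different scales. Implementation involves defining a function that takes two points and returns a dissimilarity value reflecting these characteristics, then using it in place of the default metric when searching for neighbors.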
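
As a minimal sketch of this idea, assume the data are latitude/longitude pairs, where plain Euclidean distance on raw degrees breaks down near the ±180° meridian. The helper names (`haversine_km`, `euclidean`, `knn_predict`) and the sample points are hypothetical; they only illustrate plugging a custom metric into a hand-written KNN.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def euclidean(a, b):
    """Straight-line distance on the raw coordinates, shown for comparison."""
    return math.dist(a, b)


def knn_predict(train_points, train_labels, query, k, distance):
    """Majority vote among the k training points closest to `query`."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: group "A" lies just east of the date line, group "B" west of it.
points = [(0.0, -179.5), (1.0, -179.0), (-1.0, -178.5),
          (0.0, 170.0), (1.0, 169.0), (-1.0, 168.0)]
labels = ["A", "A", "A", "B", "B", "B"]
query = (0.0, 179.5)  # one degree of longitude away from the first "A" point

pred_haversine = knn_predict(points, labels, query, k=3, distance=haversine_km)
pred_euclidean = knn_predict(points, labels, query, k=3, distance=euclidean)
print(pred_haversine, pred_euclidean)
```

With the haversine metric the query is assigned to group "A", its true geographic neighbors, while Euclidean distance on raw degrees picks group "B". Libraries such as scikit-learn also accept a callable as the `metric` argument of `KNeighborsClassifier`; check the library documentation for the exact calling convention.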

Your classification model's accuracy is high, but precision and recall are not balanced. How would you approach this problem to get a better trade-off?

  • Change the classification threshold; consider using the F1 Score
  • Ignore precision and recall
  • Only focus on accuracy
  • Use a different dataset
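Adjusting the classification threshold shifts the trade-off between precision and recall: raising it typically increases precision at the cost of recall, and lowering it does the opposite. The F1 Score, the harmonic mean of precision and recall, helps choose a threshold that balances the two. High accuracy alone can be misleading, particularly when the classes are imbalanced.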
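
The threshold sweep can be sketched as follows. The labels and scores are made-up illustration data, and `precision_recall_f1` is a hypothetical helper rather than a library function.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Precision, recall and F1 when scores >= threshold are predicted positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels and predicted probabilities for the positive class.
y_true = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.45, 0.55, 0.60, 0.80, 0.90]

results = {t: precision_recall_f1(y_true, scores, t) for t in (0.3, 0.5, 0.7)}
best_threshold = max(results, key=lambda t: results[t][2])  # highest F1

for t, (p, r, f1) in results.items():
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print("best threshold by F1:", best_threshold)
```

On this toy data, a threshold of 0.7 gives perfect precision but only half the positives are found, 0.3 finds every positive at the cost of several false alarms, and 0.5 gives the highest F1 of the three.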

In Hierarchical Clustering, the _________ linkage method considers the distance between the closest points of two clusters.

  • Average Linkage
  • Complete Linkage
  • Single Linkage
  • Ward's Method
Single Linkage defines the distance between two clusters as the distance between their closest pair of points. This can lead to chain-like clusters and makes the method sensitive to noise and outliers, but it is useful when we want to identify clusters with irregular shapes.
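
For comparison, here is a minimal sketch of how single, complete, and average linkage measure the distance between the same two clusters. The function names and the two small clusters are hypothetical, and Euclidean distance between points is assumed.

```python
import math
from itertools import product


def pairwise_distances(cluster_a, cluster_b):
    """All Euclidean distances between a point in cluster_a and one in cluster_b."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]


def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))  # closest pair


def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))  # farthest pair


def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)  # mean over all pairs


# Hypothetical clusters on a line, so the distances are easy to check by hand.
cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (6.0, 0.0)]

print(single_linkage(cluster_a, cluster_b))    # 2.0, from (1, 0) to (3, 0)
print(complete_linkage(cluster_a, cluster_b))  # 6.0, from (0, 0) to (6, 0)
print(average_linkage(cluster_a, cluster_b))   # 4.0, mean of 3, 6, 2, 5
```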

In a situation where the training accuracy is high but the testing accuracy is low, what could be the issue, and how might you solve it?

  • Model is overfitting
  • Model is underfitting
  • Testing data is too large
  • Training data is too small
Overfitting occurs when a model performs well on the training data but poorly on unseen data, often because the model is too complex for the amount of data available. Cross-validation helps detect the problem; remedies include adding regularization, removing unnecessary features, or otherwise reducing the complexity of the model.
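
The gap between training and testing accuracy can be illustrated with a toy k-nearest-neighbors classifier, where k = 1 stands in for an overly complex model and k = 3 for a simpler one. This is a sketch on made-up one-dimensional data with two deliberately flipped training labels; the helper names are hypothetical.

```python
from collections import Counter


def knn_classify(train_x, train_y, x, k):
    """Majority label among the k training points nearest to x (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_x, train_y, xs, ys, k):
    """Fraction of (xs, ys) classified correctly by k-NN fitted on the training set."""
    hits = sum(knn_classify(train_x, train_y, x, k) == y for x, y in zip(xs, ys))
    return hits / len(xs)


# Hypothetical data. True rule: the label is 1 when x >= 5, else 0.
train_x = [float(i) for i in range(10)]
train_y = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]   # labels at x=2 and x=7 are flipped (noise)
test_x = [i + 0.4 for i in range(10)]
test_y = [1 if x >= 5 else 0 for x in test_x]  # noise-free labels

for k in (1, 3):
    train_acc = accuracy(train_x, train_y, train_x, train_y, k)
    test_acc = accuracy(train_x, train_y, test_x, test_y, k)
    print(f"k={k}: train accuracy={train_acc:.1f}, test accuracy={test_acc:.1f}")
```

With k = 1 the classifier memorizes the two noisy labels (training accuracy 1.0, testing accuracy 0.8). With k = 3 it smooths them out (training accuracy 0.8, testing accuracy 1.0), which is the pattern expected when reducing complexity cures overfitting.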

The selection of the right number of clusters in K-Means is often done using the _________ method.

  • Centroid
  • Elbow
  • Gap
  • Silhouette
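The Elbow method is used to find a suitable number of clusters in K-Means by plotting the within-cluster variance (the within-cluster sum of squares) as a function of the number of clusters and picking the "elbow" point, beyond which adding more clusters yields only small improvements.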
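
A minimal sketch of the quantity being plotted: the code below runs a basic Lloyd's K-Means for several values of k and records the within-cluster sum of squares (WCSS). The deterministic initialization, the helper names, and the three-blob data set are assumptions made for reproducibility; practical implementations usually use randomized initialization with several restarts.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, max_iter=100):
    """Run a basic Lloyd's k-means and return the within-cluster sum of squares."""
    n = len(points)
    # Deterministic initialisation (an assumption for reproducibility):
    # take k evenly spaced points from the input order as starting centroids.
    centroids = [points[i * n // k] for i in range(k)]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return sum(squared_distance(p, centroids[a]) for p, a in zip(points, assignment))


# Hypothetical data: three well-separated square blobs of four points each.
points = [(x + dx, y + dy)
          for x, y in [(0, 0), (10, 10), (20, 0)]
          for dx in (0, 1) for dy in (0, 1)]

wcss = {k: kmeans_wcss(points, k) for k in range(1, 6)}
for k, value in wcss.items():
    print(k, round(value, 2))
```

On this data the WCSS drops sharply up to k = 3 and barely changes afterwards, so the elbow sits at three clusters, matching how the data were constructed.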

Machine Learning is a branch of AI that includes algorithms that learn patterns in data, while Deep Learning is a subset of _________ that involves multi-layered neural networks.

  • AI
  • Deep Learning
  • Machine Learning
Deep Learning is a subset of Machine Learning, focusing on algorithms that utilize multi-layered neural networks.

In Machine Learning, models learn from data and make predictions, whereas in Deep Learning, models can automatically learn representations from data through _________.

  • reinforcement learning
  • representation learning
  • supervised learning
  • unsupervised learning
Deep Learning models can automatically learn representations from data through a hierarchy of layers, often referred to as representation learning.

Explain how the coefficients of Simple Linear Regression can be interpreted in terms of correlation.

  • Coefficients Are Independent of Correlation
  • Coefficients Determine Correlation
  • Coefficients Indicate No Correlation
  • Coefficients Represent the Strength and Direction of the Relationship
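In Simple Linear Regression, the sign of the slope coefficient gives the direction of the relationship between the independent and dependent variables, and its magnitude says how much the dependent variable is expected to change per unit change in the independent variable. The slope equals the correlation coefficient multiplied by the ratio of the standard deviations (slope = r × s_y / s_x), so it always has the same sign as the correlation. Its magnitude depends on the units of the variables, whereas r itself measures the strength of the linear association.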
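
The link between the slope and the correlation coefficient can be checked numerically. The sketch below uses made-up measurements and a hypothetical helper, `slope_and_correlation`, to compute the least-squares slope and Pearson's r from the same sums of squares.

```python
import math


def slope_and_correlation(xs, ys):
    """Least-squares slope of y on x and the Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / math.sqrt(sxx * syy)
    std_ratio = math.sqrt(syy / sxx)  # s_y / s_x (the 1/(n-1) factors cancel)
    return slope, r, std_ratio


# Hypothetical measurements with a strong positive linear trend.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, r, std_ratio = slope_and_correlation(xs, ys)
print(slope, r, r * std_ratio)  # the first and last values agree

# Rescaling x (e.g. metres to centimetres) changes the slope but not r.
slope_cm, r_cm, _ = slope_and_correlation([100 * x for x in xs], ys)
print(slope_cm, r_cm)
```

The rescaling step shows why the slope alone is not a measure of strength: multiplying x by 100 divides the slope by 100, while r is unchanged.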

ElasticNet is a regularized regression method that linearly combines the L1 penalty of _________ and the L2 penalty of _________.

  • Lasso, Ridge
  • Linear, Polynomial
  • Polynomial, Linear
  • Ridge, Lasso
ElasticNet is a regularized regression method that combines the L1 penalty of Lasso and the L2 penalty of Ridge, incorporating the properties of both methods.
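
As a sketch of the combined penalty, the function below uses the `alpha` / `l1_ratio` parametrization documented for scikit-learn's `ElasticNet`, in which the penalty is alpha × (l1_ratio × ‖w‖₁ + 0.5 × (1 − l1_ratio) × ‖w‖₂²). Other texts scale the two terms differently, so treat the exact constants as an assumption. It computes only the penalty term, not the full loss, and the weight vector is made up.

```python
def elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5):
    """Penalty term only (no data-fit term), in the alpha / l1_ratio form."""
    l1 = sum(abs(w) for w in weights)         # Lasso-style L1 norm
    l2_squared = sum(w * w for w in weights)  # Ridge-style squared L2 norm
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


weights = [1.0, -2.0, 0.0, 3.0]  # hypothetical coefficient vector

print(elastic_net_penalty(weights, l1_ratio=1.0))  # 6.0, pure L1 (Lasso-like)
print(elastic_net_penalty(weights, l1_ratio=0.0))  # 7.0, pure L2 (Ridge-like)
print(elastic_net_penalty(weights, l1_ratio=0.5))  # 6.5, an equal mix
```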

Explain the difference between hard and soft classification.

  • Hard provides class labels; soft provides class probabilities
  • Hard requires more data; soft requires less
  • Hard uses algorithms; soft uses manual classification
  • No difference
Hard classification provides specific class labels, whereas soft classification provides probabilities for each class, allowing for more nuanced insights into the confidence of a prediction.
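
The distinction can be sketched with a softmax over raw class scores: the probabilities are the soft output, and taking the most probable class yields the hard output. The class names and scores below are made up for illustration.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shifted = [s - max(scores) for s in scores]  # subtract max for numerical stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog", "bird"]   # hypothetical class names
raw_scores = [2.0, 1.0, 0.1]       # hypothetical model outputs (logits)

# Soft classification: one probability per class.
soft = softmax(raw_scores)
# Hard classification: only the single most likely label.
hard = classes[max(range(len(soft)), key=soft.__getitem__)]

print(dict(zip(classes, (round(p, 3) for p in soft))))
print(hard)
```

The hard label discards how close the runner-up was; the soft output keeps that information, which matters when a downstream decision depends on confidence.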

In the context of PCA, what is the role of Eigenvectors?

  • Normalizing the data
  • Representing noise in the data
  • Representing outliers in the data
  • Representing the direction of maximum variance
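In PCA, the eigenvectors of the data's covariance matrix define the principal directions. The eigenvector with the largest eigenvalue points in the direction of maximum variance, and each subsequent eigenvector captures the largest remaining variance while being orthogonal to the previous ones. Projecting the centered data onto these directions yields the principal components.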
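
A minimal sketch for two-dimensional data: the code below builds the covariance matrix, finds its leading eigenvector by power iteration (one of several ways to compute it; numerical libraries typically use a full eigendecomposition or SVD instead), and checks that the variance of the projected data equals the corresponding eigenvalue. The helper names and the four sample points are hypothetical.

```python
import math


def covariance_2d(points):
    """Centre 2-D points and return (covariance matrix, centred points)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centred = [(x - mean_x, y - mean_y) for x, y in points]
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    return [[sxx, sxy], [sxy, syy]], centred


def mat_vec(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def leading_eigenvector(cov, iterations=100):
    """Power iteration: repeatedly apply cov and renormalise."""
    v = [1.0, 0.0]
    for _ in range(iterations):
        w = mat_vec(cov, v)
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    cov_v = mat_vec(cov, v)
    eigenvalue = v[0] * cov_v[0] + v[1] * cov_v[1]  # Rayleigh quotient
    return v, eigenvalue


# Hypothetical points scattered around the line y = x.
points = [(0.0, 0.2), (1.0, 0.8), (2.0, 2.2), (3.0, 2.8)]

cov, centred = covariance_2d(points)
pc1, eigenvalue = leading_eigenvector(cov)

# Project each centred point onto the first principal direction.
scores = [x * pc1[0] + y * pc1[1] for x, y in centred]
projected_variance = sum(s * s for s in scores) / (len(scores) - 1)

print(pc1, eigenvalue, projected_variance)
```

The projected variance matches the leading eigenvalue and exceeds the variance along either original axis, which is what "direction of maximum variance" means in practice.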