The term ___________ is used to describe a model that performs well on the training data but poorly on unseen data.

Bootstrap
Cross-validation
Overfitting
Underfitting

Overfitting occurs when a model fits the training data so closely that it learns the noise and idiosyncrasies of that data rather than the underlying trend, and therefore performs poorly on unseen data.
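
A minimal sketch of this effect, under stated assumptions: the toy data, the alternating noise pattern, and the helper names (`memorizer_predict`, `fit_line`, `mse`) are all hypothetical and chosen only for illustration. A model that memorizes the training labels reaches zero training error but does worse on new points than a simple line that follows the trend.

```python
# Toy data (assumed): the true trend is y = 2x, and the training labels
# carry alternating +1 / -1 noise.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1.0 if i % 2 == 0 else -1.0) for i, x in enumerate(train_x)]
test_x = [i + 0.25 for i in range(10)]
test_y = [2 * x for x in test_x]  # noise-free targets used for evaluation


def memorizer_predict(x):
    """Overfit model: return the label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a dataset."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


slope, intercept = fit_line(train_x, train_y)


def line_predict(x):
    """Simple model: a straight line fitted to the training data."""
    return slope * x + intercept


memorizer_train_mse = mse(memorizer_predict, train_x, train_y)
memorizer_test_mse = mse(memorizer_predict, test_x, test_y)
line_train_mse = mse(line_predict, train_x, train_y)
line_test_mse = mse(line_predict, test_x, test_y)

print("memorizer train/test MSE:", memorizer_train_mse, memorizer_test_mse)
print("line      train/test MSE:", line_train_mse, line_test_mse)
```

The memorizer looks better on the training set and worse on the test set, which is the pattern the question describes.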

In what scenarios might a custom distance metric be needed in KNN, and how would you go about implementing it?

When K is very large
When data has specific characteristics
When data is uniform
When using standardized data

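A custom distance metric might be needed when the data has specific characteristics that standard metrics such as Euclidean distance do not capture, for example a mix of numeric and categorical features. Implementation involves defining a function that takes two samples and returns a dissimilarity value reflecting those characteristics, then using it in place of the default metric when finding neighbors.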
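
As a rough sketch of what such a function could look like, the example below assumes samples of the form `(numeric_value, category)` and a hypothetical metric `mixed_distance` that adds a fixed penalty of 1.0 when the categories differ. The data, the penalty, and the helper `knn_predict` are illustrative assumptions, not a recommended design.

```python
from collections import Counter


def mixed_distance(a, b):
    """Hypothetical metric: numeric gap plus a penalty for a category mismatch."""
    numeric_gap = abs(a[0] - b[0])
    category_gap = 0.0 if a[1] == b[1] else 1.0
    return numeric_gap + category_gap


def knn_predict(training, query, k, metric):
    """Majority vote among the k training samples closest to the query."""
    neighbors = sorted(training, key=lambda item: metric(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Each training item is ((numeric_value, category), label).
training = [
    ((1.0, "red"), "A"),
    ((1.2, "red"), "A"),
    ((3.0, "blue"), "B"),
    ((3.1, "blue"), "B"),
    ((1.1, "blue"), "B"),
]

prediction = knn_predict(training, (1.05, "red"), k=3, metric=mixed_distance)
print(prediction)
```

Passing the metric as an argument keeps the neighbor search unchanged while the notion of similarity is swapped out.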

Your classification model's accuracy is high, but precision and recall are not balanced. How would you approach this problem to get a better trade-off?

Change the classification threshold; consider using the F1 Score
Ignore precision and recall
Only focus on accuracy
Use a different dataset

Adjusting the classification threshold shifts the balance between precision and recall, and a metric such as the F1 Score, the harmonic mean of precision and recall, helps identify a threshold with a better trade-off than accuracy alone would suggest.
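
The following sketch shows how the three metrics move as the threshold changes. The labels, scores, and candidate thresholds are made-up values, and `precision_recall_f1` is a hypothetical helper written for this example.

```python
def precision_recall_f1(y_true, scores, threshold):
    """Compute precision, recall and F1 when predicting 1 for scores >= threshold."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Assumed example data: 4 positives, 6 negatives, with model scores in [0, 1].
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.45, 0.35, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05]

results = {t: precision_recall_f1(y_true, scores, t) for t in (0.33, 0.5, 0.7)}
best_threshold = max(results, key=lambda t: results[t][2])  # highest F1

for threshold, (precision, recall, f1) in results.items():
    print(threshold, round(precision, 3), round(recall, 3), round(f1, 3))
print("best threshold by F1:", best_threshold)
```

In practice the threshold would be chosen on a validation set rather than on the data used for the final evaluation.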

In Hierarchical Clustering, the _________ linkage method considers the distance between the closest points of two clusters.

Average Linkage
Complete Linkage
Single Linkage
Ward's Method

Single Linkage defines the distance between two clusters as the minimum distance between any pair of points, one from each cluster. This can lead to chain-like clusters and is sensitive to noise and outliers. It's useful when we want to identify clusters with irregular shapes.
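
To make the definition concrete, here is a small sketch that computes the single-linkage distance between two clusters and, for contrast, the complete-linkage distance. The two clusters are arbitrary example points, and the function names are hypothetical.

```python
import math
from itertools import product


def single_linkage(cluster_a, cluster_b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


def complete_linkage(cluster_a, cluster_b):
    """Distance between the farthest pair of points, one from each cluster."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


cluster_a = [(0.0, 0.0), (1.0, 0.0)]
cluster_b = [(3.0, 0.0), (6.0, 0.0)]

print("single:", single_linkage(cluster_a, cluster_b))
print("complete:", complete_linkage(cluster_a, cluster_b))
```

Because only the closest pair matters, a single stray point lying between two groups can pull them together, which is where the chaining behavior comes from.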

In a situation where the training accuracy is high but the testing accuracy is low, what could be the issue, and how might you solve it?

Model is overfitting
Model is underfitting
Testing data is too large
Training data is too small

Overfitting occurs when a model performs well on the training data but poorly on unseen data, often because the model is too complex for the amount of data available. Cross-validation helps detect the problem; remedies include adding regularization, removing unnecessary features, or otherwise reducing the complexity of the model.
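
As one illustration of the cross-validation part, the sketch below splits sample indices into k folds so that every sample is held out exactly once. The function `kfold_indices` is a hypothetical helper written for this example; it does not shuffle, which real data would usually need.

```python
def kfold_indices(n_samples, k):
    """Return a list of (train_indices, validation_indices) pairs for k folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, validation))
        start += size
    return folds


folds = kfold_indices(10, 3)
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Training on each `train` split and scoring on the matching `validation` split gives k estimates of performance on unseen data; a large gap between training and validation scores points to overfitting.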

The selection of the right number of clusters in K-Means is often done using the _________ method.

Centroid
Elbow
Gap
Silhouette

The Elbow method is used to choose the number of clusters in K-Means by plotting the within-cluster sum of squares as a function of the number of clusters and finding the "elbow" point, beyond which adding more clusters yields only small improvements.
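
A small sketch of the quantity being plotted, assuming one-dimensional toy data with three obvious groups. The functions `kmeans_1d` and `inertia` are hypothetical helpers, and the deterministic initialisation is a simplification chosen so the example is reproducible; it is not how library implementations pick starting centroids.

```python
def kmeans_1d(data, k, iterations=20):
    """Lloyd's algorithm on 1-D data with a simple deterministic start."""
    ordered = sorted(data)
    n = len(ordered)
    # Assumed initialisation: k evenly spaced points from the sorted data.
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda j: (x - centroids[j]) ** 2)
            clusters[nearest].append(x)
        # Keep the previous centroid if a cluster ends up empty.
        centroids = [
            sum(c) / len(c) if c else centroids[j] for j, c in enumerate(clusters)
        ]
    return centroids


def inertia(data, centroids):
    """Within-cluster sum of squares: squared distance to the nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in data)


data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
inertia_by_k = {k: inertia(data, kmeans_1d(data, k)) for k in range(1, 6)}

for k, value in inertia_by_k.items():
    print(k, round(value, 2))
```

For this data the values fall sharply up to k = 3 and change little afterwards, which is the bend the method looks for.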

Machine Learning is a branch of AI that includes algorithms that learn patterns in data, while Deep Learning is a subset of _________ that involves multi-layered neural networks.

AI
Deep Learning
Machine Learning

Deep Learning is a subset of Machine Learning, focusing on algorithms that utilize multi-layered neural networks.

In Machine Learning, models learn from data and make predictions, whereas in Deep Learning, models can automatically learn representations from data through _________.

reinforcement learning
representation learning
supervised learning
unsupervised learning

Deep Learning models can automatically learn representations from data through a hierarchy of layers, often referred to as representation learning.

Explain how the coefficients of Simple Linear Regression can be interpreted in terms of correlation.

Coefficients Are Independent of Correlation
Coefficients Determine Correlation
Coefficients Indicate No Correlation
Coefficients Represent the Strength and Direction of the Relationship

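In Simple Linear Regression, the slope has the same sign as the correlation between the dependent and independent variables, so it gives the direction of the relationship, and its magnitude gives the expected change in the dependent variable per unit change in the independent variable. The slope equals the correlation coefficient multiplied by the ratio of the standard deviation of the dependent variable to that of the independent variable, so the coefficient reflects the strength of the relationship relative to the scales of the two variables.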
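
The link between the slope and the correlation coefficient can be checked numerically. The sketch below uses made-up data and computes both quantities from first principles; all variable names are illustrative.

```python
import math

# Assumed example data with a roughly linear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                        # least-squares slope
intercept = mean_y - slope * mean_x      # least-squares intercept
correlation = sxy / math.sqrt(sxx * syy) # Pearson correlation coefficient

std_x = math.sqrt(sxx / (n - 1))
std_y = math.sqrt(syy / (n - 1))

print("slope:", slope)
print("r * (std_y / std_x):", correlation * std_y / std_x)
```

The two printed values agree up to floating-point rounding, and rescaling either variable changes the slope while leaving the correlation unchanged.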

ElasticNet is a regularized regression method that linearly combines the L1 penalty of _________ and the L2 penalty of _________.

Lasso, Ridge
Linear, Polynomial
Polynomial, Linear
Ridge, Lasso

ElasticNet is a regularized regression method that combines the L1 penalty of Lasso and the L2 penalty of Ridge, incorporating the properties of both methods.
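
As a sketch of how the two penalties combine, the function below uses one common parametrization, the one found in scikit-learn's `ElasticNet`, where `alpha` scales the overall penalty and `l1_ratio` mixes the L1 and L2 terms. The function name `elastic_net_penalty` and the example weights are assumptions made for illustration.

```python
def elastic_net_penalty(weights, alpha, l1_ratio):
    """Penalty term: alpha * (l1_ratio * L1 + 0.5 * (1 - l1_ratio) * L2 squared)."""
    l1 = sum(abs(w) for w in weights)   # Lasso-style term
    l2_squared = sum(w * w for w in weights)  # Ridge-style term
    return alpha * (l1_ratio * l1 + 0.5 * (1.0 - l1_ratio) * l2_squared)


weights = [3.0, -4.0]

lasso_only = elastic_net_penalty(weights, alpha=1.0, l1_ratio=1.0)
ridge_only = elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.0)
mixed = elastic_net_penalty(weights, alpha=1.0, l1_ratio=0.5)

print(lasso_only, ridge_only, mixed)
```

Setting `l1_ratio` to 1 leaves only the Lasso term and setting it to 0 leaves only the Ridge term, so the intermediate values interpolate between the two methods.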

Explain the difference between hard and soft classification.

Hard provides class labels; soft provides class probabilities
Hard requires more data; soft requires less
Hard uses algorithms; soft uses manual classification
No difference

Hard classification provides specific class labels, whereas soft classification provides probabilities for each class, allowing for more nuanced insights into the confidence of a prediction.
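
A minimal sketch of the distinction, assuming a classifier that outputs one raw score per class. The class names and scores are invented, and softmax is used here as one common way to turn scores into probabilities.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


classes = ["cat", "dog", "bird"]
raw_scores = [2.0, 1.5, 0.1]

# Soft classification: one probability per class.
soft = dict(zip(classes, softmax(raw_scores)))

# Hard classification: only the most probable label.
hard = max(soft, key=soft.get)

print("soft:", {name: round(p, 3) for name, p in soft.items()})
print("hard:", hard)
```

The hard label discards the fact that the second class was a fairly close runner-up, which the soft output preserves.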

In the context of PCA, what is the role of Eigenvectors?

Normalizing the data
Representing noise in the data
Representing outliers in the data
Representing the direction of maximum variance

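The eigenvectors of the data's covariance matrix give the principal directions in PCA: the eigenvector with the largest eigenvalue points in the direction of maximum variance, and each subsequent eigenvector captures the most remaining variance while staying orthogonal to the earlier ones. Projecting the original data onto these directions produces the principal components.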
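
The sketch below illustrates this on a handful of made-up 2-D points. It finds the leading eigenvector of the covariance matrix by power iteration, a simple technique swapped in here to keep the example dependency-free; in practice a linear algebra routine such as `numpy.linalg.eigh` would normally be used. All names are illustrative.

```python
import math

# Assumed example data lying close to the line y = x.
points = [(1.0, 1.0), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]
n = len(points)
mean_x = sum(p[0] for p in points) / n
mean_y = sum(p[1] for p in points) / n
centered = [(x - mean_x, y - mean_y) for x, y in points]

# Entries of the 2x2 sample covariance matrix.
cxx = sum(dx * dx for dx, _ in centered) / (n - 1)
cyy = sum(dy * dy for _, dy in centered) / (n - 1)
cxy = sum(dx * dy for dx, dy in centered) / (n - 1)


def cov_times(v):
    """Multiply the covariance matrix by a 2-D vector."""
    return (cxx * v[0] + cxy * v[1], cxy * v[0] + cyy * v[1])


# Power iteration: repeated multiplication converges to the leading eigenvector.
v = (1.0, 0.0)
for _ in range(100):
    w = cov_times(v)
    norm = math.hypot(w[0], w[1])
    v = (w[0] / norm, w[1] / norm)

cv = cov_times(v)
first_eigenvalue = v[0] * cv[0] + v[1] * cv[1]  # Rayleigh quotient


def variance_along(direction):
    """Sample variance of the data projected onto a unit direction."""
    projections = [dx * direction[0] + dy * direction[1] for dx, dy in centered]
    return sum(p * p for p in projections) / (n - 1)


perpendicular = (-v[1], v[0])
print("first principal direction:", v)
print("variance along it:", variance_along(v))
print("variance along the perpendicular:", variance_along(perpendicular))
```

The variance of the projected data equals the eigenvalue for that direction, and almost none of the spread remains along the perpendicular direction.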
