What is the mathematical relationship between Eigenvalues and Eigenvectors in PCA?

  • Each eigenvalue is the scalar by which the covariance matrix scales its eigenvector
  • They are inversely related
  • They are the same
  • They are unrelated
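In PCA, the eigenvectors and eigenvalues come from the covariance matrix C of the centred data and satisfy the eigenvalue equation C v = λ v: multiplying C by an eigenvector v only rescales it, and the scaling factor λ is the corresponding eigenvalue. The eigenvectors give the directions of the principal components, and each eigenvalue gives the variance of the data along its eigenvector, so sorting the eigenvalues in descending order ranks the components by how much variance they explain.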
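A minimal NumPy sketch of this relationship, assuming a small synthetic two-feature dataset (the data and variable names are illustrative only). It builds the covariance matrix, eigendecomposes it with `numpy.linalg.eigh`, and checks both C v = λ v and that each eigenvalue equals the variance of the data projected onto its eigenvector:

```python
import numpy as np

# Hypothetical toy data: 200 samples, 2 correlated features.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=200)])

# Covariance matrix of the mean-centred data (features in columns).
centred = data - data.mean(axis=0)
cov = np.cov(centred, rowvar=False)

# eigh is meant for symmetric matrices: it returns eigenvalues in ascending
# order and the matching unit-length eigenvectors as the columns of `vecs`.
vals, vecs = np.linalg.eigh(cov)

# Reorder so the first principal component has the largest eigenvalue.
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]

# Check the defining relation C v = lambda v for every component.
for lam, v in zip(vals, vecs.T):
    assert np.allclose(cov @ v, lam * v)

# Each eigenvalue equals the variance of the data projected onto its eigenvector.
projected = centred @ vecs
print(np.allclose(projected.var(axis=0, ddof=1), vals))  # True
print(vals / vals.sum())  # share of total variance explained by each component
```

Using `eigh` rather than the general `eig` is a design choice: the covariance matrix is symmetric, so `eigh` guarantees real eigenvalues and orthonormal eigenvectors.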

What could be the possible consequence of choosing a very small value of K in the KNN algorithm?

  • Increased efficiency
  • Overfitting
  • Reduced complexity
  • Underfitting
Choosing a very small value of K in the KNN algorithm can lead to overfitting, where the model becomes too sensitive to noise in the training data.
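A small pure-Python sketch, using a hypothetical 1-D dataset with one mislabelled point at x = 2, shows the effect. With K = 1 every training point is its own nearest neighbour, so training accuracy is perfect and a query near the noisy point copies its wrong label; with K = 5 the vote smooths the noise away:

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k nearest training points (1-D, absolute distance)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical data: class "a" below 5, class "b" above 5,
# plus one mislabelled (noisy) point at x = 2.
train = [(0, "a"), (1, "a"), (2, "b"), (3, "a"), (4, "a"),
         (6, "b"), (7, "b"), (8, "b"), (9, "b")]

for k in (1, 5):
    train_acc = sum(knn_predict(train, x, k) == y for x, y in train) / len(train)
    # Training accuracy, then the prediction for a query next to the noisy point.
    print(k, train_acc, knn_predict(train, 2.2, k))
```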

What type of problems is Logistic Regression mainly used to solve?

  • Binary classification problems
  • Clustering problems
  • Regression problems
  • Unsupervised learning problems
Logistic Regression is mainly used to solve binary classification problems, where the goal is to classify instances into one of two classes.
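As a minimal sketch of how this works, assuming a tiny hypothetical 1-D dataset: the model passes a linear score w·x + b through the sigmoid to get an estimated probability of class 1, and a threshold (0.5 here) turns that probability into a binary label. The weights are fitted with plain batch gradient descent on the log-loss, chosen here for readability rather than speed:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical 1-D training data with two classes (0 and 1).
xs = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0]
ys = [0, 0, 0, 1, 1, 1]

# Fit weight w and bias b by batch gradient descent on the average log-loss.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(1000):
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

def predict(x, threshold=0.5):
    """Return class 1 if the estimated P(y = 1 | x) reaches the threshold."""
    return int(sigmoid(w * x + b) >= threshold)

print([predict(x) for x in xs])
print(round(sigmoid(w * 0.5 + b), 3))  # estimated probability of class 1 at x = 0.5
```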

When interpreting a dendrogram in Hierarchical Clustering, the height of the _________ represents the distance at which clusters are merged.

  • Branches
  • Leaves
  • Lines
  • Nodes
In a dendrogram, the height of the branches represents the distance at which clusters are merged. The higher the branch, the greater the distance, indicating that the clusters being merged are less similar. This information can guide the selection of the number of clusters and provides insights into the underlying structure of the data.
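A minimal pure-Python sketch, assuming single linkage (one of several possible linkage criteria) and a handful of hypothetical 1-D points, records the distance at which each merge happens; these distances are the branch heights a dendrogram would draw:

```python
def single_linkage_merges(points):
    """Agglomerative clustering of 1-D points with single linkage.

    Returns (merged members, merge distance) pairs in merge order; the merge
    distance is the height at which the two branches join in a dendrogram.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters whose closest members are nearest.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merged = clusters[i] + clusters[j]
        merges.append((sorted(merged), d))
        # Delete the higher index first so index i stays valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges

for members, height in single_linkage_merges([0, 1, 5, 6, 20]):
    print(height, members)
```

Here the heights are 1, 1, 4 and 14. The large jump from 4 to 14 means the last merge joins very dissimilar clusters, so cutting the tree between those heights (giving two clusters) is a natural choice.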

A business stakeholder wants to use a very high-degree Polynomial Regression for forecasting, arguing that it fits the historical data perfectly. How would you explain the risks of this approach and suggest a more robust method?

  • Encourage the high-degree approach
  • Explain the risk of overfitting and suggest using cross-validation or regularization
  • Focus only on training data
  • Ignore the stakeholder's suggestion
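A very high-degree polynomial can fit the historical data almost perfectly because it fits the noise along with the trend. Such a model tends to oscillate between observations and to diverge outside the observed range, which is exactly where a forecast is needed, so it is prone to overfitting and generalizes poorly to future data. Explaining this risk and suggesting more robust methods, such as choosing the degree by cross-validation or adding regularization, helps build a more reliable forecasting model.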
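A minimal NumPy sketch of the argument, assuming hypothetical history that follows a linear trend plus noise: a degree-9 polynomial through 10 points reproduces the history almost exactly, but its error on "future" points just beyond the observed range is far larger than that of a straight line. Evaluating candidate degrees on held-out data like this is the idea behind using cross-validation to choose the model:

```python
import numpy as np

# Hypothetical history: a linear trend plus noise, observed on [0, 1].
rng = np.random.default_rng(0)
x_hist = np.linspace(0.0, 1.0, 10)
y_hist = 2.0 * x_hist + rng.normal(scale=0.1, size=x_hist.size)

# "Future" points just beyond the observed range, following the same trend.
x_future = np.linspace(1.1, 1.5, 5)
y_future = 2.0 * x_future

def mse(coeffs, x, y):
    """Mean squared error of the polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

for degree in (1, 9):
    coeffs = np.polyfit(x_hist, y_hist, degree)
    # Degree, error on the history, error on the future points.
    print(degree, mse(coeffs, x_hist, y_hist), mse(coeffs, x_future, y_future))
```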

In classification, the ________ metric is often used to evaluate the balance between precision and recall.

  • Accuracy
  • F1 Score
  • Mean Squared Error
  • R-squared
The F1 Score is the harmonic mean of precision and recall, providing a single metric that balances the trade-off between these two important metrics.
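Written out, F1 = 2 · precision · recall / (precision + recall). A short Python sketch, using hypothetical confusion-matrix counts, computes it from true positives (tp), false positives (fp) and false negatives (fn); it assumes tp > 0 so that no denominator is zero:

```python
def f1_from_counts(tp, fp, fn):
    """Harmonic mean of precision and recall, from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
# precision = 8/10 = 0.8, recall = 8/12 ≈ 0.667
print(round(f1_from_counts(8, 2, 4), 4))  # 0.7273
```

Because it is a harmonic mean, F1 is pulled toward the smaller of the two values (here 0.727, below their arithmetic mean of about 0.733), so a model cannot score well by excelling at only one of them.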

Your classification model's accuracy is high, but precision and recall are not balanced. How would you approach this problem to get a better trade-off?

  • Change the classification threshold; consider using the F1 Score
  • Ignore precision and recall
  • Only focus on accuracy
  • Use a different dataset
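High accuracy can hide poor performance on one class, especially when the classes are imbalanced. Adjusting the classification threshold shifts the balance between the two metrics (raising it never increases recall and usually increases precision), and the F1 Score, which balances precision and recall, gives a single number for choosing the threshold, leading to a more robust model evaluation.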
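A pure-Python sketch of a threshold sweep, assuming hypothetical predicted probabilities and labels for an imbalanced set (3 positives out of 10). It computes precision, recall and F1 at each candidate threshold and keeps the threshold with the highest F1:

```python
def precision_recall_f1(scores, labels, threshold):
    """Metrics when predicting positive for every score >= threshold."""
    tp = sum(s >= threshold and y == 1 for s, y in zip(scores, labels))
    fp = sum(s >= threshold and y == 0 for s, y in zip(scores, labels))
    fn = sum(s < threshold and y == 1 for s, y in zip(scores, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical predicted probabilities and true labels.
scores = [0.95, 0.80, 0.62, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10, 0.05]
labels = [1,    1,    0,    0,    1,    0,    0,    0,    0,    0]

thresholds = [0.3, 0.4, 0.5, 0.6, 0.7]
for t in thresholds:
    print(t, precision_recall_f1(scores, labels, t))

# Pick the threshold that maximises F1.
best = max(thresholds, key=lambda t: precision_recall_f1(scores, labels, t)[2])
print("best threshold by F1:", best)  # 0.7
```

In practice the threshold would be tuned on a validation set rather than the data used for final evaluation.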

In what scenarios might a custom distance metric be needed in KNN, and how would you go about implementing it?

  • When K is very large
  • When data has specific characteristics
  • When data is uniform
  • When using standardized data
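A custom distance metric is useful when the data has characteristics that standard metrics such as Euclidean distance do not capture well: for example, mixed numerical and categorical features, binary or set-valued attributes (where Hamming or Jaccard distance fits better), or text vectors (where cosine distance is common). Implementing one means writing a function that takes two instances and returns a non-negative dissimilarity, then using it in place of the default metric when finding the K nearest neighbors.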
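A pure-Python sketch for mixed data, assuming hypothetical records of (age, income in thousands, city). The distance is a simplified Gower-style measure chosen for illustration: numeric gaps are divided by each feature's observed range, the categorical field contributes 0 on a match and 1 on a mismatch, and the result is averaged over the three features. The KNN vote then uses this function instead of Euclidean distance:

```python
from collections import Counter

# Hypothetical records: ((age, income_k, city), label).
train = [
    ((25, 40, "paris"), "no"),
    ((30, 60, "paris"), "no"),
    ((45, 90, "berlin"), "yes"),
    ((50, 85, "berlin"), "yes"),
    ((35, 70, "rome"), "yes"),
]
RANGES = (50 - 25, 90 - 40)  # observed range of each numeric feature

def mixed_distance(a, b):
    """Simplified Gower-style distance: range-scaled numeric gaps plus a
    0/1 mismatch for the categorical field, averaged over all features."""
    numeric = sum(abs(x - y) / r for x, y, r in zip(a[:2], b[:2], RANGES))
    categorical = 0.0 if a[2] == b[2] else 1.0
    return (numeric + categorical) / 3

def knn_predict(query, k=3, dist=mixed_distance):
    """Majority label among the k training records closest under `dist`."""
    nearest = sorted(train, key=lambda rec: dist(query, rec[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

print(knn_predict((28, 50, "paris")))   # "no"
print(knn_predict((48, 88, "berlin")))  # "yes"
```

Libraries offer the same hook: scikit-learn's `KNeighborsClassifier`, for example, accepts a callable for its `metric` parameter, although it works on numeric arrays, so categorical fields would need to be encoded first.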

The term ___________ is used to describe a model that performs well on the training data but poorly on the unseen data.

  • Bootstrap
  • Cross-validation
  • Overfitting
  • Underfitting
Overfitting describes a model that fits the training data too closely: it learns the noise and idiosyncrasies of the training set rather than the underlying trend, so it performs well on the training data but poorly on unseen data.

Which regularization technique adds L1 penalty, causing some coefficients to be exactly zero?

  • Elastic Net
  • Lasso
  • Ridge
Lasso regularization adds an L1 penalty, causing some of the coefficients to become exactly zero, effectively removing those features from the model.
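A minimal NumPy sketch of why the L1 penalty produces exact zeros, assuming synthetic data in which feature 2 is pure noise. It minimises (1/(2n)) · ||y − Xw||² + α · ||w||₁ (the same objective scikit-learn's `Lasso` documents) by cyclic coordinate descent, fits no intercept because the synthetic features have mean roughly zero, and uses a fixed iteration count instead of a convergence check. The soft-thresholding step sets a coefficient to exactly 0 whenever that feature's correlation with the residual falls within [−α, α]:

```python
import numpy as np

def soft_threshold(rho, alpha):
    """Shrink rho toward 0 by alpha; return exactly 0 inside [-alpha, alpha]."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - X w||^2 + alpha * ||w||_1 (no intercept)."""
    n, p = X.shape
    w = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back.
            partial = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ partial / n
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Hypothetical data: y depends on features 0 and 1 only; feature 2 is noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

w = lasso_coordinate_descent(X, y, alpha=0.5)
print(w)  # the noise feature's coefficient is typically exactly 0.0
```

Ridge's L2 penalty, by contrast, shrinks coefficients smoothly toward zero but does not set them exactly to zero.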