Explain the concept of k-fold Cross-Validation. What does "k" signify?

  • Number of equally-sized folds the data is divided into
  • Number of features in the dataset
  • Number of iterations in training
  • Number of layers in a deep learning model
In k-fold Cross-Validation, "k" signifies the number of (approximately) equally-sized folds the data is divided into. The model is trained on (k-1) folds and validated on the remaining fold, and this process is repeated k times so that each fold serves as the validation set exactly once. Averaging the performance across all k runs gives a more reliable estimate of how the model generalizes than a single train/validation split.
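The procedure above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the helper names (`k_fold_indices`, `cross_validate`, `fit_mean`, `mean_abs_error`) and the toy "predict the training mean" model are assumptions made for the example, and the data is not shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)


# Hypothetical toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def mean_abs_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)


xs = list(range(10))
ys = [2.0 * x for x in xs]
avg_error = cross_validate(xs, ys, k=5, fit=fit_mean, score=mean_abs_error)
```

With 10 samples and k = 5, each validation fold holds 2 samples and each training set holds the other 8.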

What is the main difference between supervised and unsupervised learning?

  • Application
  • Complexity
  • Data size
  • Use of labeled data
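The main difference is the use of labeled data. Supervised learning trains on examples that pair each input with a known target output, while unsupervised learning looks for structure (such as clusters) in data that has no labels.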

What is the difference between training and testing datasets in Machine Learning?

  • Training for clustering; Testing for regression
  • Training for labeling; Testing for predicting
  • Training used to evaluate; Testing used to predict
  • Training used to learn patterns; Testing used to evaluate performance
In Machine Learning, the model learns patterns from the training dataset, while the testing dataset is held back and used only to evaluate the model's performance on unseen data.
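A minimal sketch of such a split, using only the standard library. The function name `train_test_split`, the 25% test fraction, and the fixed seed are assumptions chosen for illustration.

```python
import random


def train_test_split(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The held-out test samples are never shown to the model during training.
    return shuffled[n_test:], shuffled[:n_test]


samples = list(range(20))
train, test = train_test_split(samples)
```

The two returned lists are disjoint and together contain every original sample.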

You have two very similar clusters in your dataset that DBSCAN is not separating well. What might be the problem and how could you resolve it?

  • Increase Epsilon; Decrease MinPts
  • Increase Epsilon; Increase MinPts
  • Reduce Epsilon; Keep MinPts the same
  • Reduce both Epsilon and MinPts
If DBSCAN is not separating two very similar clusters well, Epsilon (the neighborhood radius) is probably too large, so points from both clusters fall inside each other's neighborhoods and the clusters merge. Reducing Epsilon while keeping MinPts the same tightens the density requirement, which makes the algorithm more sensitive to the gap between the clusters and lets it tell them apart.
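The effect of Epsilon can be seen with a small, simplified DBSCAN written from scratch. This is a sketch for one-dimensional points only; the function name `dbscan` and the toy data are assumptions for the example, and a real project would more likely use a library implementation.

```python
def dbscan(points, eps, min_pts):
    """Minimal 1-D DBSCAN. Returns one label per point; -1 means noise."""
    n = len(points)
    labels = [None] * n  # None marks an unvisited point

    def neighbors(i):
        # A point counts as its own neighbor, as in the usual definition.
        return [j for j in range(n) if abs(points[i] - points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(j_nbrs)
    return labels


# Two dense groups separated by a gap of 5 units.
points = [0, 1, 2, 3, 8, 9, 10, 11]
merged = dbscan(points, eps=6, min_pts=3)       # eps bridges the gap
separated = dbscan(points, eps=1.5, min_pts=3)  # smaller eps, same min_pts
```

With `eps=6` every point ends up in a single cluster, while `eps=1.5` with the same `min_pts` yields two clusters, one per group.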

In K-Means clustering, the initial placement of centroids can be done using the _________ method, among others.

  • K-Means++
  • Mean Shift
  • Random
  • Silhouette
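The K-Means++ method is commonly used to initialize centroids in K-Means clustering. It picks the first centroid at random and each subsequent one with probability proportional to its squared distance from the nearest centroid already chosen, which spreads the initial centroids out. This typically speeds up convergence and reduces the risk of ending in a poor local minimum.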
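A minimal sketch of the K-Means++ seeding step, assuming points are tuples of numbers. The names `sq_dist` and `kmeans_pp_init` and the toy data are hypothetical, and only the initialization is shown, not the K-Means iterations that follow it.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centroids using the K-Means++ weighting scheme."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniformly at random
    while len(centroids) < k:
        # Squared distance from each point to its nearest chosen centroid.
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a centroid; fall back to a uniform pick.
            centroids.append(rng.choice(points))
            continue
        # Far-away points are proportionally more likely to be chosen.
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (100.0, 100.0), (100.1, 100.0), (100.0, 100.1)]
centroids = kmeans_pp_init(points, k=2)
```

Because the two groups are far apart, the second centroid is overwhelmingly likely to come from the group that did not supply the first one.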

When the assumptions of normality and homogeneity of variances are violated, LDA may provide ___________ results.

  • biased
  • consistent
  • optimal
  • suboptimal
If the assumptions of normality and homogeneity of variances are violated, LDA may provide "suboptimal" results, affecting its effectiveness in separating classes.

_________ learning is a type of Machine Learning where the model learns by interacting with an environment to achieve a goal.

  • Reinforcement
  • Semi-supervised
  • Supervised
  • Unsupervised
Reinforcement learning is a type of learning where an agent learns to make decisions by interacting with an environment, receiving feedback in the form of rewards or penalties.

The Gini Index in a Decision Tree aims to minimize the probability of __________.

  • Misclassification
  • Optimization
  • Overfitting
  • Underfitting
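The Gini Index in a Decision Tree aims to minimize the probability of misclassification. It quantifies how often a randomly chosen element from a node would be incorrectly labeled if it were labeled at random according to the class distribution in that node. A value of 0 means the node is pure, and the tree prefers splits that produce the lowest weighted Gini Index across the child nodes.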
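As a worked illustration, the Gini Index of a node with class proportions p_i is 1 minus the sum of p_i squared. The sketch below computes it for a node and for a candidate split; the function names `gini_impurity` and `split_impurity` are assumptions for the example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


mixed_node = ["a", "a", "b", "b"]
before = gini_impurity(mixed_node)                 # evenly mixed two-class node
after = split_impurity(["a", "a"], ["b", "b"])     # split that separates the classes
```

Here the evenly mixed node has impurity 0.5 and the perfectly separating split has impurity 0.0, so a tree using this criterion would choose that split.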

You trained a model that performs exceptionally well on the training data but poorly on the test data. What could be the issue, and how would you address it?

  • Increase complexity
  • Increase dataset size
  • Overfitting, add regularization
  • Reduce complexity
The issue is likely overfitting, where the model has learned the training data too well, including its noise and anomalies. Adding regularization would help to constrain the model and make it generalize better to unseen data.
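To show how regularization constrains a model, here is a minimal sketch of L2 (ridge) regularization for a one-feature linear model with no intercept. The function name `fit_ridge_1d` and the toy data are assumptions; the example only demonstrates the shrinkage mechanism, not a full overfitting experiment.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimizer of sum((y - w*x)^2) + lam * w^2 for a single weight w."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

w_plain = fit_ridge_1d(xs, ys, lam=0.0)    # ordinary least squares
w_ridge = fit_ridge_1d(xs, ys, lam=14.0)   # penalized: weight pulled toward zero
```

The penalty term shrinks the weight (from 2.0 to 1.0 on this data). In models with many weights, the same shrinkage limits how closely the model can chase noise in the training set; the penalty strength is normally chosen on validation data.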

What is the concept of Polynomial Regression?

  • Linear equation with multiple variables
  • Linear equation with one variable
  • Non-linear equation using polynomial features
  • Non-linear equation with one variable
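Polynomial Regression is a form of regression analysis where the relationship between the independent variable and the dependent variable is modeled as an nth-degree polynomial. Including polynomial terms (x², x³, and so on) as features lets the model capture curved, non-linear relationships, even though the model remains linear in its coefficients and can be fitted with ordinary least squares.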
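A minimal sketch of polynomial regression in plain Python: expand each x into polynomial features, then solve the least-squares normal equations. All function names are hypothetical, and the normal-equation approach is chosen here for brevity; library routines generally use more numerically stable solvers.

```python
def polynomial_features(x, degree):
    """Map x to [1, x, x^2, ..., x^degree]."""
    return [x ** p for p in range(degree + 1)]


def solve_linear_system(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (M[i][n] - s) / M[i][i]
    return w


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (Phi^T Phi) w = Phi^T y."""
    Phi = [polynomial_features(x, degree) for x in xs]
    d = degree + 1
    A = [[sum(row[i] * row[j] for row in Phi) for j in range(d)] for i in range(d)]
    b = [sum(row[i] * y for row, y in zip(Phi, ys)) for i in range(d)]
    return solve_linear_system(A, b)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]   # data generated from a known quadratic
coeffs = fit_polynomial(xs, ys, degree=2)   # ordered as [constant, x, x^2]
```

On this noise-free data the fitted coefficients come out very close to the generating values 1, 2, and 3.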

Your linear regression model has a high bias. What could be the reasons behind this, and how would you try to fix it?

  • High variance in data, Address by using more data
  • Irrelevant features, Address by using Lasso regression
  • Oversimplified model, Address by increasing model complexity
  • Too complex model, Address by reducing model complexity
High bias often stems from an oversimplified model that fails to capture the underlying patterns in the data. Increasing model complexity by adding polynomial terms, interaction terms, or more features can reduce bias and help the model better fit the data.
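The effect of adding a polynomial term can be illustrated on data with a curved relationship. In this sketch (toy data and variable names are assumptions), a one-feature linear model is compared with a model that also receives x² as a feature; both are fitted by least squares without an intercept.

```python
def mse(preds, ys):
    """Mean squared error between predictions and targets."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


xs = [0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]   # the true relationship is quadratic

# Model A (too simple): y ~ w * x
w = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
mse_linear = mse([w * x for x in xs], ys)

# Model B (more complex): y ~ w1 * x + w2 * x^2,
# solved from the 2x2 normal equations with Cramer's rule.
a11 = sum(x ** 2 for x in xs)
a12 = sum(x ** 3 for x in xs)
a22 = sum(x ** 4 for x in xs)
b1 = sum(x * y for x, y in zip(xs, ys))
b2 = sum(x ** 2 * y for x, y in zip(xs, ys))
det = a11 * a22 - a12 * a12
w1 = (b1 * a22 - a12 * b2) / det
w2 = (a11 * b2 - a12 * b1) / det
mse_quadratic = mse([w1 * x + w2 * x ** 2 for x in xs], ys)
```

The straight line cannot follow the curve and keeps a large training error, while the model with the extra x² feature fits the data essentially exactly. That gap is the bias the simpler model carries.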

Describe the role of hyperparameter tuning in the performance of a Machine Learning model.

  • It adjusts the weights during training
  • It optimizes the model's parameters before training
  • It optimizes the values of hyperparameters to improve the model's performance
  • It selects the type of model to be used
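Hyperparameter tuning involves optimizing the values of hyperparameters (settings chosen before training rather than learned from the data, such as the learning rate, tree depth, or regularization strength) to improve the model's performance. Candidate combinations are typically compared on validation data or with cross-validation, and the combination that scores best is selected.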
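A minimal sketch of a grid search over a single hyperparameter: the number of neighbors k in a 1-D k-nearest-neighbors classifier. The function names, the toy data (which deliberately contains two mislabeled training points), and the candidate grid are all assumptions for illustration.

```python
def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (binary labels 0/1)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, validation, k):
    """Fraction of validation points classified correctly for a given k."""
    hits = sum(knn_predict(train, x, k) == y for x, y in validation)
    return hits / len(validation)


# (x, label) pairs; the points at x=2 and x=11 carry noisy labels.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (10, 1), (11, 0), (12, 1), (13, 1)]
validation = [(0.5, 0), (2.1, 0), (10.9, 1), (12.5, 1)]

k_grid = [1, 3, 5]  # candidate hyperparameter values
scores = {k: accuracy(train, validation, k) for k in k_grid}
best_k = max(k_grid, key=lambda k: scores[k])  # ties go to the first candidate
```

With k = 1 the classifier copies the noisy labels and gets half of the validation points wrong, while k = 3 and k = 5 both classify all of them correctly, so the search returns k = 3. In practice the scores would usually come from cross-validation rather than a single small validation set.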