What term is used to refer to a set of input variables and their corresponding target values used to evaluate a Machine Learning model's performance?

  • Evaluation set
  • Testing set
  • Training set
  • Validation set
The "Testing set" is a set of input variables and corresponding target values used to evaluate a Machine Learning model's performance. It helps in assessing how well the model will perform on unseen data.

Explain how the F1-Score is computed and why it is used.

  • Arithmetic mean of Precision and Recall, balances both metrics
  • Geometric mean of Precision and Recall, emphasizes Recall
  • Harmonic mean of Precision and Recall, balances both metrics
F1-Score is the harmonic mean of Precision and Recall. It helps balance both metrics, particularly when there's an uneven class distribution. It's often used when both false positives and false negatives are important to minimize.
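
A minimal sketch of the computation; the true-positive/false-positive/false-negative counts are made-up values for illustration:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 80 true positives, 20 false positives, 40 false negatives.
tp, fp, fn = 80, 20, 40
precision = tp / (tp + fp)  # 0.80
recall = tp / (tp + fn)     # ~0.667
print(f"F1 = {f1_score(precision, recall):.3f}")  # ~0.727
```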

Why is Bootstrapping an essential technique in statistical analysis?

  • It allows training deep learning models
  • It enables the estimation of the distribution of a statistic
  • It provides a method for feature selection
  • It speeds up computation
Bootstrapping is essential in statistical analysis because it allows estimating the distribution of a statistic, even with a small sample. By repeatedly resampling with replacement, it creates numerous "bootstrap samples," enabling the calculation of standard errors, confidence intervals, and other statistical properties.
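
A minimal sketch of the idea, assuming NumPy and a small synthetic sample; here the statistic is the sample mean:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=3.0, size=50)  # small original sample

# Resample with replacement many times, recording the statistic each time.
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(10_000)
])

print(f"Bootstrap standard error: {boot_means.std(ddof=1):.3f}")
print(f"95% confidence interval: {np.percentile(boot_means, [2.5, 97.5])}")
```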

What is the role of a decision boundary in classification problems?

  • Separating classes in the feature space
  • Separating data into clusters
  • Separating features
  • Separating training and test data
A decision boundary is a hypersurface that partitions the feature space into regions corresponding to the different classes. It determines the class label of a new data point based on which side of the boundary the point lies on.
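
A minimal sketch using scikit-learn's logistic regression, which learns a linear boundary w·x + b = 0 (the data and the probe point are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = LogisticRegression().fit(X, y)

# The sign of the decision function tells us which side of the boundary
# a new point falls on, and therefore which class it gets.
new_point = np.array([[0.5, 1.0]])
side = clf.decision_function(new_point)[0]
print(f"Signed distance to boundary: {side:.2f} -> class {clf.predict(new_point)[0]}")
```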

What is the impact of pruning on the bias-variance tradeoff in a Decision Tree model?

  • Increases bias, reduces variance
  • Increases both bias and variance
  • Reduces bias, increases variance
  • Reduces both bias and variance
Pruning a Decision Tree leads to a simpler model, which can increase bias but reduce variance. This tradeoff helps to avoid overfitting the training data and often results in a model that generalizes better to unseen data.
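
A minimal sketch using scikit-learn's cost-complexity pruning; the `ccp_alpha` values are arbitrary illustrative choices (larger values prune more aggressively, trading variance for bias):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

for alpha in [0.0, 0.01, 0.05]:  # no pruning -> heavy pruning
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5)
    print(f"ccp_alpha={alpha:.2f}  CV accuracy={scores.mean():.3f}")
```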

How does the Kernel Trick help in dealing with non-linear data in SVM?

  • Enhances data visualization
  • Maps data into higher-dimensional space for linear separation
  • Reduces data size
  • Speeds up computation
The Kernel Trick handles non-linear data by implicitly mapping it into a higher-dimensional space where it becomes linearly separable. The "trick" is that kernel functions compute inner products in that space directly, so the mapping never has to be carried out explicitly.
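
A minimal sketch using scikit-learn's SVC on a synthetic concentric-circles dataset, where no linear boundary exists in the original 2-D space:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)  # kernel trick: implicit mapping

print(f"Linear kernel accuracy: {linear_svm.score(X, y):.2f}")  # near chance
print(f"RBF kernel accuracy:    {rbf_svm.score(X, y):.2f}")     # near 1.0
```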

You are given a complex dataset with a large amount of unstructured data. Which among AI, Machine Learning, or Deep Learning would be best suited to analyze this, and why?

  • AI, for its simplicity
  • Deep Learning, for its ability to handle complex and unstructured data
  • Machine Learning, for its structured data analysis
Deep Learning models are adept at handling unstructured data and finding complex patterns, making them suitable for such a dataset.

How does choosing the value of K in the K-Nearest Neighbors (KNN) algorithm impact the decision boundary?

  • Both 1 & 2 depending on value
  • Makes it more complex
  • Makes it smoother
A smaller K value results in a more complex, jagged decision boundary because predictions follow local noise in the training data; a larger K value averages over more neighbors, making the boundary smoother.
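
A minimal sketch on a synthetic dataset; the K values are arbitrary illustrative choices (the K=1 model fits training data perfectly but generalizes worse, a symptom of its overly complex boundary):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_moons(n_samples=300, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for k in [1, 15, 75]:  # complex -> smooth decision boundary
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"K={k:3d}  train acc={knn.score(X_train, y_train):.2f}  "
          f"test acc={knn.score(X_test, y_test):.2f}")
```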

How does the Elbow Method determine the optimal number of clusters, and what are its limitations?

  • By evaluating the model's accuracy
  • By finding the point of maximum curvature on a plot of variance vs. clusters
  • By maximizing the cluster distances
  • By minimizing the inter-cluster distances
The Elbow Method determines the optimal number of clusters by finding the "elbow" point on a plot of within-cluster variance versus the number of clusters: the point where adding more clusters stops yielding a sharp drop in variance. Limitations include ambiguity in identifying the exact "elbow" and sensitivity to initialization.
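
A minimal sketch using KMeans inertia (within-cluster sum of squares) as the variance measure, on synthetic blobs with a known cluster count:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(f"k={k}  inertia={km.inertia_:.1f}")

# Inertia drops sharply up to the true cluster count (here 4), then flattens;
# the "elbow" is that bend, though identifying it by eye can be ambiguous.
```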

The percentage of total variance explained by a principal component in PCA can be calculated by dividing the Eigenvalue of that component by the ________.

  • magnitude of Eigenvectors
  • number of Eigenvectors
  • number of components
  • sum of all Eigenvalues
The percentage of total variance explained by a principal component is calculated by dividing its Eigenvalue by the "sum of all Eigenvalues." This ratio gives the proportion of the dataset's total variance that is captured by that specific component.
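
A minimal sketch verifying this with scikit-learn's PCA, whose `explained_variance_` attribute holds the eigenvalues of the covariance matrix:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA().fit(X)

# Eigenvalue of each component divided by the sum of all Eigenvalues.
eigenvalues = pca.explained_variance_
ratios = eigenvalues / eigenvalues.sum()

print(ratios)                          # computed by hand
print(pca.explained_variance_ratio_)   # identical, reported by scikit-learn
```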