How is the amount of variance explained calculated in PCA?

  • By dividing each eigenvalue by the sum of all eigenvalues
  • By multiplying the eigenvalues with the mean
  • By summing all eigenvalues
  • By taking the square root of the eigenvalues
The amount of variance explained by each principal component in PCA is calculated by dividing the corresponding eigenvalue by the sum of all eigenvalues, and often expressed as a percentage.
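
As a quick illustration (a minimal sketch using NumPy and scikit-learn; the data is made up), the explained-variance ratio is simply each eigenvalue divided by the sum of all eigenvalues:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 samples, 3 features, with feature 2 correlated to feature 0 (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)

pca = PCA().fit(X)

# Eigenvalues of the covariance matrix are exposed as explained_variance_
eigenvalues = pca.explained_variance_
variance_ratio = eigenvalues / eigenvalues.sum()

print(variance_ratio)                  # manual calculation: eigenvalue / sum of eigenvalues
print(pca.explained_variance_ratio_)   # matches scikit-learn's built-in ratio
```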

You're working with a dataset that has clusters of various shapes and densities. Which clustering algorithm would be best suited for this, and why?

  • DBSCAN
  • Hierarchical Clustering
  • K-Means
  • Mean Shift
DBSCAN is best suited for clusters of various shapes and densities, as it's a density-based clustering method and doesn't rely on spherical assumptions about the data.
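
A minimal sketch with scikit-learn (the moons dataset and parameter values are illustrative) showing DBSCAN recovering non-spherical clusters where K-Means tends to struggle:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN, KMeans

# Two interleaved half-moons: clearly non-spherical clusters
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

db_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)
km_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# DBSCAN typically separates the two moons; K-Means splits them by distance to centroids
print("DBSCAN labels found:", set(db_labels))
print("K-Means labels found:", set(km_labels))
```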

How do hyperplanes differ in hard-margin SVM and soft-margin SVM?

  • Color difference
  • Difference in dimensionality
  • Difference in size
  • Flexibility in handling misclassifications
Hard-margin SVM does not allow any misclassifications, while soft-margin SVM provides flexibility in handling misclassifications.
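
In scikit-learn this flexibility is controlled by the penalty parameter C (a sketch; the data and C values are illustrative). A very large C approximates a hard margin, while a smaller C yields a softer margin that tolerates some misclassifications:

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Slightly overlapping classes (illustrative data)
X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                           n_clusters_per_class=1, class_sep=1.0, random_state=0)

hard_ish = SVC(kernel="linear", C=1e6).fit(X, y)   # huge C ~ hard margin: misclassifications heavily penalized
soft = SVC(kernel="linear", C=0.1).fit(X, y)       # small C = soft margin: wider margin, more tolerance

print("support vectors (near-hard margin):", hard_ish.n_support_.sum())
print("support vectors (soft margin):     ", soft.n_support_.sum())
```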

How are rewards and penalties used to guide the learning process in reinforcement learning?

  • To group data based on similarities
  • To guide the agent's actions
  • To label the data
  • To reduce complexity
In reinforcement learning, rewards and penalties guide the agent's actions, encouraging beneficial behaviors and discouraging detrimental ones.
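
A very small sketch of the idea (a tabular, Q-learning-style update; the toy environment and numbers are purely illustrative): rewards and penalties flow into the value update, which in turn shapes which actions the agent favors.

```python
import random

# Q-values for a toy 2-state, 2-action problem (illustrative)
Q = {(s, a): 0.0 for s in range(2) for a in range(2)}
alpha, gamma = 0.1, 0.9  # learning rate, discount factor

def step(state, action):
    # Hypothetical environment: action 1 in state 0 is rewarded, everything else penalized
    reward = 1.0 if (state == 0 and action == 1) else -0.1
    next_state = (state + 1) % 2
    return next_state, reward

state = 0
for _ in range(1000):
    action = random.choice([0, 1])
    next_state, reward = step(state, action)
    # The reward/penalty feeds directly into the value update, nudging future action choices
    best_next = max(Q[(next_state, a)] for a in [0, 1])
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

print(Q)  # the rewarded (state, action) pair ends up with the highest value
```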

The __________ function in Logistic Regression models the log odds of the probability of the dependent event.

  • Linear
  • Logit
  • Polynomial
  • Sigmoid
The Logit function in Logistic Regression models the log odds of the probability of the dependent event occurring.
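
Concretely (a sketch; the coefficients and x values are made up), the model is linear in the log odds, and the sigmoid inverts the logit to recover a probability:

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))          # log odds

def sigmoid(z):
    return 1 / (1 + np.exp(-z))         # inverse of the logit

# Illustrative logistic regression: log odds are a linear function of x
beta0, beta1 = -1.0, 2.0
x = np.array([0.0, 0.5, 1.0])
log_odds = beta0 + beta1 * x            # logit(p) = beta0 + beta1 * x
p = sigmoid(log_odds)

print(p)
print(np.allclose(logit(p), log_odds))  # True: logit maps the probabilities back to log odds
```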

What are the potential challenges in determining the optimal values for Epsilon and MinPts in DBSCAN?

  • Difficulty in selecting values that balance density and granularity of clusters
  • Lack of robustness to noise
  • Limited flexibility in shapes
  • Risk of overfitting the data
Determining optimal values for Epsilon and MinPts in DBSCAN is challenging because it requires balancing the density and granularity of clusters. An Epsilon that is too large can merge distinct clusters, while one that is too small can fragment the data into many tiny clusters or label points as noise. MinPts sets the minimum density a region needs to form a cluster, so the two parameters must be tuned together, making this a complex task.
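
One common heuristic (a sketch, not the only approach) is to fix MinPts first and then read a candidate Epsilon off a sorted k-distance curve, looking for its "elbow":

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Illustrative data
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

min_pts = 5  # a typical starting point; domain knowledge may suggest otherwise

# Distance from every point to its (min_pts - 1)-th nearest neighbour
# (the query set equals the training set, so the first neighbour is the point itself)
neigh = NearestNeighbors(n_neighbors=min_pts).fit(X)
distances, _ = neigh.kneighbors(X)
k_distances = np.sort(distances[:, -1])

# The "elbow" of this sorted curve is a common candidate for Epsilon;
# here we just print a few quantiles instead of plotting the curve.
print(np.quantile(k_distances, [0.5, 0.9, 0.95]))
```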

In comparison to PCA, LDA focuses on maximizing the separability between different ___________ rather than the variance of the data.

  • classes
  • features
  • principal components
  • variables
Unlike PCA, which focuses on maximizing the variance of the data, LDA emphasizes maximizing the separability between different classes.
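
A side-by-side sketch on the Iris data (illustrative): PCA ignores the class labels, while LDA uses them to find directions that separate the classes.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                             # unsupervised: maximizes variance
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)   # supervised: maximizes class separability

print(X_pca.shape, X_lda.shape)  # both project to 2 dimensions, but along different axes
```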

What is the impact of pruning on the bias-variance tradeoff in a Decision Tree model?

  • Increases bias, reduces variance
  • Increases both bias and variance
  • Reduces bias, increases variance
  • Reduces both bias and variance
Pruning a Decision Tree leads to a simpler model, which can increase bias but reduce variance. This tradeoff helps to avoid overfitting the training data and often results in a model that generalizes better to unseen data.
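
In scikit-learn, cost-complexity pruning via ccp_alpha gives a handle on this tradeoff (a sketch; the dataset and alpha value are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

unpruned = DecisionTreeClassifier(random_state=0)                 # deeper tree: lower bias, higher variance
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)   # pruned tree: more bias, less variance

print("unpruned CV accuracy:", cross_val_score(unpruned, X, y, cv=5).mean())
print("pruned CV accuracy:  ", cross_val_score(pruned, X, y, cv=5).mean())
```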

How does the Kernel Trick help in dealing with non-linear data in SVM?

  • Enhances data visualization
  • Maps data into higher-dimensional space for linear separation
  • Reduces data size
  • Speeds up computation
The Kernel Trick helps in dealing with non-linear data by mapping it into a higher-dimensional space where it can be linearly separated.
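
A sketch with scikit-learn (the circles dataset and parameters are illustrative): an RBF kernel implicitly maps the data into a higher-dimensional space where a linear separator can fit the non-linear boundary.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear")
rbf_svm = SVC(kernel="rbf")   # kernel trick: implicit higher-dimensional mapping

print("linear kernel CV accuracy:", cross_val_score(linear_svm, X, y, cv=5).mean())
print("RBF kernel CV accuracy:   ", cross_val_score(rbf_svm, X, y, cv=5).mean())
```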

The F1-Score is the harmonic mean of _________ and _________.

  • Accuracy, Recall
  • Precision, Recall
  • Precision, Specificity
The F1-Score is the harmonic mean of Precision and Recall. It gives equal weight to both metrics, balancing the ability to correctly identify positive cases (Recall) against the ability to avoid false positives (Precision).
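
As a quick check (a sketch with made-up labels and predictions), the harmonic mean of precision and recall matches scikit-learn's f1_score:

```python
from sklearn.metrics import f1_score, precision_score, recall_score

# Illustrative labels and predictions
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]

p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)
harmonic_mean = 2 * p * r / (p + r)

print(harmonic_mean)                # manual F1: harmonic mean of precision and recall
print(f1_score(y_true, y_pred))     # same value from scikit-learn
```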