How does the curse of dimensionality impact the K-Nearest Neighbors algorithm, and what are some ways to address this issue?

  • Enhances speed, addressed by increasing data size
  • Improves accuracy, addressed by adding more dimensions
  • Makes distance measures less meaningful, addressed by dimension reduction
  • Reduces accuracy, addressed by increasing K
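In high-dimensional spaces, distances between points concentrate: the nearest and farthest neighbors of a query end up almost equally far away, so the distance measures KNN relies on become less meaningful. This issue can be addressed through dimensionality reduction techniques like PCA.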
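
The effect can be illustrated with a small simulation. This is only a sketch: the helper name `relative_contrast`, the uniform random data, and the sample sizes are assumptions made for illustration. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (max_dist - min_dist) / min_dist from one query to random points.

    Hypothetical helper: points are drawn uniformly from the unit hypercube.
    """
    rng = random.Random(seed)  # local generator so results are repeatable
    query = [rng.random() for _ in range(dim)]
    distances = [
        math.dist(query, [rng.random() for _ in range(dim)])
        for _ in range(n_points)
    ]
    return (max(distances) - min(distances)) / min(distances)


if __name__ == "__main__":
    # The contrast shrinks as the number of dimensions grows.
    for dim in (2, 10, 100, 500):
        print(dim, round(relative_contrast(dim), 3))
```

When the contrast is close to zero, "nearest" neighbors are barely nearer than any other point, which is why reducing the number of dimensions tends to help KNN.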

How does Random Forest differ from a single decision tree?

  • Random Forest always performs worse
  • Random Forest focuses on one feature
  • Random Forest uses multiple trees and averages their predictions
  • Random Forest uses only one tree
Random Forest is an ensemble method that builds multiple decision trees, each on a random sample of the data and features, and combines their predictions: averaging for regression, and voting (or averaging class probabilities) for classification. Unlike a single decision tree, it typically offers higher accuracy and robustness, because combining many trees reduces overfitting.
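
A minimal sketch of the aggregation step only, assuming the individual trees have already been trained and have each produced a prediction. The function names are hypothetical, and real libraries may combine class probabilities rather than hard votes.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Majority vote over the class labels predicted by each tree."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Average of the numeric values predicted by each tree."""
    return mean(tree_predictions)
```

For example, three trees predicting `"spam"`, `"ham"`, `"spam"` would yield `"spam"` as the ensemble prediction.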

You want to apply clustering to reduce the dimensionality of a dataset, but you also need to interpret the clusters easily. What approaches would you consider?

  • All of the Above
  • Hierarchical Clustering
  • K-Means
  • PCA with Clustering
Applying PCA (Principal Component Analysis) before clustering reduces dimensionality while keeping the clusters interpretable, as each principal component gives a clear direction for one of the main sources of variance in the data.

In a case where sparsity is important and you have highly correlated variables, which regularization technique might be most appropriate?

  • ElasticNet
  • Lasso
  • Ridge
ElasticNet combines the properties of Ridge and Lasso, making it suitable for handling both sparsity and multicollinearity in the dataset.
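
As a rough sketch, the combined penalty can be written as a weighted mix of the Lasso (L1) and Ridge (L2) terms. The parameter names `alpha` and `l1_ratio` and the factor of 0.5 on the L2 term are assumptions that follow the parameterization used in scikit-learn's `ElasticNet` objective; other texts scale the terms differently.

```python
def elastic_net_penalty(coefficients, alpha=1.0, l1_ratio=0.5):
    """Penalty added to the loss: a mix of L1 (Lasso) and L2 (Ridge) terms.

    l1_ratio = 1.0 gives a pure Lasso penalty, l1_ratio = 0.0 a pure Ridge one.
    """
    l1_term = sum(abs(w) for w in coefficients)   # encourages sparsity
    l2_term = sum(w * w for w in coefficients)    # shrinks correlated weights together
    return alpha * (l1_ratio * l1_term + 0.5 * (1.0 - l1_ratio) * l2_term)
```

The L1 part drives some coefficients to exactly zero, while the L2 part keeps the solution stable when variables are highly correlated.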

What are the criteria for a point to be considered a core point in DBSCAN?

  • Being isolated from other clusters
  • Being the central point of a cluster
  • Being within Epsilon of at least MinPts other points
  • Having the minimum distance to all other points in a cluster
A point is considered a core point in DBSCAN if its Epsilon neighborhood contains at least MinPts points. Conventions differ on whether the point itself is counted; the original formulation and scikit-learn both count it. A core point is part of a dense region and is central to the formation of a cluster, connecting other core or border points.
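
The core-point test can be sketched in a few lines. The function name is hypothetical, and this version assumes Euclidean distance and counts the candidate point itself toward MinPts, which is one common convention rather than the only one.

```python
import math


def is_core_point(points, index, eps, min_pts):
    """Return True if points[index] has at least min_pts points within eps.

    Assumption: the candidate point itself is included in the count.
    """
    center = points[index]
    neighbors = sum(1 for p in points if math.dist(center, p) <= eps)
    return neighbors >= min_pts


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10)]
    print([is_core_point(data, i, eps=1.5, min_pts=3) for i in range(len(data))])
```

In the sample data, the three points near the origin are core points, while the isolated point at (10, 10) is not.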

In Logistic Regression, what function is used to model the probability of the dependent variable?

  • Exponential function
  • Linear function
  • Polynomial function
  • Sigmoid function
Logistic Regression uses the Sigmoid function to model the probability of the dependent variable. It maps any input into a value between 0 and 1, which is ideal for binary classification.
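
A minimal sketch of the Sigmoid function, 1 / (1 + e^(-z)). The branch on the sign of `z` is an implementation choice made here to avoid overflow for large negative inputs; it is not part of the definition.

```python
import math


def sigmoid(z):
    """Map a real number to a probability-like value between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use the equivalent form e^z / (1 + e^z) to avoid overflow.
    e = math.exp(z)
    return e / (1.0 + e)
```

In Logistic Regression the input `z` is the linear combination of the features and weights, and the output is read as the probability of the positive class.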

Which type of regression would be suitable for predicting a continuous output?

  • Cluster Regression
  • K-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
Linear Regression is suitable for predicting a continuous output, as it models the relationship between dependent and independent variables through a linear equation.
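
For a single input variable, the linear equation can be fitted with ordinary least squares in closed form. This is a sketch under that one-variable assumption, and the function name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Once fitted, a continuous prediction for a new input `x` is `slope * x + intercept`.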

How does the use of the Gini Index compare to entropy in terms of computational efficiency in building a Decision Tree?

  • Both are equally efficient
  • Entropy is more computationally efficient
  • Gini Index is more computationally efficient
  • Neither is efficient
Gini Index is more computationally efficient because it does not involve calculating logarithms like entropy does. The difference per split is small and the two criteria often produce similar trees, but the Gini Index is a reasonable default when computational resources are limited.
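
The two impurity measures can be compared side by side. This sketch assumes each node is summarized by a list of per-class sample counts; the function names are illustrative.

```python
import math


def gini(class_counts):
    """Gini impurity: 1 - sum(p_i^2). Needs only multiplications."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def entropy(class_counts):
    """Shannon entropy in bits: -sum(p_i * log2(p_i)). Needs a logarithm per class."""
    total = sum(class_counts)
    return -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
```

Both measures are zero for a pure node and largest for an even class split, which is why they tend to choose similar splits.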

The finance sector leverages Machine Learning for ____________ detection and risk management.

  • Disease Prediction
  • Fraud
  • Recommender Systems
  • Traffic Flow
In the finance sector, Machine Learning is used for Fraud Detection and managing various risks, analyzing transaction data and identifying suspicious activities.

What are the components of a Confusion Matrix, and how do they relate to the True Positive, False Positive, True Negative, and False Negative rates?

  • TP, FN, FP, TN, associated with model accuracy
  • TP, FP, FN, TN, associated with specific classes
  • TP, FP, TN, FN, associated with different error types
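A Confusion Matrix consists of the counts of True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN). They help in understanding the type of mistakes a classifier is making, providing insight into the model's ability to classify instances of specific classes. The corresponding rates are derived from these counts, for example the true positive rate TP / (TP + FN) and the false positive rate FP / (FP + TN).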
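
For a binary problem, the four counts can be tallied directly from the true and predicted labels. This is a sketch: the function name and the choice of `1` as the positive label are assumptions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classification problem."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            counts["TP" if actual == positive else "FP"] += 1
        else:
            counts["FN" if actual == positive else "TN"] += 1
    return counts


if __name__ == "__main__":
    print(confusion_counts([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0]))
```

From these counts, the true positive rate is TP / (TP + FN) and the false positive rate is FP / (FP + TN).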

Explain the significance of choosing different linkage methods in the outcome of a Hierarchical Clustering algorithm.

  • Different linkage methods affect the shape and size of clusters
  • Different linkage methods affect the speed of clustering only
  • Different linkage methods affect the type of data that can be clustered
  • Different linkage methods yield similar results
Different linkage methods in Hierarchical Clustering significantly affect the shape and size of the resulting clusters. For example, single linkage may create chain-like clusters, complete linkage may lead to compact clusters, and average linkage often results in more balanced clusters. The choice of linkage method should be guided by the underlying data characteristics.
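
The three linkage methods named above differ only in how they turn pairwise point distances into one distance between two clusters. A minimal sketch, assuming Euclidean distance and a hypothetical function name:

```python
import math
from itertools import product


def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters under single, complete, or average linkage."""
    pairwise = [math.dist(a, b) for a, b in product(cluster_a, cluster_b)]
    if method == "single":      # closest pair of points
        return min(pairwise)
    if method == "complete":    # farthest pair of points
        return max(pairwise)
    if method == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")
```

Because single linkage looks only at the closest pair, it merges clusters that touch at one point, which is what produces chain-like shapes; complete linkage penalizes any distant pair, which favors compact clusters.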

In what way does Machine Learning contribute to the field of autonomous driving?

  • Enabling Real-time Decision-making
  • Financial Fraud Detection
  • Recommending Products
  • Weather Prediction
Machine Learning contributes to autonomous driving by enabling real-time decision-making, recognizing objects, and processing vast amounts of data to navigate safely.