In DBSCAN, what does the term 'Epsilon' refer to?

  • Edge Distance
  • Error Rate
  • Estimated Density
  • Maximum Radius of the Neighborhood
In DBSCAN, 'Epsilon' refers to the maximum radius of the neighborhood around a data point. If at least MinPts points fall within this radius, the point is considered a core point, and core points are what clusters grow from. Epsilon is a critical parameter because it controls how close points must be to belong to the same cluster.
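
The core-point test described above can be sketched in a few lines. This is a minimal illustration, not a full DBSCAN: the function names and the toy points are made up for the example, and it assumes the point itself counts toward MinPts (implementations differ on this).

```python
import math

def neighbors_within(points, index, eps):
    """Indices of all points within distance eps of points[index] (itself included)."""
    center = points[index]
    return [j for j, p in enumerate(points) if math.dist(center, p) <= eps]

def is_core_point(points, index, eps, min_pts):
    """A point is a core point if its eps-neighborhood holds at least min_pts points."""
    return len(neighbors_within(points, index, eps)) >= min_pts

# Toy data (assumed): three points close together and one far away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
core_flags = [is_core_point(points, i, eps=0.2, min_pts=3) for i in range(len(points))]
print(core_flags)
```

The three nearby points each have three neighbors within the radius and qualify as core points; the isolated point does not.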

You've detected a high Variance Inflation Factor (VIF) for one of the variables in your Multiple Linear Regression model. What does this indicate, and how would you proceed?

  • High multicollinearity and consider removing or combining variables
  • Low multicollinearity
  • No multicollinearity
  • The variable is not significant
A high VIF indicates high multicollinearity, meaning the variable is highly correlated with other variables in the model. You may consider removing or combining variables, applying regularization, or using dimensionality reduction techniques to address this issue and improve the model's performance.
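
For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 is obtained by regressing predictor j on the remaining predictors. The sketch below covers only the special case of two predictors, where R_j^2 equals the squared Pearson correlation between them. The data and function names are invented for illustration, and the threshold mentioned in the comment is a commonly quoted rule of thumb rather than a fixed standard.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r_squared = pearson_r(x1, x2) ** 2
    return 1.0 / (1.0 - r_squared)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2_collinear = [2.1, 3.9, 6.2, 7.8, 10.1]   # roughly 2 * x1
x2_unrelated = [2.0, -1.0, 0.0, -1.0, 2.0]  # zero correlation with x1

vif_high = vif_two_predictors(x1, x2_collinear)
vif_low = vif_two_predictors(x1, x2_unrelated)
# Values above roughly 5 to 10 are often treated as a warning sign.
print(vif_high, vif_low)
```

With more than two predictors, each R_j^2 would need a full regression of one predictor on all the others.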

What is the primary purpose of using Cross-Validation in Machine Learning?

  • To enhance the model's complexity
  • To estimate the model's performance on unseen data
  • To increase the training speed
  • To select optimal hyperparameters
Cross-Validation's primary purpose is to estimate the model's performance on unseen data. The dataset is split into several folds, each fold serves once as the validation set while the model trains on the rest, and the scores are averaged. This gives a more reliable evaluation than a single fixed train/validation split.
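
A minimal sketch of k-fold cross-validation follows. It makes two simplifying assumptions: the folds are contiguous with no shuffling, and the "model" is a hypothetical stand-in that always predicts the mean of its training targets. All names are illustrative.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_val_mse(y, k):
    """Mean squared error per fold for a 'predict the training mean' model."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        mean_pred = sum(y[i] for i in train_idx) / len(train_idx)
        mse = sum((y[i] - mean_pred) ** 2 for i in val_idx) / len(val_idx)
        scores.append(mse)
    return scores

y = [3.0, 5.0, 4.0, 6.0, 5.0, 7.0]
fold_scores = cross_val_mse(y, k=3)
mean_score = sum(fold_scores) / len(fold_scores)  # averaged estimate across folds
print(fold_scores, mean_score)
```

Every sample is used for validation exactly once, which is why the averaged score depends less on one lucky or unlucky split.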

You're building a recommendation system without access to labeled data. How would you proceed using unsupervised learning techniques?

  • Combining labeled and unlabeled data
  • Employing labeled data
  • Using clustering methods
  • Using reinforcement strategies
Clustering methods are a common Unsupervised Learning approach that groups users or items by similarity. Recommendations can then be drawn from what other members of the same group prefer, so no labeled data is required.

A company wants to predict customer churn based on historical data. What considerations must be made in selecting and tuning a Machine Learning model for this task?

  • Considering the business context, available data, model interpretability, and performance metrics
  • Focusing only on accuracy
  • Ignoring feature engineering
  • Selecting the most complex model available
Predicting customer churn requires understanding the business context, the nature of the data, and the need for model interpretability. Metrics such as precision, recall, and F1-score might be more relevant than mere accuracy.
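
The metrics named above can be computed directly from predicted and true labels. The sketch below uses made-up churn labels (1 = churned, 0 = stayed), and the function name is illustrative.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1-score for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 customers churned, the model flags 3 and is right about 2.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Here precision answers "of the customers flagged as churners, how many actually churned?" and recall answers "of the customers who churned, how many were flagged?", which usually maps more directly onto retention costs than overall accuracy.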

What is the broad field of study that encompasses Machine Learning, Deep Learning, and other computational techniques to enable intelligent decision-making?

  • Artificial Intelligence
  • Computational Science
  • Data Mining
  • Deep Learning
Artificial Intelligence (AI) is the broad field that includes Machine Learning, Deep Learning, and other techniques aimed at creating intelligent systems.

The term _________ refers to a situation where a regression model fits the training data too closely, resulting in poor performance on new data.

  • Bias
  • Overfitting
  • Regularization
  • Underfitting
Overfitting refers to a situation where a regression model fits the training data too closely, capturing noise and resulting in poor performance on unseen data.

What are the potential challenges in determining the optimal values for Epsilon and MinPts in DBSCAN?

  • Difficulty in selecting values that balance density and granularity of clusters
  • Lack of robustness to noise
  • Limited flexibility in shapes
  • Risk of overfitting the data
Determining optimal values for Epsilon and MinPts in DBSCAN is challenging because it requires a careful balance between the density and granularity of clusters. Too large an Epsilon can merge distinct clusters, while too small a value can fragment clusters or label most points as noise. MinPts sets the density required for a core point, and because the two parameters interact, tuning them is a complex task.
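
The sensitivity to Epsilon can be seen with a small pure-Python DBSCAN. This is a minimal sketch written for illustration (quadratic-time neighbor search, toy data, illustrative names), not a reference implementation. It assumes a point counts as its own neighbor and uses -1 as the noise label.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    labels = [None] * n  # None = not yet visited

    def region(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point is a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # only core points expand the cluster
    return labels

def n_clusters(labels):
    return len({label for label in labels if label != -1})

# Two well-separated groups of three points each (assumed toy data).
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
for eps in (0.5, 1.5, 20):
    labels = dbscan(points, eps=eps, min_pts=2)
    print(eps, labels, n_clusters(labels))
```

On this data, the middle value recovers the two groups, the largest value merges everything into one cluster, and the smallest value leaves every point labeled as noise.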

In comparison to PCA, LDA focuses on maximizing the separability between different ___________ rather than the variance of the data.

  • classes
  • features
  • principal components
  • variables
Unlike PCA, which focuses on the variance of the data and ignores labels, LDA uses the class labels and emphasizes maximizing the separability between different classes.
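
The contrast can be made concrete in two dimensions, where both directions have closed forms. The sketch below assumes a two-class problem with invented data: it computes the first principal direction from the pooled scatter matrix and the Fisher discriminant direction as the inverse within-class scatter applied to the difference of class means. Function names are illustrative, and `pca_direction` assumes the scatter is not perfectly isotropic.

```python
import math

def mean_of(points):
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def scatter(points, mean):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around mean."""
    sxx = sum((x - mean[0]) ** 2 for x, _ in points)
    syy = sum((y - mean[1]) ** 2 for _, y in points)
    sxy = sum((x - mean[0]) * (y - mean[1]) for x, y in points)
    return sxx, sxy, syy

def unit(v):
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

def pca_direction(points):
    """Leading eigenvector of the scatter matrix (labels are ignored)."""
    a, b, c = scatter(points, mean_of(points))
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return unit((lam - c, b) if a >= c else (b, lam - a))

def lda_direction(class0, class1):
    """Fisher direction: inverse within-class scatter times the mean difference."""
    m0, m1 = mean_of(class0), mean_of(class1)
    a0, b0, c0 = scatter(class0, m0)
    a1, b1, c1 = scatter(class1, m1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1
    det = a * c - b * b
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    return unit(((c * dx - b * dy) / det, (a * dy - b * dx) / det))

# Both classes spread widely along x but differ only in y.
class0 = [(-4, 0.0), (-2, 0.2), (0, -0.2), (2, 0.1), (4, -0.1)]
class1 = [(-4, 1.0), (-2, 1.2), (0, 0.8), (2, 1.1), (4, 0.9)]
pca_dir = pca_direction(class0 + class1)
lda_dir = lda_direction(class0, class1)
print(pca_dir, lda_dir)
```

On this data the PCA direction lies almost along the x-axis, where the variance is, while the LDA direction lies almost along the y-axis, where the classes actually separate.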

Describe a situation where a high Accuracy might be misleading, and a different metric (e.g., Precision, Recall, or F1-Score) might be more appropriate.

  • When the dataset has equal classes, Precision is more appropriate
  • When the dataset has only one class, Recall is more appropriate
  • When the dataset is imbalanced, other metrics like Precision or Recall may be more informative
In imbalanced datasets, where one class significantly outnumbers the other, Accuracy can be misleading. Even a naive model that always predicts the majority class will achieve high Accuracy while never identifying a single minority-class example. Metrics like Precision, Recall, or F1-Score provide more insight into the model's performance on the minority class.
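
A short numeric illustration, using an assumed 95/5 class split and a baseline that always predicts the majority class:

```python
# Hypothetical imbalanced dataset: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # naive baseline: always predict the majority class

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
accuracy = correct / len(y_true)

true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
actual_positives = sum(1 for t in y_true if t == 1)
recall = true_positives / actual_positives

print(accuracy, recall)
```

The baseline scores 95% Accuracy yet has a Recall of zero on the minority class, which is exactly the failure that Accuracy alone hides.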