What is the primary purpose of using Cross-Validation in Machine Learning?
- To enhance the model's complexity
- To estimate the model's performance on unseen data
- To increase the training speed
- To select optimal hyperparameters
Cross-Validation's primary purpose is to estimate the model's performance on unseen data by repeatedly splitting the dataset into training and validation folds. Averaging the scores across folds gives a more reliable evaluation than a single static validation set.
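As a minimal sketch of k-fold cross-validation, assuming scikit-learn and its bundled Iris dataset (both illustrative choices, not part of the question):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: each fold serves once as the validation set
scores = cross_val_score(model, X, y, cv=5)
print(f"Fold accuracies: {scores}")
print(f"Mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```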
You're building a recommendation system without access to labeled data. How would you proceed using unsupervised learning techniques?
- Combining labeled and unlabeled data
- Employing labeled data
- Using clustering methods
- Using reinforcement strategies
Clustering methods are a common Unsupervised Learning approach that groups users or items by similarity, making them well suited to building a recommendation system when no labeled data is available.
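A minimal sketch of the idea, assuming scikit-learn and a hypothetical user-item interaction matrix: cluster users by similar behavior, then recommend the items most popular within a user's cluster.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical user-item interaction matrix (rows: users, cols: items)
rng = np.random.default_rng(0)
interactions = rng.integers(0, 2, size=(100, 20))

# Group users into clusters based on similar interaction patterns
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0)
labels = kmeans.fit_predict(interactions)

# Recommend to a user the items most popular within their cluster
user_id = 0
cluster_members = interactions[labels == labels[user_id]]
item_popularity = cluster_members.sum(axis=0)
top_items = np.argsort(item_popularity)[::-1][:3]
print(f"Top items for user {user_id}: {top_items}")
```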
A company wants to predict customer churn based on historical data. What considerations must be made in selecting and tuning a Machine Learning model for this task?
- Considering the business context, available data, model interpretability, and performance metrics
- Focusing only on accuracy
- Ignoring feature engineering
- Selecting the most complex model available
Predicting customer churn requires understanding the business context, the nature of the available data, and the need for model interpretability. Because churn datasets are typically imbalanced (most customers do not churn), metrics such as precision, recall, and F1-score are often more relevant than accuracy alone.
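As an illustration, assuming scikit-learn and hypothetical labels and predictions, `classification_report` surfaces per-class precision, recall, and F1 rather than a single accuracy number:

```python
from sklearn.metrics import classification_report

# Hypothetical churn labels (1 = churned) and model predictions
y_true = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 1]

# Per-class precision, recall, and F1 reveal more than overall accuracy
print(classification_report(y_true, y_pred, target_names=["stayed", "churned"]))
```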
What is the broad field of study that encompasses Machine Learning, Deep Learning, and other computational techniques to enable intelligent decision-making?
- Artificial Intelligence
- Computational Science
- Data Mining
- Deep Learning
Artificial Intelligence (AI) is the broad field that includes Machine Learning, Deep Learning, and other techniques aimed at creating intelligent systems.
The term _________ refers to a situation where a regression model fits the training data too closely, resulting in poor performance on new data.
- Bias
- Overfitting
- Regularization
- Underfitting
Overfitting refers to a situation where a regression model fits the training data too closely, capturing noise and resulting in poor performance on unseen data.
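A minimal sketch of this effect, assuming scikit-learn and synthetic data: a very high-degree polynomial regression typically scores near-perfectly on the training set but worse on held-out data than a modest-degree fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Noisy samples from a smooth underlying function
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=30)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# A degree-15 polynomial fits the training noise (overfitting),
# so its test score drops relative to the degree-3 fit
for degree in (3, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_r2 = r2_score(y_train, model.predict(X_train))
    test_r2 = r2_score(y_test, model.predict(X_test))
    print(f"degree={degree}: train R^2={train_r2:.2f}, test R^2={test_r2:.2f}")
```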
Is DBSCAN sensitive to the choice of Epsilon and MinPts? Why or why not?
- No, they are auto-calculated parameters
- No, they have minimal effect on the outcome
- Yes, they define the shape of the clusters
- Yes, they influence the density of clusters
DBSCAN is indeed sensitive to the choice of Epsilon and MinPts. These parameters are crucial in determining the density of the clusters, as Epsilon controls the maximum radius of the neighborhood, and MinPts sets the minimum number of points required to form a dense region. Selecting inappropriate values can lead to suboptimal clustering results.
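A minimal sketch of this sensitivity, assuming scikit-learn and its synthetic two-moons dataset: the same data can yield very different clusterings as Epsilon varies.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# The same data clustered with different Epsilon values
for eps in (0.05, 0.2, 0.5):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = list(labels).count(-1)
    print(f"eps={eps}: {n_clusters} clusters, {n_noise} noise points")
```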
How would you handle a situation in which the SVM is performing poorly due to the choice of kernel?
- Change the dataset
- Change to a more appropriate kernel using cross-validation
- Ignore the issue
- Use only linear kernel
Switching to a more appropriate kernel, selected via cross-validation, can improve performance when the current kernel does not match the structure of the data.
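One way to do this, sketched here with scikit-learn's `GridSearchCV` on an illustrative dataset, is to treat the kernel as a hyperparameter and let cross-validation pick it:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Search over kernels (and a couple of C values) with 5-fold cross-validation
param_grid = {"kernel": ["linear", "poly", "rbf", "sigmoid"], "C": [0.1, 1, 10]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(f"Best parameters: {search.best_params_}")
print(f"Best CV accuracy: {search.best_score_:.3f}")
```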
In comparison to PCA, LDA focuses on maximizing the separability between different ___________ rather than the variance of the data.
- classes
- features
- principal components
- variables
Unlike PCA, which focuses on the variance of the data, LDA emphasizes maximizing the separability between different classes.
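A minimal sketch of the contrast, assuming scikit-learn and its Wine dataset: PCA's fit ignores the labels entirely, while LDA requires them.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_wine(return_X_y=True)

# PCA is unsupervised: it ignores y and keeps directions of maximum variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA is supervised: it uses y to find directions that best separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(f"PCA projection shape: {X_pca.shape}")
print(f"LDA projection shape: {X_lda.shape}")
```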
Describe a situation where a high Accuracy might be misleading, and a different metric (e.g., Precision, Recall, or F1-Score) might be more appropriate.
- When the dataset has equal classes, Precision is more appropriate
- When the dataset has only one class, Recall is more appropriate
- When the dataset is imbalanced, other metrics like Precision or Recall may be more informative
In imbalanced datasets, where one class significantly outnumbers the other, Accuracy can be misleading. Even a naive model predicting the majority class will have high Accuracy. Metrics like Precision, Recall, or F1-Score provide more insight into the model's performance on the minority class.
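A small illustration with scikit-learn metrics and hypothetical labels: a majority-class predictor on a 95/5 split scores 95% accuracy yet has zero precision and recall on the minority class.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical imbalanced problem: 95 negatives, 5 positives.
# A naive model that always predicts the majority class:
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.2f}")   # 0.95, looks great
print(f"Precision: {precision_score(y_true, y_pred, zero_division=0):.2f}")
print(f"Recall:    {recall_score(y_true, y_pred):.2f}")     # 0.00 on positives
```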
You are having difficulty interpreting the coefficients of your Logistic Regression model. How might the Logit function and Odds Ratio help in understanding them?
- By transforming coefficients into R-squared values
- By transforming coefficients into log-odds and allowing interpretation in terms of odds
- By transforming coefficients into odds
- By transforming coefficients into probabilities
The Logit function and Odds Ratio can help in understanding the coefficients by transforming them into log-odds and allowing interpretation in terms of the change in odds for a one-unit change in the predictor.
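As a sketch using scikit-learn (the dataset and standardization are illustrative choices), exponentiating a fitted Logistic Regression coefficient gives the odds ratio for a one-unit change in that feature:

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Coefficients are log-odds; exponentiating gives odds ratios:
# the odds are multiplied by exp(coef) per one-unit increase in the feature
odds_ratios = np.exp(model.coef_[0])
feature_names = load_breast_cancer().feature_names
for name, oratio in list(zip(feature_names, odds_ratios))[:5]:
    print(f"{name}: odds ratio = {oratio:.2f}")
```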