Which ensemble method combines multiple decision trees and aggregates their results for improved accuracy and reduced overfitting?
- Logistic Regression
- Naive Bayes
- Principal Component Analysis (PCA)
- Random Forest
Random Forest is an ensemble method that combines multiple decision trees. Each tree is trained on a bootstrap sample of the data (bagging) and considers a random subset of features at each split; the trees' predictions are then aggregated by majority vote or averaging, which improves accuracy and reduces overfitting. (Boosting, used by methods such as AdaBoost and Gradient Boosting, is a different ensemble technique.) Random Forest is a popular choice for a wide range of machine learning tasks.
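As a minimal sketch (assuming scikit-learn is installed), a Random Forest can be trained in a few lines; the dataset and the `n_estimators` value below are purely illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Illustrative example on scikit-learn's built-in iris dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is fit on a bootstrap sample; predictions are aggregated by majority vote.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```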
Imagine you're developing a model to recognize rare bird species from images. You don't have many labeled examples of these rare birds, but you have a model trained on thousands of common bird species. How might you leverage this existing model for your task?
- Fine-tuning the Pre-trained Model
- Random Initialization of Weights
- Training the Model from Scratch
- Using the Model Only for Common Bird Recognition
Fine-tuning takes a pre-trained model and adjusts its parameters, typically only in the final layers, to specialize it for your specific task, in this case recognizing rare bird species. Because the model has already learned general visual features from the common species, it needs far fewer labeled examples of the rare ones.
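A hedged sketch of the idea using PyTorch and torchvision (assuming those libraries with pretrained ResNet-18 weights are available; ImageNet weights stand in for the "common birds" model, and `num_rare_species` is a hypothetical class count):

```python
import torch.nn as nn
from torchvision import models
from torchvision.models import ResNet18_Weights

num_rare_species = 12  # hypothetical number of rare bird classes

# Load a network pre-trained on a large dataset (ImageNet here, standing in
# for the model trained on thousands of common bird species).
model = models.resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze the feature-extraction layers so their learned features are kept.
for param in model.parameters():
    param.requires_grad = False

# Replace the final classification layer with one sized for the rare species.
# Its fresh parameters have requires_grad=True, so an optimizer built over
# model.fc.parameters() fine-tunes only this new head.
model.fc = nn.Linear(model.fc.in_features, num_rare_species)
```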
Which clustering method assigns data points to the nearest cluster center and recalculates the center until convergence?
- Agglomerative
- DBSCAN
- Hierarchical
- K-Means
K-Means is an iterative algorithm that alternates between two steps: assign each data point to the nearest cluster center, then recompute each center as the mean of its assigned points, repeating until the assignments stop changing.
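A minimal NumPy sketch of the two alternating steps; this is illustrative rather than production code (for instance, it does not handle clusters that become empty):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels
```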
t-SNE is particularly known for preserving which kind of structures from the high-dimensional data in the low-dimensional representation?
- Global Structures
- Local Structures
- Numerical Structures
- Geometric Structures
t-SNE is known for preserving local structure: points that are close in the high-dimensional space tend to stay close in the embedding, which makes it effective for visualizing fine-grained relationships. Global distances between far-apart clusters, by contrast, are often distorted.
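A short scikit-learn sketch (assuming scikit-learn and matplotlib are installed; the perplexity value is illustrative):

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X, y = load_digits(return_X_y=True)  # 64-dimensional digit images

# Embed into 2-D; perplexity roughly controls the size of the local
# neighborhood whose structure t-SNE tries to preserve.
emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5, cmap="tab10")
plt.title("t-SNE keeps similar digits close (local structure)")
plt.show()
```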
When both precision and recall are important for a problem, one might consider optimizing the ________ score.
- Accuracy
- F1 Score
- ROC AUC
- Specificity
The F1 Score is the harmonic mean of precision and recall, so it balances both. It is especially useful when false positives and false negatives both matter in your classification problem.
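A tiny worked example (the confusion counts are made up for illustration):

```python
# Hypothetical confusion counts.
tp, fp, fn = 40, 10, 20

precision = tp / (tp + fp)  # 40 / 50 = 0.80
recall = tp / (tp + fn)     # 40 / 60 ~= 0.67
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean ~= 0.73
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```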
When using K-means clustering, why is it sometimes recommended to run the algorithm multiple times with different initializations?
- To ensure deterministic results.
- To make the algorithm run faster.
- To mitigate sensitivity to initial cluster centers.
- To reduce the number of clusters.
K-means is sensitive to its initial cluster centers and can converge to a poor local minimum. Running it multiple times with different initializations and keeping the best result (typically the run with the lowest inertia) yields a more stable solution.
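In scikit-learn this is the `n_init` parameter; a sketch comparing a single random initialization to the best of ten (the blob data is synthetic and purely illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=5, random_state=7)

# One random initialization may land in a poor local minimum...
single = KMeans(n_clusters=5, n_init=1, init="random", random_state=0).fit(X)
# ...while the best of ten restarts is usually more stable (lower inertia).
multi = KMeans(n_clusters=5, n_init=10, init="random", random_state=0).fit(X)

print("inertia with 1 init: ", single.inertia_)
print("best inertia over 10:", multi.inertia_)
```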
________ is a technique where during training, random subsets of neurons are ignored, helping to make the model more robust.
- Dropout
- Regularization
- Batch Normalization
- Activation Function
Dropout is a regularization technique that involves randomly deactivating a fraction of neurons during training. This helps prevent overfitting, making the model more robust and less dependent on specific neurons.
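A minimal NumPy sketch of "inverted" dropout applied to an array of activations (the 0.5 rate is illustrative):

```python
import numpy as np

def dropout(activations, p_drop=0.5, training=True, seed=None):
    """Randomly zero a fraction of activations during training.

    Uses "inverted" dropout: surviving units are scaled by 1/(1 - p_drop)
    so no rescaling is needed at inference time.
    """
    if not training or p_drop == 0.0:
        return activations
    rng = np.random.default_rng(seed)
    mask = rng.random(activations.shape) >= p_drop  # keep with prob 1 - p_drop
    return activations * mask / (1.0 - p_drop)

h = np.ones((2, 5))
# Roughly half the units are zeroed; the survivors are scaled up to 2.0.
print(dropout(h, p_drop=0.5, seed=0))
```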
In the context of decision trees, what is "information gain" used for?
- To assess the tree's overall accuracy
- To calculate the depth of the tree
- To determine the number of leaf nodes
- To measure the purity of a split
Information gain measures how much a split reduces uncertainty (entropy) in the class labels: it is the parent node's entropy minus the size-weighted entropy of the child nodes. The tree splits on the feature that yields the largest gain, i.e., the purest children.
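A small sketch computing entropy and the information gain of a candidate split (the binary labels are made up for illustration):

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of the class distribution, in bits.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(parent, left, right):
    # Parent entropy minus the size-weighted entropy of the children.
    n = len(parent)
    weighted = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - weighted

parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = parent[:4], parent[4:]  # a perfectly pure split
print(information_gain(parent, left, right))  # 1.0 bit: entropy drops to zero
```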
What is a common application of GANs in the field of image processing?
- Image classification.
- Style transfer.
- Sentiment analysis.
- Speech recognition.
GANs are frequently used for style transfer, a technique that re-renders an image in a different artistic style. Style transfer is commonly employed in fields like art and design for image manipulation and transformation.
If a classifier predicts the positive class perfectly but struggles with the negative class, the ________ might still be high due to the imbalance.
- AUC-ROC
- Accuracy
- F1 Score
- True Positive Rate
Accuracy can be misleadingly high here: because the positive class dominates, getting it right inflates the score even while the negative class is handled poorly. Accuracy does not account for the unequal distribution of classes, so the F1 Score and AUC-ROC are more robust metrics in such cases.
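A quick numeric illustration (the imbalanced counts are hypothetical): the classifier labels every positive correctly but misses most negatives, yet accuracy still looks strong:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical imbalanced data: 950 positives, 50 negatives.
y_true = np.array([1] * 950 + [0] * 50)
# All 950 positives are predicted correctly, but 40 of the 50 negatives
# are wrongly called positive.
y_pred = np.array([1] * 950 + [1] * 40 + [0] * 10)

print("accuracy:", accuracy_score(y_true, y_pred))                     # 0.96
print("F1, negative class:", f1_score(y_true, y_pred, pos_label=0))   # ~0.33
```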