A medical research company is working on image data, where they want to classify microscopic images into cancerous and non-cancerous categories. The boundary between these categories is not linear. Which algorithm would be a strong candidate for this problem?
- Convolutional Neural Network (CNN)
- Logistic Regression
- Naive Bayes Classifier
- Principal Component Analysis
Convolutional Neural Networks (CNNs) are excellent for image classification tasks, especially when dealing with non-linear boundaries. They use convolutional layers to extract features from images, making them suitable for tasks like cancerous/non-cancerous image classification.
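As a rough illustration of the feature extraction a convolutional layer performs, the sketch below slides a small kernel over a toy image. The image, the kernel values, and the function name `conv2d` are all assumptions made for this example; a real CNN learns its kernels from data and stacks many such layers with non-linear activations.

```python
def conv2d(image, kernel):
    """Valid (no padding) 2-D cross-correlation, the core operation of a conv layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical 4x4 image: dark left half (0), bright right half (1).
image = [[0, 0, 1, 1] for _ in range(4)]
# Hand-picked vertical-edge kernel (an assumption; CNNs learn these weights).
kernel = [[-1, 1], [-1, 1]]

feature_map = conv2d(image, kernel)
print(feature_map)
```

The feature map is non-zero only where the kernel straddles the dark-to-bright edge, which is the sense in which a convolution "extracts" a local feature.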
The term "exploitation" in reinforcement learning refers to which of the following?
- Utilizing the best-known actions
- Trying new, unexplored actions
- Maximizing exploration
- Modifying the environment
Exploitation means choosing the actions currently estimated to be best in order to maximize reward based on existing knowledge. It contrasts with exploration, which tries less-known actions to improve those estimates.
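One common way to balance the two is an epsilon-greedy rule, sketched below. The function name `choose_action`, the value estimates in `q_values`, and the choice of epsilon-greedy itself are assumptions for illustration, not something the question prescribes.

```python
import random

def choose_action(q_values, epsilon, rng):
    """Explore with probability epsilon; otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        # Exploration: pick any action uniformly at random.
        return rng.randrange(len(q_values))
    # Exploitation: pick the action with the highest current estimate.
    return max(range(len(q_values)), key=lambda a: q_values[a])

# Hypothetical value estimates for three actions.
q_values = [0.2, 0.8, 0.5]
rng = random.Random(0)

# With epsilon = 0 the agent only exploits, so it always picks action 1.
greedy_action = choose_action(q_values, epsilon=0.0, rng=rng)
print(greedy_action)
```

Setting `epsilon` to 0 gives pure exploitation, while setting it to 1 gives pure exploration.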
________ learning is often used for discovering hidden patterns in data.
- Reinforcement
- Semi-supervised
- Supervised
- Unsupervised
Unsupervised learning is a machine learning approach where algorithms are used to identify patterns in data without explicit guidance. It is commonly employed for data exploration and pattern discovery.
When dealing with high-dimensional data, which of the two algorithms (k-NN or Naive Bayes) is likely to be more efficient in terms of computational time?
- Both Equally Efficient
- Naive Bayes
- Neither is Efficient
- k-NN
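Naive Bayes is typically more efficient on high-dimensional data: each feature contributes one independent probability term, so training and prediction scale linearly with the number of features. k-NN must compute the distance from each query to every stored training point at prediction time, and its distance measures also become less informative as dimensions grow (the "curse of dimensionality").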
In the k-NN algorithm, as the value of k increases, the decision boundary becomes __________.
- Linear
- More complex
- More simplified
- Non-existent
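As the value of k in k-NN increases, the decision boundary becomes more simplified (smoother) because each prediction is a vote over more neighboring data points, which averages out local variation. A very small k follows individual points closely and yields a more complex boundary.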
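The effect of k can be seen on a toy example. The sketch below assumes 1-D features and a made-up training set with one isolated point of class "B"; the function name `knn_predict` is hypothetical.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query (1-D features)."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical data: class "A" dominates, with one isolated "B" point at 2.0.
train = [(0.0, "A"), (1.0, "A"), (2.0, "B"), (3.0, "A"), (4.0, "A")]

pred_k1 = knn_predict(train, 2.1, k=1)  # follows the single nearest point
pred_k5 = knn_predict(train, 2.1, k=5)  # votes over all five points
print(pred_k1, pred_k5)
```

With k = 1 the boundary bends around the isolated "B" point; with k = 5 that point is outvoted and the region is classified as "A" throughout.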
A company wants to segment its customers based on their purchasing behavior. They have a fair idea that there are around 5 distinct segments but want to confirm this. Which clustering algorithm might they start with?
- K-Means Clustering
- Agglomerative Hierarchical Clustering
- Mean-Shift Clustering
- Spectral Clustering
The company might start with K-Means Clustering to test their idea of five distinct segments. K-Means partitions data into a pre-specified number of clusters, so it is a good choice when you have a rough idea of that number. Running it with k = 5 and a few nearby values, then comparing cluster quality, helps confirm whether five is a reasonable choice.
Which clustering method assigns data points to the nearest cluster center and recalculates the center until convergence?
- Agglomerative
- DBSCAN
- Hierarchical
- K-Means
K-Means clustering is an iterative algorithm that assigns each data point to the nearest cluster center, recalculating these centers until they converge.
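The assign-then-recalculate loop can be sketched in a few lines. This is a minimal version on 1-D points with hand-picked starting centers, both of which are assumptions for the example; the function name `kmeans` is likewise hypothetical.

```python
def kmeans(points, centers, max_iter=100):
    """Lloyd's iteration on 1-D points, starting from the given centers."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = [
            sum(c) / len(c) if c else centers[i]  # an empty cluster keeps its center
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: the centers stopped moving
            break
        centers = new_centers
    return centers

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
final_centers = kmeans(points, centers=[0.0, 5.0])
print(final_centers)
```

On this data the centers settle at the means of the two obvious groups after three passes.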
t-SNE is particularly known for preserving which kind of structures from the high-dimensional data in the low-dimensional representation?
- Global Structures
- Local Structures
- Numerical Structures
- Geometric Structures
t-SNE is known for preserving local structures in the low-dimensional representation, making it effective for visualization and capturing fine-grained relationships.
When both precision and recall are important for a problem, one might consider optimizing the ________ score.
- Accuracy
- F1 Score
- ROC AUC
- Specificity
The F1 Score is the harmonic mean of precision and recall, so it is high only when both are high. It is especially useful when both false positives and false negatives matter in your classification problem.
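As a worked example, the sketch below computes F1 = 2 * precision * recall / (precision + recall) from raw labels. The labels are made up for illustration, and the function name `f1_score` is a local helper rather than a library call.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels: 2 true positives, 1 false positive, 2 false negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

score = f1_score(y_true, y_pred)
print(round(score, 4))
```

Here precision is 2/3 and recall is 1/2, giving an F1 of 4/7, which sits closer to the lower of the two values than an arithmetic mean would.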
When using K-means clustering, why is it sometimes recommended to run the algorithm multiple times with different initializations?
- To ensure deterministic results.
- To make the algorithm run faster.
- To mitigate sensitivity to initial cluster centers.
- To reduce the number of clusters.
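K-means clustering is sensitive to its initial cluster centers because it converges to a local optimum, not necessarily the best possible clustering. Running it multiple times with different initializations and keeping the run with the lowest within-cluster sum of squares reduces the chance of settling on a poor solution.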
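The sensitivity is easy to reproduce on a small example. The sketch below runs a minimal 1-D K-means from two hand-picked initializations and keeps the run with the lowest inertia (within-cluster sum of squares). The data, both starting points, and the function name `run_kmeans` are assumptions for illustration; in practice the initializations are usually drawn at random.

```python
def run_kmeans(points, centers, max_iter=100):
    """Run Lloyd's iteration on 1-D points; return (final centers, inertia)."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Inertia: sum of squared distances from each point to its nearest center.
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    return centers, inertia

# Three well-separated pairs of points.
points = [0.0, 1.0, 10.0, 11.0, 20.0, 21.0]
# Two hypothetical initializations: the first wastes two centers on one pair.
initializations = [[0.0, 1.0, 15.0], [0.0, 10.0, 20.0]]

results = [run_kmeans(points, init) for init in initializations]
best_centers, best_inertia = min(results, key=lambda r: r[1])
print(results)
print(best_centers, best_inertia)
```

The first initialization gets stuck with one center covering two groups, while the second recovers all three; selecting by inertia picks the better run.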