What is the main difference between supervised and unsupervised learning?
- The algorithms used
- The complexity of the models
- The data size
- The use of labeled data
Supervised learning uses labeled data where the output is known, while unsupervised learning deals with unlabeled data and finds hidden patterns without guidance on the expected outcome.
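As a minimal sketch of this distinction, the example below fits the same one-dimensional data twice: once with labels (a nearest-centroid classifier) and once without (a two-cluster k-means). The data values, label names, and function names are all hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical 1-D feature values for six samples, with known outputs.
features = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
labels = ["small", "small", "small", "large", "large", "large"]

def fit_supervised(xs, ys):
    """Learn one centroid per label; the labels say which group each x belongs to."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: mean(values) for label, values in groups.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def fit_unsupervised(xs, iterations=10):
    """Two-cluster k-means on 1-D data: no labels are used, groups are discovered."""
    centers = [min(xs), max(xs)]  # simple deterministic starting point
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers

centroids = fit_supervised(features, labels)
print(predict(centroids, 4.9))     # a label name taken from the training labels
print(fit_unsupervised(features))  # two cluster centers with no names attached
```

The supervised model can return a named class because it was given names to learn from; the unsupervised one only reports where the groups are.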
While AI aims to mimic human intelligence, Machine Learning focuses on learning from data, and Deep Learning emphasizes learning from data using __________.
- clustering
- neural networks
- regression
- statistical methods
Deep Learning emphasizes learning from data using neural networks, particularly multi-layered structures known as deep neural networks.
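To make "multi-layered" concrete, here is a small sketch of a forward pass through a network with one hidden layer. The weights are hand-picked for illustration rather than learned from data, and the names `dense`, `relu`, and `forward` are assumptions of this example. With these particular weights the network computes XOR, something a single linear layer cannot represent.

```python
def relu(values):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each output is a weighted sum of the inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hand-picked (not learned) parameters, for illustration only.
W1 = [[1.0, 1.0], [1.0, 1.0]]  # hidden layer: two units, two inputs each
B1 = [0.0, -1.0]
W2 = [[1.0, -2.0]]             # output layer: one unit, two hidden inputs
B2 = [0.0]

def forward(x):
    """Input -> hidden layer with ReLU -> linear output."""
    hidden = relu(dense(x, W1, B1))
    return dense(hidden, W2, B2)[0]

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, forward(x))
```

In practice the weights would be fitted to data by a training procedure such as gradient descent, and deep networks stack many such layers.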
Clustering can be used in _________ analysis to find patterns and similarities in large datasets, facilitating targeted marketing strategies.
- Customer Segmentation
- Decision Tree
- Linear Regression
- Principal Component
Clustering is used in customer segmentation analysis to group customers based on patterns and similarities, allowing for more targeted marketing strategies.
Which term refers to a subset of AI that deals with algorithms designed to identify patterns and make decisions with minimal human intervention?
- Data Mining
- Machine Learning
- Neural Networks
- Robotics
Machine Learning is a subset of AI that focuses on creating algorithms to identify patterns and make decisions with little or no human intervention.
Can dimensionality reduction be used to improve the performance of machine learning models? If so, how?
- By improving computational efficiency
- By reducing overfitting
- By simplifying the model
- All of the above
Dimensionality reduction can improve the performance of machine learning models by reducing overfitting (as the model becomes less complex), simplifying the model (making it easier to interpret), and improving computational efficiency (reducing training time and resource requirements).
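The sketch below uses principal component analysis as one example of dimensionality reduction, projecting two correlated features onto a single component. It is a simplified, standard-library-only illustration: the function name, the sample points, and the use of power iteration are choices made for this example, and a numerical library would normally be used instead.

```python
import math

def pca_first_component(points, iterations=100):
    """Project 2-D points onto their first principal component."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)

    # Power iteration: repeatedly multiplying by the covariance matrix
    # converges to the direction of largest variance. This sketch fails
    # if the start vector happens to be orthogonal to that direction.
    vx, vy = 1.0, 1.0
    for _ in range(iterations):
        nx, ny = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = math.hypot(nx, ny)
        vx, vy = nx / norm, ny / norm

    # Each 2-D point is replaced by a single coordinate along that direction.
    projected = [x * vx + y * vy for x, y in centered]
    return (vx, vy), projected

# Hypothetical data: the second feature is roughly twice the first.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, reduced = pca_first_component(points)
print(direction)
print(reduced)
```

A downstream model trained on `reduced` has one input instead of two, which is the source of the efficiency and simplicity gains described above.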
The _________ linkage method in Hierarchical Clustering considers the average distance between all pairs of points in two clusters.
- Average Linkage
- Complete Linkage
- Single Linkage
- Ward's Method
Average Linkage considers the average distance between all pairs of points in two clusters. It falls between the Single and Complete Linkage methods, often providing a balance that avoids some of the extremes of either method. It can be a good choice when clusters are relatively compact but not necessarily spherical.
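The relationship between the three distance-based linkage criteria can be sketched in a few lines. The cluster contents and function names below are hypothetical; Euclidean distance is assumed.

```python
import math
from itertools import product

def pairwise_distances(cluster_a, cluster_b):
    """Euclidean distance for every pair (one point from each cluster)."""
    return [math.dist(p, q) for p, q in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise_distances(cluster_a, cluster_b))  # closest pair

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise_distances(cluster_a, cluster_b))  # farthest pair

def average_linkage(cluster_a, cluster_b):
    distances = pairwise_distances(cluster_a, cluster_b)
    return sum(distances) / len(distances)                # mean over all pairs

# Hypothetical clusters.
a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (3.0, 1.0)]
print(single_linkage(a, b), average_linkage(a, b), complete_linkage(a, b))
```

Because a mean always lies between the minimum and the maximum, the average-linkage value can never be below the single-linkage value or above the complete-linkage value for the same pair of clusters.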
What are the strengths and weaknesses of using Ward's method in Hierarchical Clustering?
- Maximizes mean distance but sensitive to initial configuration
- Maximizes variance but creates well-separated clusters
- Minimizes mean distance but less compact clusters
- Minimizes variance but sensitive to outliers
Ward's method in Hierarchical Clustering merges, at each step, the pair of clusters whose merger gives the smallest increase in total within-cluster variance. Strength: it tends to produce compact, similarly sized clusters. Weakness: it is sensitive to outliers, because the criterion is based on squared distances and can therefore be disproportionately influenced by extreme values.
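A small sketch of the merge cost used by Ward's method, and of why an outlier inflates it. The cost is computed two ways: directly as the increase in the within-cluster sum of squared distances, and with the equivalent closed form |A||B| / (|A| + |B|) times the squared distance between the two centroids. The function names and data points are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sum((a - b) ** 2 for a, b in zip(p, c)) for p in points)

def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b are merged."""
    return sse(a + b) - sse(a) - sse(b)

def ward_cost_closed_form(a, b):
    """Same quantity, from cluster sizes and centroid distance only."""
    ca, cb = centroid(a), centroid(b)
    gap = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * gap

# Hypothetical clusters; the third contains one extreme value.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (5.0, 0.0)]
b_with_outlier = [(4.0, 0.0), (5.0, 0.0), (30.0, 0.0)]

print(ward_cost(a, b), ward_cost_closed_form(a, b))
print(ward_cost(a, b_with_outlier))
```

A single distant point shifts the centroid and enters the cost as a squared term, so it changes the merge cost far more than its share of the data would suggest.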
What are some common methods of initializing centroids in K-Means clustering?
- Data Transformation
- Normalization
- Principal Component Analysis
- Random Selection, K-Means++
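Common methods for initializing centroids in K-Means include Random Selection and K-Means++. Random Selection picks k data points at random, while K-Means++ spreads the initial centroids apart by choosing each new centroid with probability proportional to its squared distance from the nearest centroid already chosen. The choice of initialization affects both the convergence speed and the quality of the final clusters.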
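A minimal sketch of K-Means++ seeding using only the standard library. The function names, the `seed` parameter, and the sample points are assumptions of this example; it also assumes the data contains at least k distinct points.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_plus_plus(points, k, seed=0):
    """Pick k initial centroids from points using K-Means++ seeding."""
    rng = random.Random(seed)
    centroids = [rng.choice(points)]  # first centroid: uniformly at random
    while len(centroids) < k:
        # Weight each point by its squared distance to the nearest chosen
        # centroid, so far-away points are more likely to be picked next.
        # Already-chosen points get weight 0 and cannot be picked again.
        weights = [min(squared_distance(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=weights, k=1)[0])
    return centroids

# Hypothetical data with three loose groups.
points = [(0.0, 0.0), (0.5, 0.2), (10.0, 10.0), (10.3, 9.8), (0.0, 10.0), (0.2, 9.7)]
print(kmeans_plus_plus(points, k=3))
```

Plain random selection would replace the weighted draw with another uniform `rng.choice`, which makes it more likely that two initial centroids land in the same group.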
What role does model complexity play in overfitting?
- Has no effect on overfitting
- Increases the risk of overfitting
- Increases the risk of underfitting
- Reduces the risk of overfitting
Higher model complexity increases the risk of overfitting. A more complex model can fit the noise in the training data rather than the underlying pattern, leading to poor generalization on unseen data.
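The effect can be illustrated with a toy comparison between a very flexible model (1-nearest-neighbour, which memorizes every training point) and a very simple one (a single threshold). The dataset below is hypothetical: the true rule is "label 1 when x >= 5", and two training labels have been flipped on purpose to act as noise.

```python
# Hypothetical training data; the labels at x=2 and x=7 are deliberately noisy.
train = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0), (5, 1), (6, 1), (7, 0), (8, 1), (9, 1)]
# Clean test data generated from the true rule.
test = [(i + 0.4, int(i + 0.4 >= 5)) for i in range(10)]

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

def one_nn(x):
    """Complex model: copy the label of the closest training point."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def fit_threshold(data):
    """Simple model: the single cut-off with the best training accuracy."""
    candidates = [x + 0.5 for x, _ in data]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x >= t), data))

threshold = fit_threshold(train)

def threshold_model(x):
    return int(x >= threshold)

print("1-NN      train/test:", accuracy(one_nn, train), accuracy(one_nn, test))
print("threshold train/test:", accuracy(threshold_model, train), accuracy(threshold_model, test))
```

On this data the flexible model fits the training set perfectly, noise included, and does worse on the test set than the simple model, which cannot represent the noisy points and so ignores them.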
__________ pruning is a technique where a decision tree is reduced by turning some branch nodes into leaf nodes.
- Cost Complexity
- Hybrid
- Random
- Reduced Error
Reduced Error Pruning is a technique where a decision tree is reduced by replacing the subtree under a branch node with a leaf node labeled with the most common class among the training examples that reach it. If the replacement does not reduce accuracy on the validation set, the change is kept.
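The procedure can be sketched on a tiny hand-built tree. The dictionary layout (`feature`, `threshold`, `left`, `right`, `majority`, `label`), the bottom-up visiting order, and the validation data are all assumptions made for this illustration, not a reference implementation.

```python
def predict(node, x):
    """Follow the splits until a leaf (a dict with a 'label' key) is reached."""
    while "label" not in node:
        go_left = x[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def prune(node, root, validation):
    """Bottom-up reduced error pruning; modifies the tree in place."""
    if "label" in node:
        return
    prune(node["left"], root, validation)
    prune(node["right"], root, validation)
    before = accuracy(root, validation)
    saved = dict(node)
    node.clear()
    node["label"] = saved["majority"]  # turn the branch node into a leaf
    if accuracy(root, validation) < before:
        node.clear()
        node.update(saved)             # accuracy dropped: restore the subtree

# Hypothetical tree: the right-hand subtree has overfitted to noise.
tree = {
    "feature": 0, "threshold": 5, "majority": 1,
    "left": {"label": 0},
    "right": {
        "feature": 1, "threshold": 3, "majority": 1,
        "left": {"label": 1},
        "right": {"label": 0},
    },
}
validation = [((1, 1), 0), ((2, 4), 0), ((6, 1), 1), ((7, 4), 1), ((8, 5), 1)]

print(accuracy(tree, validation))
prune(tree, tree, validation)
print(accuracy(tree, validation))
print(tree)
```

Here the inner branch is collapsed into a leaf because doing so does not lower validation accuracy, while the root split is kept because removing it would.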