__________ learning utilizes both labeled and unlabeled data, often leveraging the strengths of both supervised and unsupervised learning.
- reinforcement
- semi-supervised
- supervised
- unsupervised
Semi-supervised learning combines labeled and unlabeled data, leveraging the strengths of both supervised and unsupervised learning.
How does K-Means clustering respond to non-spherical data distributions, and how can initialization affect this?
- Adapts well to non-spherical data
- Performs equally well with all data shapes
- Struggles with non-spherical data; initialization can alleviate this
- Struggles with non-spherical data; initialization has no effect
K-Means tends to struggle with non-spherical data distributions because it assigns points by Euclidean distance, which implicitly assumes roughly spherical, similarly sized clusters. Careful initialization can partially alleviate this issue but cannot fully overcome the fundamental limitation.
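To make the Euclidean-distance mechanism concrete, here is a minimal sketch of Lloyd's algorithm (the standard K-Means loop) on a toy dataset. The data and initial centroids are invented for illustration; note that both the assignment step and the update step depend only on Euclidean distance, which is exactly why elongated or non-spherical clusters get split incorrectly.

```python
import numpy as np

def kmeans(X, init_centroids, n_iters=10):
    """Minimal Lloyd's algorithm: assign each point to its nearest centroid
    by Euclidean distance, then recompute centroids as cluster means."""
    centroids = np.asarray(init_centroids, dtype=float)
    for _ in range(n_iters):
        # Euclidean distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([X[labels == k].mean(axis=0)
                              for k in range(len(centroids))])
    return labels, centroids

# Two compact, roughly spherical blobs: the easy case for K-Means.
X = np.array([[0.0, 0.0], [0.1, 0.2], [-0.1, 0.1],
              [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
labels, centroids = kmeans(X, init_centroids=[[0.0, 0.0], [1.0, 1.0]])
print(labels)  # points 0-2 land in one cluster, points 3-5 in the other
```

On non-spherical shapes (e.g. two concentric rings), this same loop fails no matter how long it runs; a better initialization (such as k-means++-style spreading of the starting centroids) only reduces the chance of a bad local optimum.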
Why is entropy used in Decision Trees?
- Increase Efficiency
- Increase Size
- Measure Purity
- Predict Outcome
Entropy is used to measure the purity of a split, helping to determine the best attribute for splitting at each node.
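A short sketch of how entropy quantifies purity and how it feeds into choosing a split (the label values here are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label distribution:
    0 bits for a pure node, 1 bit for a 50/50 binary split."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(parent, splits):
    """Drop in entropy from parent node to the weighted child nodes;
    the attribute with the highest gain is chosen for the split."""
    total = len(parent)
    return entropy(parent) - sum(len(s) / total * entropy(s) for s in splits)

print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0 (pure)
print(entropy(["yes", "yes", "no", "no"]))    # 1.0 (maximally impure)
# A split that perfectly separates the classes achieves the full gain:
print(information_gain(["yes", "yes", "no", "no"],
                       [["yes", "yes"], ["no", "no"]]))  # 1.0
```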
What is the principle behind the Random Forest algorithm?
- Ensemble of trees, increased complexity
- Ensemble of trees, reduced variance
- Single decision tree, increased bias
- Single decision tree, reduced bias
Random Forest is an ensemble learning method that operates by constructing multiple decision trees during training and outputs the mode of the classes for classification or the mean prediction of individual trees for regression. By combining many trees, it generally reduces overfitting and provides a more accurate prediction.
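The aggregation step can be sketched in a few lines. The per-tree predictions below are hypothetical stand-ins for the outputs of trained trees; the point is only how the forest combines them (mode for classification, mean for regression):

```python
from collections import Counter
from statistics import mean

# Hypothetical outputs from five trees in a classification forest
tree_class_preds = ["cat", "dog", "cat", "cat", "dog"]
forest_class = Counter(tree_class_preds).most_common(1)[0][0]
print(forest_class)  # "cat" - the mode (majority vote) of the trees

# Hypothetical outputs from four trees in a regression forest
tree_reg_preds = [3.1, 2.9, 3.0, 3.2]
forest_reg = mean(tree_reg_preds)
print(forest_reg)  # the mean of the individual tree predictions
```

In a real Random Forest each tree is also trained on a bootstrap sample of the data and a random subset of features, which is what decorrelates the trees and drives the variance reduction.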
How does classification differ from regression in supervised learning?
- Classification and regression are the same
- Classification predicts categories; regression predicts continuous values
- Classification predicts continuous values; regression predicts categories
- Classification uses labeled data; regression uses unlabeled data
Classification predicts discrete categories, while regression predicts continuous values. Both are techniques used in supervised learning, but they handle different types of prediction tasks.
What is clustering in the context of Machine Learning?
- A classification algorithm
- A regression method
- A supervised learning technique
- An unsupervised learning technique for grouping similar data
Clustering is an unsupervised learning technique used to group similar data points together without any labeled responses.
Your model is showing signs of overfitting. How could bagging or boosting be utilized to address this problem?
- Bagging to average predictions of overfitted models
- Bagging with increased complexity
- Boosting with reduced complexity
- Both bagging and boosting can't address overfitting
Bagging can help address overfitting by averaging predictions from overfitted models trained on different subsets of data. This helps to cancel out the noise and reduce the overall variance of the ensemble.
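The variance-reduction effect can be demonstrated with a toy sketch. Here each "model" is simply the mean of a bootstrap resample of noisy data (a deliberately crude stand-in for an overfitted learner); averaging many such models yields a noticeably more stable prediction:

```python
import random
import statistics

random.seed(0)

# Noisy toy data; the quantity being "predicted" is its underlying mean (10.0)
data = [random.gauss(10.0, 2.0) for _ in range(200)]

def bootstrap_estimate(data):
    """Train one 'model' (here: a mean estimator) on a bootstrap resample."""
    sample = [random.choice(data) for _ in range(len(data))]
    return statistics.mean(sample)

# Predictions from single models vs. bagged ensembles of 25 models each
single_preds = [bootstrap_estimate(data) for _ in range(50)]
bagged_preds = [statistics.mean(bootstrap_estimate(data) for _ in range(25))
                for _ in range(50)]

# Bagging: averaging across bootstrap models cancels noise, shrinking variance
print(statistics.stdev(single_preds) > statistics.stdev(bagged_preds))  # True
```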
In what scenarios would you prefer Polynomial Regression over Simple Linear Regression?
- When the data is categorical
- When the relationship is linear
- When the relationship is logarithmic
- When the relationship is quadratic or higher-order
Polynomial Regression is preferred over Simple Linear Regression when the relationship between the dependent and independent variables is not linear but can be modeled as a polynomial (quadratic, cubic, etc.). Polynomial regression can capture more complex patterns in the data, making it suitable for non-linear relationships.
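A small sketch of the contrast, using data generated from a known quadratic so the fits can be compared directly (the data is synthetic and for illustration only):

```python
import numpy as np

# Data generated from a quadratic relationship: y = 2x^2 + 3x + 1
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 2 * x**2 + 3 * x + 1

# Simple linear regression (degree-1 fit) cannot capture the curvature...
linear = np.polyfit(x, y, deg=1)
linear_residual = np.sum((y - np.polyval(linear, x)) ** 2)

# ...but polynomial regression (degree-2 fit) recovers it exactly
quadratic = np.polyfit(x, y, deg=2)
print(np.round(quadratic, 6))        # recovers the coefficients [2, 3, 1]
print(linear_residual > 1.0)         # True: large error for the linear fit
print(np.allclose(np.polyval(quadratic, x), y))  # True: near-zero error
```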
How can overfitting and underfitting be detected through training and testing data?
- Overfitting detected by high training error; Underfitting by low testing error
- Overfitting detected by low complexity; Underfitting by high complexity
- Overfitting detected by low training error and high testing error; Underfitting by high training and testing errors
- Underfitting detected by low training error; Overfitting by low testing error
Overfitting is detected when there is low training error but high testing error, as the model fits the training data too well but fails to generalize. Underfitting is detected when both training and testing errors are high, indicating that the model fails to capture underlying trends.
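The rule of thumb above can be captured in a small diagnostic sketch; the error thresholds here are illustrative, not universal cutoffs:

```python
def diagnose(train_error, test_error, low=0.1, high=0.3):
    """Classify a model's fit from its train/test errors.
    Thresholds `low` and `high` are illustrative, not standard values."""
    if train_error <= low and test_error >= high:
        return "overfitting"   # fits the training data, fails to generalize
    if train_error >= high and test_error >= high:
        return "underfitting"  # fails to capture the underlying trend
    return "reasonable fit"

print(diagnose(0.02, 0.45))  # overfitting
print(diagnose(0.40, 0.42))  # underfitting
print(diagnose(0.08, 0.12))  # reasonable fit
```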
A weather forecasting agency is looking to improve the accuracy of its predictions. What Machine Learning methods would be relevant here?
- Clustering, Text Classification
- Image Recognition, Drug Development
- Recommender Systems, Financial Data
- Weather Data, Time-Series Forecasting
Weather Data and Time-Series Forecasting methods, like ARIMA or deep learning models, can be used to analyze and predict weather patterns, leveraging historical weather data and atmospheric conditions to improve accuracy.
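Real forecasting systems would use ARIMA or deep learning models as noted above, but the core idea of predicting the next value from recent history can be sketched with a naive moving-average baseline (the temperature values are invented for illustration):

```python
def moving_average_forecast(series, window=3):
    """Naive time-series baseline: forecast the next value as the
    mean of the last `window` observations."""
    recent = series[-window:]
    return sum(recent) / len(recent)

# Hypothetical daily temperatures (degrees C)
temps = [21.0, 22.5, 23.0, 22.0, 24.0, 23.5]
print(moving_average_forecast(temps))  # mean of the last 3 observations
```

Methods like ARIMA extend this idea with autoregressive terms, differencing, and moving-average error terms to model trend and seasonality in the historical data.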