What is clustering in the context of Machine Learning?
- A classification algorithm
- A regression method
- A supervised learning technique
- An unsupervised learning technique for grouping similar data
Clustering is an unsupervised learning technique used to group similar data points together without any labeled responses.
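The idea can be seen in a minimal sketch, assuming scikit-learn is available: two obvious blobs of points are grouped with no labels supplied.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs; no class labels are given to the algorithm.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # points in the same blob share a cluster label
```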
How does classification differ from regression in supervised learning?
- Classification and regression are the same
- Classification predicts categories; regression predicts continuous values
- Classification predicts continuous values; regression predicts categories
- Classification uses labeled data; regression uses unlabeled data
Classification predicts discrete categories, while regression predicts continuous values. Both are techniques used in supervised learning, but they handle different types of prediction tasks.
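A minimal sketch of the distinction, assuming scikit-learn: the same features can feed either task, but the classifier's targets are discrete labels while the regressor's are real numbers.

```python
from sklearn.linear_model import LogisticRegression, LinearRegression

X = [[0.0], [1.0], [2.0], [3.0]]
y_class = [0, 0, 1, 1]        # discrete categories -> classification
y_reg = [0.1, 1.1, 2.0, 2.9]  # continuous values  -> regression

clf = LogisticRegression().fit(X, y_class)
reg = LinearRegression().fit(X, y_reg)
print(clf.predict([[2.5]]))  # a category label
print(reg.predict([[2.5]]))  # a real-valued estimate
```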
What is the principle behind the Random Forest algorithm?
- Ensemble of trees, increased complexity
- Ensemble of trees, reduced variance
- Single decision tree, increased bias
- Single decision tree, reduced bias
Random Forest is an ensemble learning method that operates by constructing multiple decision trees during training and outputs the mode of the classes for classification or the mean prediction of individual trees for regression. By combining many trees, it generally reduces overfitting and provides a more accurate prediction.
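As a hedged sketch (scikit-learn assumed, toy synthetic data), an ensemble of 100 trees is trained and scored on held-out data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, split into train and test sets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees vote; the majority class is the forest's prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = forest.score(X_te, y_te)
print(acc)  # held-out accuracy of the ensemble
```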
Why is entropy used in Decision Trees?
- Increase Efficiency
- Increase Size
- Measure Purity
- Predict Outcome
Entropy is used to measure the purity of a split, helping to determine the best attribute for splitting at each node.
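A minimal entropy calculation from class counts, using the standard formula entropy = -Σ p·log2(p): a pure node scores 0 and a 50/50 split scores 1.

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a node, given class counts (zero counts skipped)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([5, 5]))   # maximally impure split -> 1.0
print(entropy([10, 0]))  # pure split -> 0.0
```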
How is Machine Learning applied in the healthcare industry for improving patient care?
- Autonomous Driving
- Fraud Detection
- Patient Diagnosis and Treatment
- Personalized Education
Machine Learning in healthcare is applied to patient diagnosis and treatment by analyzing medical data, detecting patterns, predicting diseases, and personalizing treatment plans.
What techniques can be used to detect overfitting in Polynomial Regression?
- Adding more features
- Cross-validation and visual inspection
- Ignoring the validation set
- Increasing the degree
Cross-validation and visual inspection of the fitted curve can detect overfitting in Polynomial Regression: both assess how well the model generalizes to new data rather than merely fitting the training set.
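A sketch of the cross-validation check, assuming scikit-learn and a toy quadratic dataset: a degree far higher than the true relationship fits the training points but its cross-validated score collapses.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (30, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 30)  # quadratic signal + noise

scores = {}
for degree in (2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores[degree] = cross_val_score(model, X, y, cv=cv).mean()
print(scores)  # the degree-15 CV score is typically far below the degree-2 score
```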
You are given a dataset that is not linearly separable. How would you use SVM with the Kernel Trick to classify the data?
- Apply a linear kernel only
- Apply a non-linear kernel to transform the feature space
- Increase data size
- Reduce data size
The Kernel Trick with a non-linear kernel (such as RBF) can transform the feature space, making it linearly separable, and thus classify non-linear data.
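A minimal sketch with scikit-learn's SVC on concentric circles, a classic non-linearly-separable dataset: a linear kernel fails while the RBF kernel separates the classes.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
print(linear_acc, rbf_acc)  # the RBF kernel separates the rings
```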
What is the significance of minimizing within-class variance in LDA?
- It decreases model accuracy
- It enhances separation between different classes
- It maximizes the similarity between classes
- It reduces computational complexity
Minimizing "within-class variance" in LDA ensures that data points within the same class are close together. This enhances the separation between different classes, leading to improved discrimination and classification performance.
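A hedged sketch, assuming scikit-learn and two synthetic Gaussian classes: LDA projects the 2-D points onto the single direction that best separates the class means relative to the within-class scatter.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian classes with well-separated means.
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([4, 4], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
Z = lda.transform(X)  # 1-D projection; classes barely overlap
acc = lda.score(X, y)
print(acc)
```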
If the residuals in Simple Linear Regression are not evenly spread around the regression line, it is a sign of _________.
- Heteroscedasticity
- Homoscedasticity
- Linearity
- Overfitting
If the residuals are not evenly spread, it is a sign of heteroscedasticity, where the variability of the residuals is unequal across levels of the explanatory variable.
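A quick numerical sketch (NumPy only; the noise here is constructed to grow with x): comparing residual spread on the lower and upper halves of x reveals the unequal variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y = 2 * x + rng.normal(0, 0.5 * x)  # noise scale grows with x (heteroscedastic)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

lo, hi = residuals[:100].std(), residuals[100:].std()
print(lo, hi)  # hi much larger than lo signals heteroscedasticity
```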
Can dimensionality reduction lead to information loss? If so, how can this risk be minimized?
- No, it always preserves information
- Yes, by careful feature selection
- Yes, by optimizing the transformation method
- Yes, by using only unsupervised methods
Dimensionality reduction can lead to information loss, since discarding features or components may omit details from the original data. This risk can be minimized by optimizing the transformation method, selecting an appropriate number of components, and considering the nature of the data. Depending on the task, an unsupervised method (such as PCA) or a supervised one (such as LDA) may preserve the relevant information better.
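One common way to bound the loss, sketched here with scikit-learn's PCA on the Iris dataset: passing a fraction as `n_components` keeps just enough components to retain that share of the variance.

```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data  # 4 original features

# Keep the smallest number of components explaining >= 95% of variance.
pca = PCA(n_components=0.95).fit(X)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```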
In PCA, what do Eigenvalues represent?
- Amount of variance explained by each component
- Direction of the new coordinate system
- Noise in the data
- Scale of the data
Eigenvalues in PCA represent the amount of variance that is accounted for by each corresponding eigenvector. The larger the eigenvalue, the more variance the principal component explains.
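This can be verified directly with NumPy: the eigenvalues of the data's covariance matrix are the variances along the principal directions, and their normalized values give the fraction of variance each component explains.

```python
import numpy as np

rng = np.random.default_rng(0)
# Data stretched unevenly along three axes, so variances differ per direction.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])

cov = np.cov(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]  # largest first
print(eigenvalues / eigenvalues.sum())  # variance fraction per component
```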
What is Polynomial Regression in the context of Machine Learning?
- A classification method
- A non-linear regression that fits polynomial equations
- A type of linear regression
- An algorithm for clustering
Polynomial Regression is a type of regression analysis that fits a polynomial equation to the data. It allows for more complex relationships between the dependent and independent variables compared to linear regression.
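A minimal sketch with scikit-learn: polynomial features are generated from the input and a linear model is fit on them, which on a noiseless quadratic recovers the curve almost exactly.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2  # noiseless quadratic relationship

# Degree-2 polynomial features followed by ordinary linear regression.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
r2 = model.score(X, y)
print(r2)  # R^2 essentially 1.0 on this exact quadratic
```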