How does boosting reduce bias in a machine learning model?
- By averaging the predictions of many models
- By focusing on one strong model
- By training only on the easiest examples
- By training sequentially on misclassified examples
Boosting trains models sequentially, with each new model focusing on the examples that the previous ones misclassified. Because each round corrects the systematic errors of the ensemble built so far, the combined model's bias shrinks and overall performance improves.
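The reweighting step behind this can be sketched in a few lines of numpy. This is a minimal AdaBoost-style illustration on a hypothetical 1-D dataset with threshold "stumps" as the weak learners; the data and helper function are made up for the example:

```python
import numpy as np

# Hypothetical 1-D dataset; the point at x=0.9 has a "wrong-looking" label,
# so no single threshold separates the classes perfectly.
X = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.9])
y = np.array([1, 1, -1, -1, -1, 1])

w = np.ones(len(X)) / len(X)          # start with uniform sample weights

def best_stump(X, y, w):
    """Pick the threshold stump with the lowest weighted error."""
    best = None
    for t in (X[:-1] + X[1:]) / 2:    # candidate thresholds between points
        pred = np.where(X < t, 1, -1)
        err = w[pred != y].sum()
        if best is None or err < best[1]:
            best = (t, err, pred)
    return best

t, err, pred = best_stump(X, y, w)
alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))   # model's vote weight

# The key boosting step: misclassified examples get MORE weight,
# so the next model in the sequence is forced to focus on them.
w = w * np.exp(-alpha * y * pred)
w = w / w.sum()
```

After one round the single misclassified point carries as much weight as all correctly classified points combined, which is exactly the "focus on past mistakes" mechanism the answer describes.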
Why is entropy used in Decision Trees?
- Increase Efficiency
- Increase Size
- Measure Purity
- Predict Outcome
Entropy is used to measure the purity of a split, helping to determine the best attribute for splitting at each node.
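As a concrete sketch, entropy and the resulting information gain of a split can be computed directly; the tiny label arrays here are invented for illustration:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

# A 50/50 parent node has maximum entropy (1 bit); pure children have 0.
parent = np.array([1, 1, 0, 0])
left, right = np.array([1, 1]), np.array([0, 0])

# Information gain = parent entropy minus the weighted child entropies;
# the attribute with the highest gain is chosen for the split.
gain = entropy(parent) - (0.5 * entropy(left) + 0.5 * entropy(right))
```

A split that produces pure children recovers the full bit of uncertainty, which is why it would be preferred at that node.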
What techniques can be used to detect overfitting in Polynomial Regression?
- Adding more features
- Cross-validation and visual inspection
- Ignoring the validation set
- Increasing the degree
Techniques like cross-validation and visual inspection of the fit can be used to detect overfitting in Polynomial Regression. They help in assessing how well the model generalizes to new data, revealing any overfitting issues.
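A minimal hold-out check makes the idea concrete. The data below is synthetic (a noisy line, invented for the example): a very high-degree polynomial drives the training error toward zero while the held-out error stays large, which is the classic overfitting signature:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noisy linear data: y = 2x + noise
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.1, size=x.shape)

# Hold out every third point as a simple validation split
val = np.arange(len(x)) % 3 == 0
x_tr, y_tr = x[~val], y[~val]
x_va, y_va = x[val], y[val]

def fit_and_score(degree):
    """Fit a polynomial on the training split; return (train MSE, val MSE)."""
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda xs, ys: np.mean((np.polyval(coeffs, xs) - ys) ** 2)
    return mse(x_tr, y_tr), mse(x_va, y_va)

tr_low, va_low = fit_and_score(1)     # appropriate degree
tr_high, va_high = fit_and_score(15)  # far too flexible

# Overfitting signature: the degree-15 fit has lower TRAINING error
# but its validation error stays well above its training error.
```

In practice you would use k-fold cross-validation rather than a single split, but the diagnostic (a gap between training and validation error) is the same.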
How is Machine Learning applied in the healthcare industry for improving patient care?
- Autonomous Driving
- Fraud Detection
- Patient Diagnosis and Treatment
- Personalized Education
Machine Learning in healthcare is applied to patient diagnosis and treatment by analyzing medical data, detecting patterns, predicting diseases, and personalizing treatment plans.
The visualization tool used to represent the arrangement of the clusters produced by hierarchical clustering is called a _________.
- Cluster Map
- Dendrogram
- Heatmap
- Scatter Plot
A dendrogram is a tree-like diagram that shows how individual data points are successively merged into clusters by hierarchical clustering. It is a valuable tool for understanding the hierarchy and for deciding where to cut the tree to form the final clusters.
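A short sketch using SciPy's hierarchical clustering utilities, on a tiny made-up 1-D dataset with two obvious groups (drawing the dendrogram itself additionally requires matplotlib; `no_plot=True` returns the layout information only):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

# Hypothetical points forming two well-separated groups
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])

# Build the merge tree (the structure a dendrogram visualizes)
Z = linkage(X, method="ward")

# Layout info for the tree; pass Z to dendrogram() with matplotlib to draw it
info = dendrogram(Z, no_plot=True)

# "Cutting the tree" into 2 clusters recovers the two groups
labels = fcluster(Z, t=2, criterion="maxclust")
```

Cutting at a higher or lower level of the tree yields fewer or more clusters, which is the decision the dendrogram helps you make visually.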
What is Polynomial Regression in the context of Machine Learning?
- A classification method
- A non-linear regression that fits polynomial equations
- A type of linear regression
- An algorithm for clustering
Polynomial Regression is a type of regression analysis that fits a polynomial equation to the data. It allows for more complex relationships between the dependent and independent variables compared to linear regression.
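As a minimal sketch, fitting a quadratic with numpy (the data here is generated from a known polynomial, chosen purely for illustration):

```python
import numpy as np

# Noiseless data from a known quadratic: y = 1 + 2x + 3x^2
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 1 + 2 * x + 3 * x ** 2

# Fit a degree-2 polynomial; coefficients come highest-degree first
coeffs = np.polyfit(x, y, deg=2)      # recovers [3, 2, 1]

# Predict at a new point, x = 5: 1 + 2*5 + 3*25 = 86
pred = np.polyval(coeffs, 5.0)
```

Under the hood this is still linear least squares — the model is linear in the coefficients — but the polynomial features let it capture the curved relationship a straight line cannot.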
In PCA, what do Eigenvalues represent?
- Amount of variance explained by each component
- Direction of the new coordinate system
- Noise in the data
- Scale of the data
Eigenvalues in PCA represent the amount of variance that is accounted for by each corresponding eigenvector. The larger the eigenvalue, the more variance the principal component explains.
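This can be seen directly by eigendecomposing a covariance matrix; the 2-D data below is synthetic, deliberately stretched along one axis so that one eigenvalue dominates:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical 2-D data with far more variance along the first axis
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
X = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix is PCA:
# eigenvectors give the component directions, eigenvalues their variances
eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
eigvals = eigvals[::-1]                      # sort descending

explained = eigvals / eigvals.sum()          # variance explained per component
```

With standard deviations of roughly 3 and 0.5, the first eigenvalue accounts for well over 90% of the total variance, matching the intuition that the largest eigenvalue belongs to the most informative component.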
Can dimensionality reduction lead to information loss? If so, how can this risk be minimized?
- No, it always preserves information
- Yes, by careful feature selection
- Yes, by optimizing the transformation method
- Yes, by using only unsupervised methods
Dimensionality reduction can lead to information loss, since discarding features or components also discards detail from the original data. The risk can be minimized by choosing and tuning the transformation method carefully, retaining enough components to preserve most of the variance, and matching the method (supervised or unsupervised) to the specific needs and nature of the data.
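The trade-off can be quantified as PCA reconstruction error; the 3-D dataset below is invented, with two informative directions and one near-noise direction:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 3-D data: variance ~16 and ~4 in two directions, ~0.01 in the third
X = rng.normal(size=(400, 3)) * np.array([4.0, 2.0, 0.1])
X = X - X.mean(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X.T))
order = np.argsort(eigvals)[::-1]            # components, highest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

def reconstruction_mse(k):
    """Project onto the top-k components, map back, measure what was lost."""
    W = eigvecs[:, :k]
    X_hat = X @ W @ W.T
    return np.mean((X - X_hat) ** 2)

# Dropping only the near-noise direction (k=2) loses almost nothing;
# also dropping an informative direction (k=1) loses much more.
loss_k2, loss_k1 = reconstruction_mse(2), reconstruction_mse(1)
```

Choosing the number of components by inspecting exactly this kind of retained-variance curve is one standard way to keep the information loss acceptably small.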
If the residuals in Simple Linear Regression are not evenly spread around the regression line, it is a sign of _________.
- Heteroscedasticity
- Homoscedasticity
- Linearity
- Overfitting
If the residuals are not evenly spread, it is a sign of heteroscedasticity, where the variability of the residuals is unequal across levels of the explanatory variable.
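A simple numeric check for this, on synthetic data deliberately constructed so that the noise grows with x (the 5.5 split point is just the midpoint of the made-up range):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data whose noise std is proportional to x: heteroscedastic
x = np.linspace(1, 10, 200)
y = 2 * x + rng.normal(0, 0.2 * x)

# Fit the simple linear regression and compute residuals
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

# Compare residual spread in the lower vs upper half of x:
# a large imbalance is the numeric counterpart of the fanning-out
# pattern seen in a residual plot
lo = residuals[x < 5.5].std()
hi = residuals[x >= 5.5].std()
```

In a residual-versus-fitted plot this shows up as a funnel shape; formal tests such as Breusch–Pagan make the same comparison rigorously.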
What is the significance of minimizing within-class variance in LDA?
- It decreases model accuracy
- It enhances separation between different classes
- It maximizes the similarity between classes
- It reduces computational complexity
Minimizing within-class variance in LDA ensures that data points within the same class are close together. Relative to the distance between class means, this tightness enhances the separation between different classes, leading to improved discrimination and classification performance.
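The quantity LDA optimizes can be sketched for two hypothetical 1-D classes (the numbers are invented; in higher dimensions the variances become scatter matrices, but the idea is the same):

```python
import numpy as np

# Two made-up classes: tight within themselves, far apart from each other
a = np.array([1.0, 1.1, 0.9, 1.2])
b = np.array([5.0, 5.1, 4.9, 5.2])

# Within-class scatter: spread of points around their own class mean
within = a.var() + b.var()

# Between-class scatter: squared distance between the class means
between = (a.mean() - b.mean()) ** 2

# LDA seeks the projection maximizing this ratio; shrinking the
# within-class term is what drives the class separation up
fisher_ratio = between / within
```

Because the classes here are very tight, the ratio is large; classes with big internal spread would overlap and drag the ratio, and the achievable separation, down.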