What is clustering in the context of Machine Learning?
- A classification algorithm
- A regression method
- A supervised learning technique
- An unsupervised learning technique for grouping similar data
Clustering is an unsupervised learning technique used to group similar data points together without any labeled responses.
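The idea can be seen in a minimal sketch, assuming scikit-learn is available: two obvious blobs of points are grouped with no labels supplied.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated blobs; no class labels are given to the algorithm.
X = np.array([[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.2]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # points in the same blob share a cluster label
```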
How does classification differ from regression in supervised learning?
- Classification and regression are the same
- Classification predicts categories; regression predicts continuous values
- Classification predicts continuous values; regression predicts categories
- Classification uses labeled data; regression uses unlabeled data
Classification predicts discrete categories, while regression predicts continuous values. Both are techniques used in supervised learning, but they handle different types of prediction tasks.
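A minimal sketch of the distinction, assuming scikit-learn: the same features can feed either task, but the classifier's targets are discrete labels while the regressor's are real numbers.

```python
from sklearn.linear_model import LogisticRegression, LinearRegression

X = [[0.0], [1.0], [2.0], [3.0]]
y_class = [0, 0, 1, 1]        # discrete categories -> classification
y_reg = [0.1, 1.1, 2.0, 2.9]  # continuous values  -> regression

clf = LogisticRegression().fit(X, y_class)
reg = LinearRegression().fit(X, y_reg)
print(clf.predict([[2.5]]))  # a category label
print(reg.predict([[2.5]]))  # a real-valued estimate
```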
What is the principle behind the Random Forest algorithm?
- Ensemble of trees, increased complexity
- Ensemble of trees, reduced variance
- Single decision tree, increased bias
- Single decision tree, reduced bias
Random Forest is an ensemble learning method that operates by constructing multiple decision trees during training and outputs the mode of the classes for classification or the mean prediction of individual trees for regression. By combining many trees, it generally reduces overfitting and provides a more accurate prediction.
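As a hedged sketch (scikit-learn assumed, toy synthetic data), an ensemble of 100 trees is trained and scored on held-out data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data, split into train and test sets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# 100 trees vote; the majority class is the forest's prediction.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = forest.score(X_te, y_te)
print(acc)  # held-out accuracy of the ensemble
```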
Why is entropy used in Decision Trees?
- Increase Efficiency
- Increase Size
- Measure Purity
- Predict Outcome
Entropy is used to measure the purity of a split, helping to determine the best attribute for splitting at each node.
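A minimal entropy calculation from class counts, using the standard formula entropy = -Σ p·log2(p): a pure node scores 0 and a 50/50 split scores 1.

```python
from math import log2

def entropy(counts):
    """Shannon entropy of a node, given class counts (zero counts skipped)."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(entropy([5, 5]))   # maximally impure split -> 1.0
print(entropy([10, 0]))  # pure split -> 0.0
```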
How is Machine Learning applied in the healthcare industry for improving patient care?
- Autonomous Driving
- Fraud Detection
- Patient Diagnosis and Treatment
- Personalized Education
Machine Learning in healthcare is applied to patient diagnosis and treatment by analyzing medical data, detecting patterns, predicting diseases, and personalizing treatment plans.
What techniques can be used to detect overfitting in Polynomial Regression?
- Adding more features
- Cross-validation and visual inspection
- Ignoring the validation set
- Increasing the degree
Cross-validation and visual inspection of the fitted curve can detect overfitting in Polynomial Regression: both assess how well the model generalizes to new data rather than merely fitting the training set.
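A sketch of the cross-validation check, assuming scikit-learn and a toy quadratic dataset: a degree far higher than the true relationship fits the training points but its cross-validated score collapses.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (30, 1))
y = X.ravel() ** 2 + rng.normal(0, 0.5, 30)  # quadratic signal + noise

scores = {}
for degree in (2, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores[degree] = cross_val_score(model, X, y, cv=cv).mean()
print(scores)  # the degree-15 CV score is typically far below the degree-2 score
```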
You are given a dataset that is not linearly separable. How would you use SVM with the Kernel Trick to classify the data?
- Apply a linear kernel only
- Apply a non-linear kernel to transform the feature space
- Increase data size
- Reduce data size
The Kernel Trick with a non-linear kernel (such as RBF) can transform the feature space, making it linearly separable, and thus classify non-linear data.
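A minimal sketch with scikit-learn's SVC on concentric circles, a classic non-linearly-separable dataset: a linear kernel fails while the RBF kernel separates the classes.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line can separate them.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_acc = SVC(kernel="linear").fit(X, y).score(X, y)
rbf_acc = SVC(kernel="rbf").fit(X, y).score(X, y)
print(linear_acc, rbf_acc)  # the RBF kernel separates the rings
```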
What is the significance of minimizing within-class variance in LDA?
- It decreases model accuracy
- It enhances separation between different classes
- It maximizes the similarity between classes
- It reduces computational complexity
Minimizing "within-class variance" in LDA ensures that data points within the same class are close together. This enhances the separation between different classes, leading to improved discrimination and classification performance.
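A hedged sketch, assuming scikit-learn and two synthetic Gaussian classes: LDA projects the 2-D points onto the single direction that best separates the class means relative to the within-class scatter.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian classes with well-separated means.
X = np.vstack([rng.normal([0, 0], 1.0, (50, 2)),
               rng.normal([4, 4], 1.0, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
Z = lda.transform(X)  # 1-D projection; classes barely overlap
acc = lda.score(X, y)
print(acc)
```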
If the residuals in Simple Linear Regression are not evenly spread around the regression line, it is a sign of _________.
- Heteroscedasticity
- Homoscedasticity
- Linearity
- Overfitting
If the residuals are not evenly spread, it is a sign of heteroscedasticity, where the variability of the residuals is unequal across levels of the explanatory variable.
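A quick numerical sketch (NumPy only; the noise here is constructed to grow with x): comparing residual spread on the lower and upper halves of x reveals the unequal variance.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(1, 10, 200)
y = 2 * x + rng.normal(0, 0.5 * x)  # noise scale grows with x (heteroscedastic)

slope, intercept = np.polyfit(x, y, 1)
residuals = y - (slope * x + intercept)

lo, hi = residuals[:100].std(), residuals[100:].std()
print(lo, hi)  # hi much larger than lo signals heteroscedasticity
```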
Can dimensionality reduction lead to information loss? If so, how can this risk be minimized?
- No, it always preserves information
- Yes, by careful feature selection
- Yes, by optimizing the transformation method
- Yes, by using only unsupervised methods
Dimensionality reduction can lead to information loss, since discarding features or components may omit details from the original data. This risk can be minimized by optimizing the transformation method, selecting an appropriate number of components, and considering the nature of the data. Depending on the task, an unsupervised method (such as PCA) or a supervised one (such as LDA) may preserve the relevant information better.
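One common way to bound the loss, sketched here with scikit-learn's PCA on the Iris dataset: passing a fraction as `n_components` keeps just enough components to retain that share of the variance.

```python
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

X = load_iris().data  # 4 original features

# Keep the smallest number of components explaining >= 95% of variance.
pca = PCA(n_components=0.95).fit(X)
print(pca.n_components_, pca.explained_variance_ratio_.sum())
```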
In PCA, what do Eigenvalues represent?
- Amount of variance explained by each component
- Direction of the new coordinate system
- Noise in the data
- Scale of the data
Eigenvalues in PCA represent the amount of variance that is accounted for by each corresponding eigenvector. The larger the eigenvalue, the more variance the principal component explains.
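This can be verified directly with NumPy: the eigenvalues of the data's covariance matrix are the variances along the principal directions, and their normalized values give the fraction of variance each component explains.

```python
import numpy as np

rng = np.random.default_rng(0)
# Data stretched unevenly along three axes, so variances differ per direction.
X = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])

cov = np.cov(X, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(cov))[::-1]  # largest first
print(eigenvalues / eigenvalues.sum())  # variance fraction per component
```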
What is Polynomial Regression in the context of Machine Learning?
- A classification method
- A non-linear regression that fits polynomial equations
- A type of linear regression
- An algorithm for clustering
Polynomial Regression is a type of regression analysis that fits a polynomial equation to the data. It allows for more complex relationships between the dependent and independent variables compared to linear regression.
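A minimal sketch with scikit-learn: polynomial features are generated from the input and a linear model is fit on them, which on a noiseless quadratic recovers the curve almost exactly.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.linspace(-3, 3, 50).reshape(-1, 1)
y = 0.5 * X.ravel() ** 2 - X.ravel() + 2  # noiseless quadratic relationship

# Degree-2 polynomial features followed by ordinary linear regression.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)
r2 = model.score(X, y)
print(r2)  # R^2 essentially 1.0 on this exact quadratic
```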