What is the role of cross-validation in detecting and preventing overfitting in Polynomial Regression?

  • It assists in increasing model complexity
  • It focuses on training data only
  • It helps in choosing the right degree and assessing generalization
  • It helps in selecting features
Cross-validation plays a key role in detecting and preventing overfitting in Polynomial Regression: by scoring each candidate degree on held-out folds, it helps choose the right polynomial degree and assesses how well the model generalizes to new data.
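As a sketch of this idea (assuming scikit-learn; the synthetic data and degree range are illustrative):

```python
# Sketch: pick a polynomial degree by 5-fold cross-validation.
# The synthetic quadratic data and degree range are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X.ravel() ** 2 - X.ravel() + rng.normal(0, 0.5, size=100)

scores = {}
for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # Mean R^2 on held-out folds estimates generalization for this degree
    scores[degree] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

best_degree = max(scores, key=scores.get)
print(best_degree, round(scores[best_degree], 3))
```

Degrees that are too low score poorly on every fold (underfitting), while degrees that are too high score well on training data but drop on held-out folds (overfitting); the CV score surfaces both.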

In K-Means clustering, a common approach to avoid local minima due to initial centroid selection is to run the algorithm multiple times with different _________.

  • Centroid initializations
  • Distance metrics
  • Learning rates
  • Number of clusters
Running the K-Means algorithm multiple times with different centroid initializations helps avoid poor local minima and increases the chance of finding a more globally optimal clustering solution.
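A minimal sketch, assuming scikit-learn (whose `n_init` parameter does exactly this restart-and-keep-best procedure; the blob data is synthetic):

```python
# Sketch: run K-Means with several random centroid initializations
# (n_init) and keep the run with the lowest inertia. Data is synthetic.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# A single initialization can land in a poor local minimum;
# n_init=10 restarts the algorithm and keeps the best result.
single = KMeans(n_clusters=4, n_init=1, random_state=0).fit(X)
multi = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

# Lower inertia (within-cluster sum of squares) is better.
print(single.inertia_, multi.inertia_)
```

The best of ten runs can never be worse than a single run, which is why multiple restarts are the standard defense against bad initializations.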

You applied DBSCAN and found that many points are being classified as noise. What adjustments could you make to the parameters?

  • Decrease Epsilon; Increase MinPts
  • Increase Epsilon; Decrease MinPts
  • Increase both Epsilon and MinPts
  • Use the same Epsilon and MinPts but change the clustering method
Increasing Epsilon and decreasing MinPts will make the clustering less strict, reducing the chance of points being classified as noise. Epsilon defines the neighborhood size, and MinPts defines the minimum points required to form a cluster. By adjusting them, more points can be included in clusters, reducing noise classification.
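As a sketch (assuming scikit-learn, where Epsilon and MinPts are the `eps` and `min_samples` parameters; the data and values are illustrative):

```python
# Sketch: loosening DBSCAN parameters (larger eps, smaller min_samples)
# reduces the number of points labeled as noise (-1). Data is synthetic.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.08, random_state=1)

strict = DBSCAN(eps=0.1, min_samples=10).fit(X)
loose = DBSCAN(eps=0.3, min_samples=5).fit(X)

# DBSCAN labels noise points as -1
noise_strict = int((strict.labels_ == -1).sum())
noise_loose = int((loose.labels_ == -1).sum())
print(noise_strict, noise_loose)
```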

How can you evaluate the performance of an LDA model?

  • By checking the size of the scatter matrices
  • By comparing with PCA
  • Using confusion matrix and ROC curves
  • Using only accuracy
The performance of an LDA model can be evaluated using metrics like the "confusion matrix and ROC curves." These tools provide insights into the model's ability to classify instances correctly and its trade-off between sensitivity and specificity.
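A minimal evaluation sketch, assuming scikit-learn (the dataset and train/test split are illustrative choices):

```python
# Sketch: evaluate an LDA classifier with a confusion matrix and ROC AUC.
# Dataset and split are illustrative assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
cm = confusion_matrix(y_te, lda.predict(X_te))
# ROC AUC is computed from predicted probabilities of the positive class
auc = roc_auc_score(y_te, lda.predict_proba(X_te)[:, 1])
print(cm)
print(round(auc, 3))
```

The confusion matrix breaks predictions into true/false positives and negatives, while the ROC curve (summarized by AUC) shows the sensitivity-specificity trade-off across thresholds.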

Bootstrapping involves resampling with replacement from the dataset to create "n" _________ datasets.

  • additional
  • bootstrap
  • copied
  • resampled
Bootstrapping is a statistical method that involves resampling with replacement from the dataset to create "n" "bootstrap" datasets. It allows estimating the distribution of a statistic by creating many resampled datasets and calculating the statistic for each.
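A sketch with NumPy (the data, the number of bootstrap datasets, and the statistic being estimated are illustrative):

```python
# Sketch: create n bootstrap datasets by resampling with replacement,
# and use them to estimate the sampling distribution of the mean.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=500)

n_bootstrap = 1000
boot_means = np.empty(n_bootstrap)
for i in range(n_bootstrap):
    # Each bootstrap dataset has the same size, sampled WITH replacement
    sample = rng.choice(data, size=data.size, replace=True)
    boot_means[i] = sample.mean()

# Percentile confidence interval for the mean
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(round(lo, 2), round(hi, 2))
```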

What are the main differences between PCA and Linear Discriminant Analysis (LDA) as techniques for dimensionality reduction?

  • Both techniques work the same way
  • PCA is a type of LDA
  • PCA is unsupervised, LDA is supervised
  • PCA maximizes within-class variance, LDA between
The main difference between PCA and LDA is that PCA is an unsupervised technique that maximizes the total variance in the data, while LDA is a supervised technique that maximizes the between-class variance and minimizes the within-class variance. This makes LDA more suitable when class labels are available, while PCA can be used without them.
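The supervised/unsupervised distinction shows up directly in the APIs: PCA fits on `X` alone, while LDA requires `y`. A sketch assuming scikit-learn and the Iris dataset:

```python
# Sketch: contrast PCA (unsupervised, uses X only) with LDA
# (supervised, uses X and y) for dimensionality reduction.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA never sees the labels; it finds directions of maximum total variance
X_pca = PCA(n_components=2).fit_transform(X)

# LDA requires labels; its components maximize class separation
# (at most n_classes - 1 components, i.e. 2 for the 3-class Iris data)
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)
```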

What is multicollinearity in the context of Multiple Linear Regression?

  • Adding interaction effects
  • High correlation among variables
  • Lowering the bias of the model
  • Reducing overfitting
Multicollinearity refers to a situation where two or more independent variables in a Multiple Linear Regression model are highly correlated with each other. It inflates the variance of the coefficient estimates, making them unstable and difficult to interpret.
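One common way to detect it is the variance inflation factor (VIF). A sketch that computes VIF by hand (the synthetic predictors and the "VIF > 5" rule of thumb are illustrative assumptions):

```python
# Sketch: detect multicollinearity with variance inflation factors (VIF),
# computed manually by regressing each predictor on the others.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.1, size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)                          # independent predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # VIF_j = 1 / (1 - R^2) from regressing column j on the remaining columns
    others = np.delete(X, j, axis=1)
    r2 = LinearRegression().fit(others, X[:, j]).score(others, X[:, j])
    return 1.0 / (1.0 - r2)

vifs = [vif(X, j) for j in range(X.shape[1])]
print([round(v, 1) for v in vifs])
```

Large VIFs on the first two columns flag the x1/x2 redundancy, while the independent x3 stays near 1.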

In DBSCAN, Epsilon is the maximum radius of the neighborhood from a data point, and MinPts is the minimum number of points required to form a ________.

  • border point
  • cluster
  • core point
  • noise point
In DBSCAN, Epsilon defines the neighborhood radius, and MinPts defines the minimum number of points required to form a cluster. A point with at least MinPts points within its Epsilon neighborhood becomes a core point, and clusters grow outward from these core points.

In a situation with mixed types of features, a __________ distance metric might be preferable in KNN.

  • Cosine
  • Euclidean
  • Gower
  • Manhattan
The Gower distance metric can handle mixed types of features (numerical, categorical) and is often preferable in such cases.
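A hand-rolled sketch of the idea (a minimal version; the `gower` package offers a full implementation, and the features, ranges, and sample points below are illustrative assumptions):

```python
# Sketch: a minimal Gower distance for mixed numeric/categorical features.
# Numeric features use range-scaled absolute difference; categorical
# features use a 0/1 mismatch. The overall distance is the average.

def gower_distance(a, b, is_numeric, ranges):
    d = []
    for x, y, num, r in zip(a, b, is_numeric, ranges):
        if num:
            # Scale by the observed range so all features lie in [0, 1]
            d.append(abs(x - y) / r if r else 0.0)
        else:
            d.append(0.0 if x == y else 1.0)
    return sum(d) / len(d)

# Features: age (numeric), income (numeric), city (categorical)
is_numeric = [True, True, False]
ranges = [50.0, 80000.0, None]  # observed range of each numeric feature

p1 = (25, 40000, "Paris")
p2 = (35, 60000, "Lyon")
print(round(gower_distance(p1, p2, is_numeric, ranges), 3))
```

Because every per-feature distance falls in [0, 1], numeric and categorical features contribute on a comparable scale, which plain Euclidean or Manhattan distance cannot do for mixed data.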

Your Decision Tree is suffering from high bias. How could adjusting the parameters related to entropy or the Gini Index help in this scenario?

  • Add more training data
  • Increase tree complexity by fine-tuning split criteria
  • Reduce tree complexity by fine-tuning split criteria
  • Remove features
High bias often means the model is too simple. Adjusting the parameters related to entropy or the Gini Index to create more complex splits can help capture underlying patterns in the data, thereby reducing bias and potentially improving predictive accuracy.
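A sketch of the effect, assuming scikit-learn (the synthetic dataset and the specific depth values are illustrative):

```python
# Sketch: a depth-limited tree underfits (high bias); allowing deeper
# splits under the same Gini criterion reduces bias. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10,
                           n_informative=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# A depth-1 stump is too simple to capture the pattern (high bias)
shallow = DecisionTreeClassifier(criterion="gini", max_depth=1,
                                 random_state=0).fit(X_tr, y_tr)
# Permitting more Gini-based splits lets the tree fit the structure
deeper = DecisionTreeClassifier(criterion="gini", max_depth=6,
                                random_state=0).fit(X_tr, y_tr)

print(shallow.score(X_te, y_te), deeper.score(X_te, y_te))
```

The same criterion (`gini` or `entropy`) is used in both trees; what changes bias is how many splits the tree is allowed to make with it.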

In what scenarios might DBSCAN be a less appropriate clustering algorithm compared to others?

  • When clusters have different densities
  • When clusters have similar densities
  • When data distribution is highly skewed
  • When data is uniformly distributed
DBSCAN might be less suitable when clusters have different densities, as the same Epsilon and MinPts parameters apply to all clusters. Parameters tuned for a dense cluster will mark much of a sparse cluster as noise, so algorithms that adapt to varying densities can be more appropriate in such scenarios.
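A sketch of the failure mode, assuming scikit-learn (the two synthetic blobs and parameter values are illustrative):

```python
# Sketch: one global eps struggles when clusters have very different
# densities; a tight and a sparse blob defeat a single setting.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
sparse = rng.normal(loc=5.0, scale=1.5, size=(100, 2))
X = np.vstack([dense, sparse])

# eps tuned for the dense blob marks much of the sparse blob as noise
labels = DBSCAN(eps=0.3, min_samples=5).fit(X).labels_
noise_in_dense = int((labels[:100] == -1).sum())
noise_in_sparse = int((labels[100:] == -1).sum())
print(noise_in_dense, noise_in_sparse)
```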

How are financial institutions using Machine Learning to detect fraudulent activities?

  • Fraud Detection
  • Personalized Education
  • Recommending Media
  • Weather Prediction
Financial institutions use Machine Learning algorithms to detect fraudulent activities by analyzing transaction patterns and identifying anomalies or suspicious behavior.
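One common technique for this is unsupervised anomaly detection. A sketch using scikit-learn's IsolationForest (the transaction-like features and all parameter values are illustrative assumptions, not a real fraud system):

```python
# Sketch: anomaly detection on transaction-like data with IsolationForest,
# one common ML approach to flagging suspicious activity. Data is synthetic.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Normal transactions: modest amounts at typical hours of day
normal = np.column_stack([rng.normal(50, 15, 500), rng.normal(14, 3, 500)])
# A few anomalous transactions: very large amounts at odd hours
fraud = np.column_stack([rng.normal(900, 50, 5), rng.normal(3, 1, 5)])
X = np.vstack([normal, fraud])

model = IsolationForest(contamination=0.01, random_state=0).fit(X)
# predict() returns -1 for points the model considers anomalies
preds = model.predict(fraud)
print(preds)
```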