Data cleansing often involves removing or correcting ________ in a dataset.

  • Anomalies
  • Correlations
  • Errors
  • Outliers
Data cleansing typically involves identifying and correcting errors in a dataset, which can include incorrect values, missing values, or inconsistencies. These errors can arise due to various reasons such as data entry mistakes, system errors, or data integration issues. Addressing these errors is crucial for ensuring the accuracy and reliability of the data for analysis and decision-making purposes.
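
As a concrete illustration, the following is a minimal pandas sketch of these corrections; the dataset, column names, and cleaning rules are hypothetical, not part of the question.

```python
import pandas as pd

# Hypothetical raw extract with common quality issues: inconsistent
# casing/whitespace, a missing key field, a missing value, and an
# invalid (negative) amount.
raw = pd.DataFrame({
    "customer": ["Alice", "alice ", None, "Bob"],
    "amount": [120.0, -5.0, 80.0, None],
})

cleaned = (
    raw.assign(
        customer=lambda d: d["customer"].str.strip().str.title(),   # normalize inconsistent text
        amount=lambda d: d["amount"].fillna(d["amount"].median()),   # impute missing values
    )
    .dropna(subset=["customer"])   # a record without its key field is unusable
    .query("amount >= 0")          # remove clearly invalid amounts
)
print(cleaned)
```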

In which scenarios would you prefer using Apache NiFi over Talend for ETL tasks, and vice versa?

  • Apache NiFi: Batch processing, Data integration, Master data management; Talend: Real-time data streaming, IoT data processing, Complex data routing
  • Apache NiFi: Data provenance, Role-based access control, Metadata management; Talend: Data transformation, Data quality and governance, Data visualization
  • Apache NiFi: Data transformation, Data quality and governance, Data visualization; Talend: Data provenance, Role-based access control, Metadata management
  • Apache NiFi: Real-time data streaming, IoT data processing, Complex data routing; Talend: Batch processing, Data integration, Master data management
The choice between Apache NiFi and Talend for ETL tasks depends on specific requirements. Apache NiFi is preferred for real-time data streaming, IoT data processing, and complex data routing scenarios, while Talend excels in batch processing, data integration, and master data management. Understanding these distinctions ensures optimal tool selection.

The process of ________ involves capturing, storing, and analyzing metadata to ensure data lineage accuracy.

  • Metadata Governance
  • Metadata Harvesting
  • Metadata Integration
  • Metadata Profiling
The process of metadata governance involves capturing, storing, and analyzing metadata to ensure data lineage accuracy. Metadata governance establishes policies, standards, and processes for managing metadata throughout its lifecycle, including creation, usage, and maintenance. It aims to maintain metadata quality, consistency, and relevance, supporting effective data management and decision-making.
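
As an illustrative sketch only, lineage-relevant metadata for a single pipeline step could be captured as a structured record like the one below; the schema and field names are assumptions, not a standard.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """Illustrative lineage metadata for one pipeline step (schema is an assumption)."""
    target: str
    sources: list
    transformation: str
    executed_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

# Capturing lineage when a reporting table is built from two source tables.
record = LineageRecord(
    target="analytics.daily_sales",
    sources=["raw.orders", "raw.customers"],
    transformation="join on customer_id, aggregate by order_date",
)
print(record)
```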

What is the main challenge when transitioning from a logical data model to a physical data model?

  • Capturing high-level business requirements
  • Ensuring data integrity during migrations
  • Mapping complex relationships between entities
  • Performance optimization and denormalization
The main challenge when transitioning from a logical data model to a physical data model is performance optimization and denormalization. The logical model describes entities and relationships independently of any platform; the physical model must map that design onto a specific DBMS, which typically means choosing data types, indexes, and partitioning, and selectively denormalizing to meet query-performance requirements.
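
Below is a hedged pandas sketch of one such denormalization decision: a normalized logical design (orders referencing customers by key) is flattened into a single physical reporting table. Table and column names are illustrative.

```python
import pandas as pd

# Logical design: normalized entities linked by a foreign key.
customers = pd.DataFrame({
    "customer_id": [1, 2],
    "region": ["EMEA", "APAC"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "amount": [50.0, 75.0, 20.0],
})

# Physical design choice: denormalize the region onto each order so
# reporting queries avoid a join, trading redundancy for read speed.
orders_denormalized = orders.merge(customers, on="customer_id", how="left")
print(orders_denormalized)
```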

In Dimensional Modeling, a ________ is a central table in a star schema that contains metrics or measurements.

  • Dimension table
  • Fact table
  • Lookup table
  • Transaction table
In Dimensional Modeling, a Fact table is a central table in a star schema that contains metrics or measurements. It typically contains numeric data that represents business facts and is surrounded by dimension tables.

________ is a popular open-source framework for building batch processing pipelines.

  • Apache Kafka
  • Apache Spark
  • Docker
  • MongoDB
Apache Spark is a widely used open-source framework for building batch processing pipelines. It provides high-level APIs in multiple programming languages for scalable, distributed data processing. Spark is known for its speed, ease of use, and support for various data sources and processing tasks, including batch processing, real-time streaming, machine learning, and graph processing.
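
A minimal PySpark sketch of a batch pipeline is shown below; the input path, column names, and output location are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Minimal batch job: read a CSV extract, aggregate, and write Parquet.
spark = SparkSession.builder.appName("daily-sales-batch").getOrCreate()

sales = spark.read.csv("/data/raw/sales.csv", header=True, inferSchema=True)

daily_totals = (
    sales.groupBy("order_date")
         .agg(F.sum("amount").alias("total_amount"),
              F.count("*").alias("order_count"))
)

daily_totals.write.mode("overwrite").parquet("/data/curated/daily_sales")
spark.stop()
```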

Hadoop YARN stands for Yet Another Resource ________.

  • Navigator
  • Negotiating
  • Negotiation
  • Negotiator
Hadoop YARN stands for Yet Another Resource Negotiator. It is the resource-management layer of Hadoop that allocates cluster resources and schedules tasks across the cluster, enabling efficient resource utilization.

Scenario: You are tasked with designing a data warehouse for a retail company to analyze sales data. Which Dimensional Modeling technique would you use to represent the relationships between products, customers, and sales transactions most efficiently?

  • Bridge Table
  • Fact Constellation
  • Snowflake Schema
  • Star Schema
A Star Schema would be the most efficient Dimensional Modeling technique for representing relationships between products, customers, and sales transactions, as it simplifies queries and optimizes performance.
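
The sketch below illustrates the shape of such a star schema using pandas stand-ins for the tables; the product, customer, and sales attributes are assumptions chosen for the example.

```python
import pandas as pd

# Dimension tables hold descriptive attributes.
dim_product = pd.DataFrame({"product_id": [1, 2], "category": ["Grocery", "Apparel"]})
dim_customer = pd.DataFrame({"customer_id": [100, 101], "segment": ["Retail", "Wholesale"]})

# Fact table at the centre of the star: foreign keys plus numeric measures.
fact_sales = pd.DataFrame({
    "product_id": [1, 1, 2],
    "customer_id": [100, 101, 100],
    "quantity": [2, 5, 1],
    "revenue": [20.0, 50.0, 40.0],
})

# A typical analytical query: one join per dimension, then aggregate the measures.
report = (
    fact_sales.merge(dim_product, on="product_id")
              .merge(dim_customer, on="customer_id")
              .groupby(["category", "segment"], as_index=False)[["quantity", "revenue"]]
              .sum()
)
print(report)
```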

Scenario: A financial institution wants to implement real-time fraud detection. Outline the key components and technologies you would recommend for building such a system.

  • Apache Beam for data processing, RabbitMQ for message queuing, Neural networks for fraud detection, Redis for caching
  • Apache Kafka for data ingestion, Apache Flink for stream processing, Machine learning models for fraud detection, Apache Cassandra for storing transaction data
  • Apache NiFi for data ingestion, Apache Storm for stream processing, Decision trees for fraud detection, MongoDB for storing transaction data
  • MySQL database for data storage, Apache Spark for batch processing, Rule-based systems for fraud detection, Elasticsearch for search and analytics
Implementing real-time fraud detection in a financial institution requires a robust combination of technologies. Apache Kafka ensures reliable data ingestion, while Apache Flink enables real-time stream processing for immediate fraud detection. Machine learning models trained on historical data can identify fraudulent patterns, with Apache Cassandra providing scalable storage for transaction data.
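
As a rough illustration of the ingestion-and-scoring hand-off only (not the full Kafka + Flink + Cassandra architecture described above), the sketch below uses the kafka-python client with a placeholder scoring function; the topic name, broker address, and threshold are assumptions.

```python
import json
from kafka import KafkaConsumer  # kafka-python client; topic and broker are placeholders

def score_transaction(txn: dict) -> float:
    """Placeholder for a trained ML model; returns a fraud probability."""
    # A real system would load a model trained on labelled historical transactions.
    return 0.9 if txn.get("amount", 0) > 10_000 else 0.1

consumer = KafkaConsumer(
    "transactions",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)

for message in consumer:
    txn = message.value
    if score_transaction(txn) > 0.8:
        # In the recommended architecture, the transaction and alert would be
        # persisted to Cassandra and forwarded for case investigation.
        print(f"FRAUD ALERT: {txn}")
```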

Which of the following is an example of a data cleansing tool commonly used to identify and correct inconsistencies in datasets?

  • Apache Kafka
  • MongoDB
  • OpenRefine
  • Tableau
OpenRefine is a popular data cleansing tool used to identify and correct inconsistencies in datasets. It provides features for data transformation, cleaning, and reconciliation, allowing users to explore, clean, and preprocess large datasets efficiently. With its intuitive interface and powerful functionalities, OpenRefine is widely used in data preparation workflows across various industries.

What is the main advantage of using Apache Parquet as a file format in big data storage?

  • Columnar storage format
  • Compression format
  • Row-based storage format
  • Transactional format
The main advantage of using Apache Parquet as a file format in big data storage is its columnar storage format. Parquet organizes data into columns rather than rows, which offers several benefits for big data analytics and processing. By storing data column-wise, Parquet facilitates efficient compression, as similar data values are stored together, reducing storage space and improving query performance. Additionally, the columnar format enables selective column reads, minimizing I/O operations and enhancing data retrieval speed, especially for analytical workloads involving complex queries and aggregations.
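
The selective-column read enabled by the columnar layout can be seen in this short pandas sketch (it assumes the pyarrow or fastparquet engine is installed; the file name and columns are placeholders).

```python
import pandas as pd

# Write a small DataFrame to Parquet (requires the pyarrow or fastparquet engine).
df = pd.DataFrame({
    "order_id": range(5),
    "amount": [10.0, 20.0, 30.0, 40.0, 50.0],
    "notes": ["a", "b", "c", "d", "e"],
})
df.to_parquet("sales.parquet")

# Columnar benefit: read back only the columns a query needs,
# skipping the rest of the file's data entirely.
amounts_only = pd.read_parquet("sales.parquet", columns=["order_id", "amount"])
print(amounts_only)
```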

What are some common challenges faced in implementing monitoring and alerting systems for complex data pipelines?

  • Dealing with diverse data sources
  • Ensuring end-to-end visibility
  • Handling large volumes of data
  • Managing real-time processing
Implementing monitoring and alerting systems for complex data pipelines presents several challenges. Ensuring end-to-end visibility involves tracking data flow from source to destination, which becomes complex in pipelines with multiple stages and transformations. Handling large volumes of data requires scalable solutions capable of processing and analyzing massive datasets efficiently. Dealing with diverse data sources involves integrating and harmonizing data from various formats and platforms. Managing real-time processing requires monitoring tools capable of detecting and responding to issues in real-time to maintain pipeline performance and data integrity.
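
As one hedged example of an end-to-end visibility check, the sketch below flags stale or unexpectedly small pipeline outputs; the thresholds and metric names are illustrative assumptions.

```python
import logging
from datetime import datetime, timedelta, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline_monitor")

def check_pipeline_health(last_success: datetime, rows_processed: int,
                          max_staleness: timedelta = timedelta(hours=1),
                          min_rows: int = 1000) -> bool:
    """Raise alerts when freshness or volume thresholds are breached (thresholds are illustrative)."""
    healthy = True
    if datetime.now(timezone.utc) - last_success > max_staleness:
        log.error("ALERT: pipeline output is stale (last success %s)", last_success)
        healthy = False
    if rows_processed < min_rows:
        log.error("ALERT: unexpectedly low volume (%d rows)", rows_processed)
        healthy = False
    return healthy

# Example: a run that finished two hours ago with too few rows triggers both alerts.
check_pipeline_health(datetime.now(timezone.utc) - timedelta(hours=2), rows_processed=120)
```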