For the k-NN algorithm, what is a potential drawback of using a very large value of k?
- Increased Model Bias
- Increased Model Variance
- Overfitting to Noise
- Slower Training Time
A potential drawback of using a very large value of k in k-NN is increased model bias: predictions are averaged over many neighbors, including distant ones, which smooths away local structure in the data and causes underfitting. (Overfitting to noise is the opposite problem, which occurs when k is very small.)
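To see the effect on a toy example, here is a minimal pure-Python k-NN sketch (1-D features, illustrative data): when k equals the full training-set size, the majority vote always returns the globally most common class, regardless of where the query point lies.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature, label) pairs with 1-D numeric features."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy dataset: class "a" clusters near 0, class "b" (the majority) near 10.
train = [(0.0, "a"), (0.5, "a"), (1.0, "a"),
         (9.0, "b"), (9.5, "b"), (10.0, "b"), (10.5, "b")]

# Small k respects local structure; k = len(train) ignores the query entirely
# and always predicts the majority class "b" -- high bias.
local = knn_predict(train, 0.2, 1)            # -> "a"
biased = knn_predict(train, 0.2, len(train))  # -> "b"
```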
You have a dataset with numerous features, and you suspect that many of them are correlated. Using which technique can you both reduce the dimensionality and tackle multicollinearity?
- Data Imputation
- Decision Trees
- Feature Scaling
- Principal Component Analysis (PCA)
Principal Component Analysis (PCA) can reduce dimensionality by transforming correlated features into a smaller set of uncorrelated variables. It addresses multicollinearity by creating new axes (principal components) where the original variables are no longer correlated, thus improving the model's stability and interpretability.
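A minimal NumPy sketch of PCA via eigendecomposition of the covariance matrix (the data here is synthetic and illustrative): two strongly correlated features are transformed into scores whose covariance is diagonal, which is exactly how PCA removes multicollinearity.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # uncorrelated scores

# Two nearly collinear features (x2 ~ 2*x1): a classic multicollinearity case.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

scores = pca(X, 2)
# The off-diagonal covariance of the scores is ~0: the new axes are uncorrelated.
```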
Why might it be problematic if a loan approval machine learning model is not transparent and explainable in its decision-making process?
- Increased risk of discrimination
- Enhanced privacy protection
- Improved loan approval process
- Faster decision-making
If a loan approval model is not transparent and explainable, it may lead to increased risks of discrimination, as it becomes unclear why certain applicants were approved or denied loans, potentially violating anti-discrimination laws.
Which regularization technique adds a penalty equivalent to the absolute value of the magnitude of coefficients?
- Elastic Net
- L1 Regularization
- L2 Regularization
- Ridge Regularization
L1 Regularization, also known as Lasso, adds a penalty equivalent to the absolute value of coefficients. This helps in feature selection by encouraging some coefficients to become exactly zero.
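A small sketch of the L1 penalty and of soft-thresholding, the proximal operator many Lasso solvers apply to each coefficient (the coefficient values below are illustrative): coefficients smaller in magnitude than the penalty strength are set to exactly zero, which is the mechanism behind L1's feature selection.

```python
def l1_penalty(coefs, lam):
    """Lasso penalty: lam * sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in coefs)

def soft_threshold(w, lam):
    """Shrink w toward 0; set it to exactly 0 when |w| <= lam.
    This is what zeroes out weak coefficients under L1 regularization."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

coefs = [2.5, -0.3, 0.05]
shrunk = [soft_threshold(w, 0.1) for w in coefs]
# The weakest coefficient (0.05) becomes exactly 0.0; the others shrink by 0.1.
```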
An autoencoder's primary objective is to minimize the difference between the input and the ________.
- Output
- Reconstruction
- Encoding
- Activation
The primary objective of an autoencoder is to minimize the difference between the input and its 'Reconstruction,' which is the encoded-decoded output.
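As a minimal sketch of that objective (the linear encoder/decoder weights here are hand-picked for illustration, not learned by training): the autoencoder's loss is the reconstruction error between the input and its encode-decode round trip, here measured as mean squared error.

```python
import numpy as np

def mse(x, x_hat):
    """Reconstruction loss an autoencoder minimizes."""
    return float(np.mean((x - x_hat) ** 2))

# Illustrative linear encoder/decoder: data lying on the line y = x
# compresses losslessly from 2-D down to a 1-D code.
encode = lambda x: x @ np.array([[1.0], [1.0]]) / 2   # 2-D input -> 1-D code
decode = lambda z: z @ np.array([[1.0, 1.0]])         # 1-D code -> 2-D reconstruction

X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
loss = mse(X, decode(encode(X)))   # 0.0: points on the line reconstruct exactly

# A point off the line cannot be represented by the 1-D code, so its
# reconstruction loss is positive.
off_line = np.array([[1.0, 3.0]])
off_loss = mse(off_line, decode(encode(off_line)))
```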
A ________ is a technique in machine learning that reduces the number of input variables in a dataset while retaining the most important information.
- Feature Extractor
- Principal Component Analysis (PCA)
- Gradient Descent
- Overfitting
Principal Component Analysis (PCA) is a technique used for dimensionality reduction. It identifies and retains important information while reducing the number of input variables in a dataset.
In binary classification, if a model correctly predicts all positive instances and no negative instances as positive, its ________ will be 1.
- Accuracy
- F1 Score
- Precision
- Recall
When a model predicts no negative instances as positive, it has zero false positives, so its precision is 1. Precision = TP / (TP + FP): it measures how many of the predicted positive instances were actually positive.
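The standard definition, as a short sketch (toy labels, illustrative only): precision counts true positives against all predicted positives, so a prediction vector with no false positives scores exactly 1.0.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP), with labels encoded as 1 (positive) / 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp)

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1]   # every positive caught, no negative flagged as positive
perfect = precision(y_true, y_pred)   # -> 1.0
```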
One of the drawbacks of using t-SNE is that it's not deterministic, meaning multiple runs with the same data can yield ________ results.
- Different
- Identical
- Similar
- Unpredictable
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a stochastic dimensionality reduction technique: its optimization starts from a random initialization, so multiple runs on the same data can produce different embeddings unless the random seed is fixed.
Which of the following is a concern when machine learning models make decisions without human understanding: Accuracy, Scalability, Interpretability, or Efficiency?
- Interpretability
- Accuracy
- Scalability
- Efficiency
The concern when machine learning models make decisions without human understanding is primarily related to "Interpretability." A lack of interpretability can lead to mistrust and challenges in understanding why a model made a particular decision.
Which classifier is based on applying Bayes' theorem with the assumption of independence between every pair of features?
- K-Means
- Naive Bayes
- Random Forest
- Support Vector Machine
Naive Bayes is a classifier based on Bayes' theorem with the assumption of feature independence, making it effective for text classification.
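A minimal multinomial Naive Bayes for text, sketched in pure Python with Laplace smoothing (the toy spam/ham documents are illustrative): the "naive" step is summing per-word log-likelihoods as if every word were independent of the others given the class.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs):
    """docs: list of (word_list, label) pairs. Returns log-priors and
    Laplace-smoothed per-class word log-likelihoods."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: log(n / len(docs)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        likelihoods[c] = {w: log((word_counts[c][w] + 1) / (total + len(vocab)))
                          for w in vocab}
    return priors, likelihoods

def predict_nb(model, words):
    priors, likelihoods = model
    # Naive independence assumption: class score is the prior plus the sum of
    # per-word log-likelihoods (out-of-vocabulary words are simply ignored).
    scores = {c: priors[c] + sum(likelihoods[c].get(w, 0.0) for w in words)
              for c in priors}
    return max(scores, key=scores.get)

model = train_nb([("win cash now".split(), "spam"),
                  ("cheap cash offer".split(), "spam"),
                  ("meeting at noon".split(), "ham"),
                  ("project meeting notes".split(), "ham")])
```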
Which type of regression is used to predict the probability of a categorical outcome?
- Decision Tree Regression
- Linear Regression
- Logistic Regression
- Polynomial Regression
Logistic Regression is specifically designed for predicting the probability of a categorical outcome. It's used when the dependent variable is binary (e.g., spam or not spam).
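The core of logistic regression in a few lines (the weights and bias below are illustrative, not fitted to data): a linear score is passed through the sigmoid function, which squashes it into a probability in (0, 1) that can then be thresholded into a class label.

```python
from math import exp

def sigmoid(z):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1 / (1 + exp(-z))

def predict_proba(weights, bias, features):
    """P(y = 1 | x) under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical two-feature spam model with hand-picked parameters.
p = predict_proba([1.2, -0.8], -0.5, [3.0, 1.0])  # score z = 2.3
label = "spam" if p >= 0.5 else "not spam"
```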
How can biases in training data affect the fairness of a machine learning model?
- Bias in training data can lead to underrepresented groups not being considered
- Bias can lead to faster training
- Bias has no impact on model fairness
- Bias can improve model fairness
Biases in training data can lead to underrepresentation of certain groups, causing the model to make unfair predictions, especially for those underrepresented groups.