How can biases in training data affect the fairness of a machine learning model?

Bias in training data can lead to underrepresented groups not being considered
Bias can lead to faster training
Bias has no impact on model fairness
Bias can improve model fairness

Biases in training data can lead to underrepresentation of certain groups, causing the model to make unfair predictions, especially for those underrepresented groups.
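
One way to surface this effect is to break accuracy down by group instead of reporting a single overall number. The sketch below is illustrative only: the group names, labels, and predictions are made-up data, and `accuracy_by_group` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}

# Hypothetical predictions: group "A" dominates the data, group "B" is underrepresented.
records = (
    [("A", 1, 1)] * 4 + [("A", 0, 0)] * 3 + [("A", 1, 0)] * 1  # 7 of 8 correct
    + [("B", 1, 1)] * 1 + [("B", 1, 0)] * 1                    # 1 of 2 correct
)

overall = sum(int(t == p) for _, t, p in records) / len(records)
per_group = accuracy_by_group(records)
```

Here the overall accuracy looks acceptable while the smaller group fares much worse, which is the kind of gap an aggregate metric can hide.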


Which type of regression is used to predict the probability of a categorical outcome?

Decision Tree Regression
Linear Regression
Logistic Regression
Polynomial Regression

Logistic Regression models the probability of a categorical outcome. It is most commonly used when the dependent variable is binary (e.g., spam or not spam), and it extends to more than two classes through multinomial (softmax) regression.
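
As a minimal sketch of the idea, the snippet below passes a linear score through the logistic (sigmoid) function to obtain a probability. The feature (`num_links`), the weight, and the bias are hypothetical values chosen for illustration, not learned parameters.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_spam_probability(num_links: float, weight: float = 0.8, bias: float = -2.0) -> float:
    # Linear score followed by the sigmoid; weight and bias are assumed, not trained.
    return sigmoid(weight * num_links + bias)

p = predict_spam_probability(5)            # score = 0.8 * 5 - 2.0 = 2.0
label = "spam" if p >= 0.5 else "not spam"  # threshold the probability at 0.5
```

In a real model the weight and bias would be fitted to labeled data, typically by minimizing the log loss.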


A financial institution wants to predict whether a loan applicant is likely to default on their loan. They have a mix of numerical data (like income, age) and categorical data (like occupation, marital status). Which algorithm might be well-suited for this task due to its ability to handle both types of data?

Decision Tree
Random Forest
Support Vector Machine
k-Nearest Neighbors

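The Random Forest algorithm is well-suited for this task because tree-based models split on one feature at a time and can therefore work with both numerical and categorical data. A single Decision Tree shares this property, but a Random Forest combines many decision trees and takes a majority vote, which reduces overfitting and usually makes it more robust and accurate on such mixed data.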
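
The voting step can be sketched as follows. This is a toy illustration: the three "trees" are hand-written rules with made-up thresholds, whereas a real Random Forest learns each tree from a bootstrap sample of the training data.

```python
from collections import Counter

def majority_vote(trees, applicant):
    """Return the label predicted by most trees for one applicant."""
    votes = [tree(applicant) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Hypothetical stand-ins for trained trees; each mixes numerical and categorical checks.
trees = [
    lambda a: "default" if a["income"] < 30000 else "repay",
    lambda a: "default" if a["marital_status"] == "single" and a["age"] < 25 else "repay",
    lambda a: "default" if a["occupation"] == "unemployed" else "repay",
]

applicant = {"income": 25000, "age": 40, "marital_status": "married", "occupation": "unemployed"}
prediction = majority_vote(trees, applicant)  # two of three trees vote "default"
```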


Which of the following RNN variants uses both a forget gate and an input gate to regulate the flow of information?

LSTM (Long Short-Term Memory)
GRU (Gated Recurrent Unit)
Elman Network
Jordan Network

The LSTM (Long Short-Term Memory) variant uses both a forget gate and an input gate (along with an output gate) to manage information flow. The forget gate decides how much of the previous cell state to keep, and the input gate decides how much new information to write, which helps the network retain information over long sequences. A GRU, by contrast, uses an update gate and a reset gate.
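
A single LSTM step can be sketched with scalar inputs and states to show where the two gates act. The parameter layout (`params` mapping a gate name to `(w_x, w_h, b)`) and all numeric values are assumptions made for this illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step for scalar input x, hidden state h_prev, and cell state c_prev."""
    def pre_activation(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre_activation("forget"))       # fraction of the old cell state to keep
    i = sigmoid(pre_activation("input"))        # fraction of the candidate to write
    g = math.tanh(pre_activation("candidate"))  # candidate cell content
    o = sigmoid(pre_activation("output"))       # fraction of the cell state to expose
    c = f * c_prev + i * g                      # new cell state
    h = o * math.tanh(c)                        # new hidden state
    return h, c

# Hypothetical parameters: forget gate saturated open, input gate saturated closed.
params = {
    "forget": (0.0, 0.0, 50.0),
    "input": (0.0, 0.0, -50.0),
    "candidate": (1.0, 1.0, 0.0),
    "output": (0.0, 0.0, 0.0),
}
h, c = lstm_step(x=1.0, h_prev=0.0, c_prev=0.7, params=params)
```

With the forget gate near 1 and the input gate near 0, the cell state passes through almost unchanged, which is how an LSTM can carry information across many time steps.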


t-SNE is a technique primarily used for what kind of task in machine learning?

Dimensionality Reduction
Image Classification
Anomaly Detection
Reinforcement Learning

t-SNE (t-distributed Stochastic Neighbor Embedding) is primarily used for dimensionality reduction, reducing high-dimensional data to a lower-dimensional representation for visualization and analysis.


Which algorithm is commonly used for blind source separation or separating mixed signals?

Principal Component Analysis (PCA)
Support Vector Machine (SVM)
K-Means Clustering
Decision Trees

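Of the options listed, Principal Component Analysis (PCA) is the closest fit. PCA identifies the principal components, the directions of maximum variance in the data, and so decorrelates mixed signals. Note, however, that the standard technique for blind source separation is Independent Component Analysis (ICA), which looks for statistically independent components rather than merely uncorrelated ones; PCA is often used as a preprocessing (whitening) step before ICA.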


SVMs aim to maximize the margin, which is the distance between the decision boundary and the nearest ______ from any class.

Decision Tree
Hyperplane
Outlier
Support Vector

SVMs aim to maximize the margin, which is the distance between the decision boundary and the nearest support vector from any class. Support vectors are the training points closest to the boundary, and they alone determine where the decision boundary lies.
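
The geometry can be sketched for a fixed linear boundary w·x + b = 0. The weights, bias, and points below are hypothetical, and the snippet only measures the margin of a given hyperplane; it does not train an SVM.

```python
import math

def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w·x + b = 0."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(score) / norm

# Assumed boundary x1 + x2 - 3 = 0 and four illustrative points.
w, b = [1.0, 1.0], -3.0
points = [[1.0, 1.0], [0.0, 0.0], [2.0, 2.0], [4.0, 3.0]]

distances = [distance_to_hyperplane(w, b, p) for p in points]
margin = min(distances)  # distance to the closest point(s)
closest = [p for p, d in zip(points, distances) if math.isclose(d, margin)]
```

The points in `closest` play the role of support vectors for this boundary; training an SVM amounts to choosing `w` and `b` so that this smallest distance is as large as possible.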


The equation y = mx + c is a simple representation of ________ regression.

Linear
Logistic
Polynomial
Ridge

The equation y=mx+c represents a simple linear regression. In this equation, 'y' is the dependent variable, 'x' is the independent variable, 'm' is the slope, and 'c' is the intercept. It's used to model a linear relationship between variables.
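
As a minimal sketch, the slope and intercept can be estimated from data with ordinary least squares. The function name `fit_line` and the sample points are assumptions for illustration; the data was chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares estimate of slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]  # points on the line y = 2x + 1
m, c = fit_line(xs, ys)    # recovers m = 2, c = 1
```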


You are working on a fraud detection system where false negatives (failing to detect a fraud) can have severe financial implications. Which metric would you prioritize to ensure that as many actual fraud cases as possible are detected?

Accuracy
F1 Score
Precision
Recall

In this scenario, Recall should be prioritized. Recall is the fraction of actual fraud cases that the model detects, TP / (TP + FN), so maximizing it directly minimizes false negatives. Precision, by contrast, penalizes false alarms, and Accuracy can look high on imbalanced fraud data even when most fraud is missed.
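
The difference between the metrics can be seen on a small example. The labels and predictions below are made up (1 = fraud, 0 = legitimate), and `precision_recall` is a hypothetical helper written for this sketch.

```python
def precision_recall(y_true, y_pred):
    """Compute precision and recall for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # four actual fraud cases
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]  # one fraud missed, two false alarms

precision, recall = precision_recall(y_true, y_pred)  # tp=3, fp=2, fn=1
```

The single missed fraud case lowers recall, while the two false alarms lower only precision.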


In GANs, what is the significance of the Nash Equilibrium?

It's a point where both the generator and discriminator are optimal.
It's a theoretical concept without practical relevance.
It's the point where only the generator is optimal.
It's the point where only the discriminator is optimal.

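The Nash Equilibrium in GANs is the point where both the generator and discriminator are optimal, meaning neither can improve its objective by changing its own strategy alone. In theory this occurs when the generator's distribution matches the real data distribution and the discriminator outputs 0.5 for every sample. In practice, GAN training often oscillates and may never reach this point.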
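
To make this concrete, assume the original minimax GAN formulation, in which the best discriminator for a fixed generator outputs p_data(x) / (p_data(x) + p_g(x)). The density values below are hypothetical numbers chosen only to illustrate the formula.

```python
def optimal_discriminator(p_data: float, p_gen: float) -> float:
    """Best discriminator output at a point x, given the two densities at x."""
    return p_data / (p_data + p_gen)

# Generator still off: real data is three times as likely as generated data at x.
d_before = optimal_discriminator(p_data=0.6, p_gen=0.2)

# At equilibrium the generator matches the data distribution everywhere.
d_equilibrium = optimal_discriminator(p_data=0.3, p_gen=0.3)
```

When the two densities agree, the discriminator can do no better than a coin flip, which is why an output of 0.5 is the signature of the equilibrium.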
