For the k-NN algorithm, what is a potential drawback of using a very large value of k?
- Increased Model Bias
- Increased Model Variance
- Overfitting to Noise
- Slower Training Time
A potential drawback of using a very large value of k in k-NN is increased model bias: predictions are averaged over many neighbors, including distant ones, which smooths away local structure in the data and causes underfitting. (Overfitting to noise is the opposite problem, which occurs when k is very small.)
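To see the effect on a toy example, here is a minimal pure-Python k-NN sketch (1-D features, illustrative data): when k equals the full training-set size, the majority vote always returns the globally most common class, regardless of where the query point lies.

```python
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of (feature, label) pairs with 1-D numeric features."""
    neighbors = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy dataset: class "a" clusters near 0, class "b" (the majority) near 10.
train = [(0.0, "a"), (0.5, "a"), (1.0, "a"),
         (9.0, "b"), (9.5, "b"), (10.0, "b"), (10.5, "b")]

# Small k respects local structure; k = len(train) ignores the query entirely
# and always predicts the majority class "b" -- high bias.
local = knn_predict(train, 0.2, 1)            # -> "a"
biased = knn_predict(train, 0.2, len(train))  # -> "b"
```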
You have a dataset with numerous features, and you suspect that many of them are correlated. Using which technique can you both reduce the dimensionality and tackle multicollinearity?
- Data Imputation
- Decision Trees
- Feature Scaling
- Principal Component Analysis (PCA)
Principal Component Analysis (PCA) can reduce dimensionality by transforming correlated features into a smaller set of uncorrelated variables. It addresses multicollinearity by creating new axes (principal components) where the original variables are no longer correlated, thus improving the model's stability and interpretability.
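A minimal NumPy sketch of PCA via eigendecomposition of the covariance matrix (the data here is synthetic and illustrative): two strongly correlated features are transformed into scores whose covariance is diagonal, which is exactly how PCA removes multicollinearity.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # sort components by variance, descending
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                  # uncorrelated scores

# Two nearly collinear features (x2 ~ 2*x1): a classic multicollinearity case.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, 2 * x + rng.normal(scale=0.1, size=200)])

scores = pca(X, 2)
# The off-diagonal covariance of the scores is ~0: the new axes are uncorrelated.
```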
Why might it be problematic if a loan approval machine learning model is not transparent and explainable in its decision-making process?
- Increased risk of discrimination
- Enhanced privacy protection
- Improved loan approval process
- Faster decision-making
If a loan approval model is not transparent and explainable, it may lead to increased risks of discrimination, as it becomes unclear why certain applicants were approved or denied loans, potentially violating anti-discrimination laws.
Which regularization technique adds a penalty equivalent to the absolute value of the magnitude of coefficients?
- Elastic Net
- L1 Regularization
- L2 Regularization
- Ridge Regularization
L1 Regularization, also known as Lasso, adds a penalty equivalent to the absolute value of coefficients. This helps in feature selection by encouraging some coefficients to become exactly zero.
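A small sketch of the L1 penalty and of soft-thresholding, the proximal operator many Lasso solvers apply to each coefficient (the coefficient values below are illustrative): coefficients smaller in magnitude than the penalty strength are set to exactly zero, which is the mechanism behind L1's feature selection.

```python
def l1_penalty(coefs, lam):
    """Lasso penalty: lam * sum of absolute coefficient values."""
    return lam * sum(abs(w) for w in coefs)

def soft_threshold(w, lam):
    """Shrink w toward 0; set it to exactly 0 when |w| <= lam.
    This is what zeroes out weak coefficients under L1 regularization."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

coefs = [2.5, -0.3, 0.05]
shrunk = [soft_threshold(w, 0.1) for w in coefs]
# The weakest coefficient (0.05) becomes exactly 0.0; the others shrink by 0.1.
```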
An autoencoder's primary objective is to minimize the difference between the input and the ________.
- Output
- Reconstruction
- Encoding
- Activation
The primary objective of an autoencoder is to minimize the difference between the input and its 'Reconstruction,' which is the encoded-decoded output.
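As a minimal sketch of that objective (the linear encoder/decoder weights here are hand-picked for illustration, not learned by training): the autoencoder's loss is the reconstruction error between the input and its encode-decode round trip, here measured as mean squared error.

```python
import numpy as np

def mse(x, x_hat):
    """Reconstruction loss an autoencoder minimizes."""
    return float(np.mean((x - x_hat) ** 2))

# Illustrative linear encoder/decoder: data lying on the line y = x
# compresses losslessly from 2-D down to a 1-D code.
encode = lambda x: x @ np.array([[1.0], [1.0]]) / 2   # 2-D input -> 1-D code
decode = lambda z: z @ np.array([[1.0, 1.0]])         # 1-D code -> 2-D reconstruction

X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
loss = mse(X, decode(encode(X)))   # 0.0: points on the line reconstruct exactly

# A point off the line cannot be represented by the 1-D code, so its
# reconstruction loss is positive.
off_line = np.array([[1.0, 3.0]])
off_loss = mse(off_line, decode(encode(off_line)))
```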
A ________ is a technique in machine learning that reduces the number of input variables in a dataset while retaining the most important information.
- Feature Extractor
- Principal Component Analysis (PCA)
- Gradient Descent
- Overfitting
Principal Component Analysis (PCA) is a technique used for dimensionality reduction. It identifies and retains important information while reducing the number of input variables in a dataset.
In binary classification, if a model correctly predicts all positive instances and no negative instances as positive, its ________ will be 1.
- Accuracy
- F1 Score
- Precision
- Recall
When a model predicts no negative instances as positive, it has zero false positives, so its precision is 1. Precision = TP / (TP + FP): it measures how many of the predicted positive instances were actually positive.
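The standard definition, as a short sketch (toy labels, illustrative only): precision counts true positives against all predicted positives, so a prediction vector with no false positives scores exactly 1.0.

```python
def precision(y_true, y_pred):
    """Precision = TP / (TP + FP), with labels encoded as 1 (positive) / 0 (negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fp)

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1]   # every positive caught, no negative flagged as positive
perfect = precision(y_true, y_pred)   # -> 1.0
```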
One of the drawbacks of using t-SNE is that it's not deterministic, meaning multiple runs with the same data can yield ________ results.
- Different
- Identical
- Similar
- Unpredictable
t-SNE (t-Distributed Stochastic Neighbor Embedding) is a stochastic dimensionality reduction technique: its optimization starts from a random initialization, so multiple runs on the same data can produce different embeddings unless the random seed is fixed.
Which of the following is a concern when machine learning models make decisions without human understanding: Accuracy, Scalability, Interpretability, or Efficiency?
- Interpretability
- Accuracy
- Scalability
- Efficiency
The concern when machine learning models make decisions without human understanding is primarily related to "Interpretability." A lack of interpretability can lead to mistrust and challenges in understanding why a model made a particular decision.
Which classifier is based on applying Bayes' theorem with the assumption of independence between every pair of features?
- K-Means
- Naive Bayes
- Random Forest
- Support Vector Machine
Naive Bayes is a classifier based on Bayes' theorem with the assumption of feature independence, making it effective for text classification.
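A minimal multinomial Naive Bayes for text, sketched in pure Python with Laplace smoothing (the toy spam/ham documents are illustrative): the "naive" step is summing per-word log-likelihoods as if every word were independent of the others given the class.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs):
    """docs: list of (word_list, label) pairs. Returns log-priors and
    Laplace-smoothed per-class word log-likelihoods."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        word_counts[label].update(words)
        vocab.update(words)
    priors = {c: log(n / len(docs)) for c, n in class_counts.items()}
    likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        likelihoods[c] = {w: log((word_counts[c][w] + 1) / (total + len(vocab)))
                          for w in vocab}
    return priors, likelihoods

def predict_nb(model, words):
    priors, likelihoods = model
    # Naive independence assumption: class score is the prior plus the sum of
    # per-word log-likelihoods (out-of-vocabulary words are simply ignored).
    scores = {c: priors[c] + sum(likelihoods[c].get(w, 0.0) for w in words)
              for c in priors}
    return max(scores, key=scores.get)

model = train_nb([("win cash now".split(), "spam"),
                  ("cheap cash offer".split(), "spam"),
                  ("meeting at noon".split(), "ham"),
                  ("project meeting notes".split(), "ham")])
```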
Which type of regression is used to predict the probability of a categorical outcome?
- Decision Tree Regression
- Linear Regression
- Logistic Regression
- Polynomial Regression
Logistic Regression is specifically designed for predicting the probability of a categorical outcome. It's used when the dependent variable is binary (e.g., spam or not spam).
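The core of logistic regression in a few lines (the weights and bias below are illustrative, not fitted to data): a linear score is passed through the sigmoid function, which squashes it into a probability in (0, 1) that can then be thresholded into a class label.

```python
from math import exp

def sigmoid(z):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1 / (1 + exp(-z))

def predict_proba(weights, bias, features):
    """P(y = 1 | x) under a logistic regression model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical two-feature spam model with hand-picked parameters.
p = predict_proba([1.2, -0.8], -0.5, [3.0, 1.0])  # score z = 2.3
label = "spam" if p >= 0.5 else "not spam"
```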
How can biases in training data affect the fairness of a machine learning model?
- Bias in training data can lead to underrepresented groups not being considered
- Bias can lead to faster training
- Bias has no impact on model fairness
- Bias can improve model fairness
Biases in training data can lead to underrepresentation of certain groups, causing the model to make unfair predictions, especially for those underrepresented groups.