Can LDA be used for both classification and dimensionality reduction?
- No
- Only for classification
- Only for dimensionality reduction
- Yes
Yes: LDA can be used both for classification, by finding the linear combinations of features that best separate the classes, and for dimensionality reduction, by projecting the data into a lower-dimensional space while preserving class separability.
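As a sketch of both uses with scikit-learn's `LinearDiscriminantAnalysis` (the Iris dataset here is an illustrative choice, not part of the question):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)

# Dimensionality reduction: project 4 features onto 2 discriminant axes
X_reduced = lda.fit_transform(X, y)
print(X_reduced.shape)  # (150, 2)

# Classification: the same fitted model predicts class labels
print(lda.predict(X[:3]))
```

Note that LDA can produce at most (number of classes − 1) components, which is why `n_components=2` is the maximum for the three-class Iris data.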
How is the number of clusters in K-Means typically determined?
- Based on the dataset size
- Random selection
- Through classification
- Using the Elbow Method
The number of clusters in K-Means is typically determined using the Elbow Method: the within-cluster variance (inertia) is plotted against the number of clusters, and the optimal point is where the rate of decrease sharply flattens, forming an "elbow."
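A minimal sketch of the Elbow Method with scikit-learn, using synthetic data with three true clusters (an illustrative assumption):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical data with 3 well-separated clusters
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Inertia (within-cluster sum of squares) for k = 1..6
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=42).fit(X).inertia_
    for k in range(1, 7)
]

# Inertia always decreases as k grows; the "elbow" is where the drop flattens
print(inertias)
```

Plotting `inertias` against k would show a sharp drop up to k = 3 and only marginal improvement afterwards.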
When using PCA, the data must be ___________ before applying the algorithm to ensure that each feature contributes equally.
- clustered
- normalized
- transformed
- visualized
Before applying PCA, the data must be normalized to ensure that each feature contributes equally to the principal components. Normalizing the data means that each feature will have a mean of 0 and a standard deviation of 1, thus ensuring that no feature dominates the others.
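A short sketch of this preprocessing step using `StandardScaler` before `PCA` (the two-feature data with wildly different scales is an illustrative assumption):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Two features on very different scales
rng = np.random.default_rng(0)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])

# After scaling, each feature has mean 0 and standard deviation 1
X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2)
pca.fit(X_scaled)
print(pca.explained_variance_ratio_)  # roughly balanced after scaling
```

Without the scaling step, the second feature's huge variance would dominate and the first principal component would simply reproduce it.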
How do features in Machine Learning differ from targets, and why are both necessary?
- Features and targets are the same
- Features are input; Targets are predictions
- Features are predictions; Targets are input
- None of these definitions are correct
Features are the input variables used to make predictions, while targets are the values the model is trying to predict. Both are necessary for supervised learning, where features are used to predict the corresponding targets.
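A toy example of the feature/target split in supervised learning (the data and its interpretation are purely illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Features (X): inputs the model learns from, e.g. hours studied
X = np.array([[1], [2], [3], [4]])
# Target (y): the value the model tries to predict, e.g. a test score
y = np.array([2.0, 4.0, 6.0, 8.0])

model = LinearRegression().fit(X, y)
print(model.predict([[5]]))  # ~10.0 for a new feature value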
How would you handle a scenario where the feature values in a classification problem are on different scales?
- Apply feature scaling techniques like normalization or standardization
- Convert all features to binary values
- Ignore the scales
- Remove features with different scales
Applying feature scaling techniques like normalization or standardization ensures that all feature values are on the same scale. This is crucial for many classification algorithms, as it allows them to perform more effectively and converge faster.
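The two techniques named in the answer can be sketched side by side with scikit-learn (the small matrix is an illustrative assumption):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

# Two features on very different scales
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

# Standardization: each feature rescaled to mean 0, standard deviation 1
X_std = StandardScaler().fit_transform(X)

# Normalization (min-max scaling): each feature rescaled to [0, 1]
X_norm = MinMaxScaler().fit_transform(X)

print(X_norm)
```

After either transform, both columns occupy comparable ranges, so distance-based classifiers such as k-NN or SVMs no longer let the larger-scaled feature dominate.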
___________ is a dimensionality reduction technique that maximizes the separability between different classes in a dataset.
- Factor Analysis
- Linear Discriminant Analysis (LDA)
- Principal Component Analysis (PCA)
- T-Distributed Stochastic Neighbor Embedding (t-SNE)
Linear Discriminant Analysis (LDA) is used to reduce dimensions while maximizing the separability between different classes, making it particularly useful for classification problems.
You implemented L1 regularization to prevent overfitting, but the model's performance did not improve. What could be the reason, and what alternative approach would you try?
- Model is overfitting, try L2 regularization
- Model is overfitting, try increasing regularization
- Model is underfitting, try L2 regularization
- Model is underfitting, try reducing regularization
If the model's performance did not improve with L1 regularization, the model may be underfitting: L1 drives coefficients to exactly zero, so a too-strong penalty over-constrains the model. Alternatives are to reduce the regularization strength or switch to L2 regularization, which shrinks coefficients without eliminating them and may suit the problem better.
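A hedged sketch of this scenario with scikit-learn's `Lasso` (L1) and `Ridge` (L2); the synthetic regression data and the specific alpha values are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, noise=5, random_state=0)

# A too-strong L1 penalty zeroes out many coefficients and underfits
strong_l1 = Lasso(alpha=50.0).fit(X, y)
print(np.sum(strong_l1.coef_ == 0))  # many coefficients driven to zero

# Alternatives: a weaker L1 penalty, or L2 (Ridge), which shrinks
# coefficients smoothly instead of eliminating them
weak_l1 = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)
print(weak_l1.score(X, y), ridge.score(X, y))
```

Comparing the training R² scores makes the underfitting visible: the over-regularized model scores markedly lower than the weaker-penalty and Ridge alternatives.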
You're building a model that is suffering from high variance. Which ensemble method would be more appropriate to use, and why?
- Bagging
- Boosting
- Gradient Boosting
Bagging is an ensemble method that can reduce high variance by averaging predictions from multiple base learners trained on different subsets of the data. It helps to smooth out the individual variations and enhances the stability of the model.
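A minimal sketch of bagging a high-variance base learner with scikit-learn's `BaggingClassifier` (the synthetic dataset and estimator counts are illustrative assumptions):

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# A single deep decision tree tends to overfit (high variance)
tree = DecisionTreeClassifier(random_state=0)

# Bagging trains 50 trees on bootstrap samples and averages their votes
bagged = BaggingClassifier(
    DecisionTreeClassifier(random_state=0), n_estimators=50, random_state=0
)

print(cross_val_score(tree, X, y, cv=5).mean())
print(cross_val_score(bagged, X, y, cv=5).mean())
```

Cross-validated accuracy of the bagged ensemble is typically higher and more stable than that of the single tree, because averaging over bootstrap samples cancels out much of the individual trees' variance.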
What is the F1-Score, and why might you use it instead of Precision and Recall?
- Arithmetic mean of Precision and Recall
- Geometric mean of Precision and Recall
- Harmonic mean of Precision and Recall
The F1-Score is the harmonic mean of Precision and Recall. It is useful when you need a single metric that balances both, particularly when the class distribution is uneven and accuracy alone would be misleading.
In Simple Linear Regression, the method of _________ is often used to estimate the coefficients.
- Clustering
- Gradient Descent
- Least Squares
- Neural Networks
The method of least squares is commonly used in Simple Linear Regression to estimate the coefficients by minimizing the sum of squared errors.
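The closed-form least-squares estimates can be computed directly with NumPy; the perfectly linear toy data (y = 2x + 1) is an illustrative assumption:

```python
import numpy as np

# Toy data following y = 2x + 1 exactly
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

# Closed-form least-squares estimates: slope = cov(x, y) / var(x),
# intercept = mean(y) - slope * mean(x)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)  # 2.0 1.0
```

These formulas minimize the sum of squared errors analytically, which is why simple linear regression needs no iterative optimizer such as gradient descent.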