A machine learning model trained to predict whether an email is spam has a very high accuracy of 99%. However, the model classifies almost all emails, including spam, as non-spam. What could be a potential issue with relying solely on accuracy in this case?
- Data Imbalance
- Lack of Feature Engineering
- Overfitting
- Underfitting
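The issue here is data imbalance. When non-spam emails vastly outnumber spam, a model can reach high accuracy simply by predicting the majority class (non-spam) for nearly every email. Accuracy alone is misleading on imbalanced datasets because it is dominated by the majority class and hides the misclassification of the minority class (spam), which is the class the model is meant to detect. Metrics such as precision, recall, and F1 score on the spam class give a more honest picture.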
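To make the point concrete, here is a minimal sketch in plain Python. The numbers are hypothetical and not taken from the question: it assumes 1,000 emails of which 10 are spam, and a degenerate model that labels every email as non-spam. The helper names `confusion_counts` and `safe_div` are illustrative only.

```python
# Hypothetical data: 1000 emails, 10 of which are spam (label 1).
y_true = [1] * 10 + [0] * 990
# A degenerate "model" that labels every email as non-spam (label 0).
y_pred = [0] * 1000


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating spam (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (a common convention)."""
    return numerator / denominator if denominator else 0.0


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = safe_div(tp + tn, len(y_true))   # share of all emails labeled correctly
recall = safe_div(tp, tp + fn)              # share of actual spam that was caught
precision = safe_div(tp, tp + fp)           # share of predicted spam that is spam
f1 = safe_div(2 * precision * recall, precision + recall)

print(f"accuracy={accuracy:.2f} recall={recall:.2f} "
      f"precision={precision:.2f} f1={f1:.2f}")
```

Under these assumptions the model scores 99% accuracy while catching none of the spam, so recall and F1 on the spam class are both zero. That gap is what accuracy alone fails to reveal.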