A machine learning model trained to predict whether an email is spam has a very high accuracy of 99%. However, the model classifies almost all emails (including spam) as non-spam. What could be a potential issue with relying solely on accuracy in this case?

Data Imbalance
Lack of Feature Engineering
Overfitting
Underfitting

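The issue here is data imbalance: non-spam emails vastly outnumber spam, so a model that labels nearly everything as non-spam still achieves high accuracy. Relying solely on accuracy on an imbalanced dataset is misleading because it hides the misclassification of the minority class (spam), which is the class the model is meant to catch. Per-class metrics such as precision, recall, and F1 score for the spam class give a more reliable picture.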
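To make the effect concrete, here is a minimal sketch in plain Python. The dataset is hypothetical (1,000 emails, of which 10 are spam), and the "model" is assumed to predict non-spam for every email; neither comes from the question itself. The helper names `confusion_counts`, `accuracy`, and `recall` are illustrative only.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the given positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


def recall(y_true, y_pred):
    """Fraction of actual spam that the model catches (0.0 if there is no spam)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical imbalanced dataset: 1 = spam, 0 = non-spam.
y_true = [1] * 10 + [0] * 990

# Assumed degenerate model: predicts non-spam for every email.
y_pred = [0] * 1000

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)

print(f"accuracy:    {acc:.2f}")
print(f"spam recall: {rec:.2f}")
```

Under these assumptions the model is correct on all 990 non-spam emails and wrong on all 10 spam emails, so accuracy looks excellent while recall for the spam class shows that no spam is caught at all.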
