A machine learning model trained to predict whether an email is spam has a very high accuracy of 99%. However, the model classifies almost all emails, including spam ones, as non-spam. What could be a potential issue with relying solely on accuracy in this case?

  • Data Imbalance
  • Lack of Feature Engineering
  • Overfitting
  • Underfitting
The issue here is data imbalance: the model is heavily biased toward the majority class (non-spam). On an imbalanced dataset, accuracy alone can be misleading because it does not reflect how badly the minority class (spam) is misclassified. For example, if only 1% of emails are spam, a model that labels every email as non-spam still reaches 99% accuracy while catching no spam at all.
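A minimal sketch of the effect, assuming a hypothetical dataset of 1,000 emails of which 10 (1%) are spam, and a degenerate "model" that predicts non-spam for everything. It compares accuracy with recall and precision on the spam class. Precision is undefined when the model makes no positive predictions; this sketch treats it as 0.0 by convention.

```python
# Hypothetical labels: 1 = spam, 0 = non-spam; 1% of emails are spam.
y_true = [1] * 10 + [0] * 990
# Degenerate classifier that always predicts the majority class (non-spam).
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) treating `positive` as the spam class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / len(y_true)
# Recall: fraction of actual spam the model caught.
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision: fraction of "spam" predictions that were spam (0.0 if none were made).
precision = tp / (tp + fp) if (tp + fp) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f} f1={f1:.2f}")
# prints "accuracy=0.99 recall=0.00 precision=0.00 f1=0.00"
```

The 99% accuracy comes entirely from the 990 correctly labelled non-spam emails, while recall on spam is zero, which is why class-aware metrics such as recall, precision, or F1 are usually reported alongside accuracy on imbalanced data.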