While working with a dataset about car sales, you discover that the "Brand" column has many brands with very low frequency. To avoid having too many sparse categories, which technique can you apply to the "Brand" column?

  • One-Hot Encoding
  • Label Encoding
  • Brand grouping based on frequency
  • Principal Component Analysis (PCA)
The best choice is brand grouping based on frequency: merge the brands that appear only a few times into a single category (for example, "Other"). This reduces the number of sparse categories and can improve model performance. One-hot encoding applied directly would create a nearly empty column for every rare brand, and label encoding keeps every rare brand as its own value while assigning arbitrary integers that can imply an order. PCA is a dimensionality-reduction technique for numeric features, not a way to handle a raw categorical column.
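As a rough illustration of frequency-based grouping, here is a minimal sketch using pandas. The helper name `group_rare_categories`, the `min_count` threshold, the "Other" label, and the sample data are all assumptions made for this example, not part of the original question.

```python
import pandas as pd


def group_rare_categories(series, min_count=2, other_label="Other"):
    """Replace categories seen fewer than min_count times with other_label."""
    counts = series.value_counts()
    rare = counts[counts < min_count].index
    # Keep values that are not rare; replace the rest with other_label.
    return series.where(~series.isin(rare), other_label)


# Hypothetical sample data: three brands appear only once.
sales = pd.DataFrame(
    {"Brand": ["Toyota", "Toyota", "Toyota", "Ford", "Ford", "Lada", "Saab", "Tatra"]}
)
sales["Brand_grouped"] = group_rare_categories(sales["Brand"], min_count=2)
print(sales["Brand_grouped"].value_counts())
```

After grouping, the column can be one-hot encoded with far fewer columns. The right threshold depends on the dataset size and is usually chosen by inspecting the frequency counts.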