While preparing data for a machine learning model, you realize that the 'Height' column has some missing values. Upon closer inspection, you find that these missing values often correspond to records where the 'Age' column has values less than 1 year. What might be a reasonable way to handle these missing values?

  • Impute missing values with the mean height
  • Impute missing values with 0
  • Leave missing values as they are
  • Impute missing values based on 'Age'
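
In this case, it may be reasonable to leave the missing values as they are. Imputing with the overall mean height would likely assign infants a value dominated by older subjects, and imputing with 0 inserts a physically impossible height, so both can introduce bias. Imputing based on 'Age' should be done carefully, since infants have very different height characteristics than adults and the missing values are concentrated in that age group. Depending on the context and dataset size, leaving the missing values untouched might be the best choice.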
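
To make the reasoning concrete, here is a minimal pandas sketch of how one might check whether missing heights are tied to age before deciding how to handle them. The toy values, the `Height_missing` column name, and the units (years, centimetres) are assumptions made up for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: Age in years, Height in cm (illustrative values only).
df = pd.DataFrame({
    "Age": [0.3, 0.8, 0.5, 25.0, 40.0, 31.0],
    "Height": [np.nan, np.nan, 65.0, 170.0, np.nan, 182.0],
})

# Keep the NaNs in place, but record where they occur in a separate flag column.
df["Height_missing"] = df["Height"].isna()

# Compare how often Height is missing for infants (Age < 1) versus everyone else.
is_infant = df["Age"] < 1
missing_rate_infant = df.loc[is_infant, "Height_missing"].mean()
missing_rate_other = df.loc[~is_infant, "Height_missing"].mean()

# The overall mean skips NaNs and is pulled toward the adult heights,
# which is why it would be a poor fill value for infants.
global_mean = df["Height"].mean()

print(f"missing rate, Age < 1:  {missing_rate_infant:.2f}")
print(f"missing rate, Age >= 1: {missing_rate_other:.2f}")
print(f"overall mean height:    {global_mean:.1f}")
```

If the missing rate is clearly higher for infants, as in this made-up sample, that supports treating the gaps as informative rather than filling them with a single global value.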