You have a dataset where the variable 'age' has a few instances of '150', which is an obvious data entry error. What would be the most suitable method to handle these outliers?
- Removal
- Binning
- Transformation
In this case, removal is the best option because these values are clearly data entry errors and do not represent real ages. Binning or transforming them would keep the erroneous values in the analysis.
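As a rough illustration, removal can be sketched in plain Python as below. The records, the field names, and the 120-year plausibility bound are all assumptions made for this example, not part of the question; pick a bound that suits your own data.

```python
# Hypothetical records; names and values are made up for illustration.
records = [
    {"name": "A", "age": 34},
    {"name": "B", "age": 150},  # data entry error
    {"name": "C", "age": 27},
    {"name": "D", "age": 150},  # data entry error
    {"name": "E", "age": 61},
]

# Assumed plausibility bound for a human age.
MAX_PLAUSIBLE_AGE = 120


def remove_implausible_ages(rows, max_age=MAX_PLAUSIBLE_AGE):
    """Return only the rows whose 'age' lies between 0 and max_age inclusive."""
    return [row for row in rows if 0 <= row["age"] <= max_age]


cleaned = remove_implausible_ages(records)
print([row["age"] for row in cleaned])
```

The function returns a new list and leaves the original records untouched, so the dropped rows can still be inspected or reported afterwards.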
Related Quiz
- In a study on job satisfaction, employees with lower satisfaction scores are less likely to complete surveys. How would you categorize this missing data?
- Suppose you need to create a static visualization that will be printed in a scientific journal. Which Python library would you prefer to use?
- In the EDA process, what does 'wrangling' refer to?
- What is the process of removing an entire row when any single data point within it is missing called?
- How does the choice of model in a model-based method impact the imputation process?