Scenario: You are tasked with cleansing a dataset containing customer information. How would you handle missing values in the "Age" column?
- Flag missing values for further investigation
- Impute missing values based on other demographic information
- Remove rows with missing age values
- Replace missing values with the mean or median age
When handling missing values in the "Age" column, one effective approach is to impute them from other demographic information such as gender, location, or income. This uses patterns already present in the data, so the estimates are usually closer to the true values than a single global fill. Replacing every missing value with the overall mean or median can distort the distribution, because it piles values at one point and understates the variance. Removing rows discards the other fields in those records and can cause significant data loss, or bias if ages are not missing at random. Flagging missing values for further investigation is still useful alongside imputation, since it allows manual review or additional data collection if necessary.
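A minimal sketch of group-based imputation, assuming records are plain dicts with a hypothetical `age` field and a hypothetical `location` field used as the grouping attribute. It fills each missing age with the median age of customers in the same group, falls back to the overall median when a group has no known ages, and sets a hypothetical `age_imputed` flag so imputed rows remain traceable. Group medians are only one simple form of conditional imputation; regression- or nearest-neighbour-based imputers are common alternatives.

```python
from collections import defaultdict
from statistics import median


def impute_age_by_group(rows, group_key="location"):
    """Fill missing 'age' values with the median age of rows sharing group_key.

    Assumption: 'age', 'location', and 'age_imputed' are illustrative field
    names, not part of any particular dataset or library.
    """
    # Collect known ages per group and overall.
    known_by_group = defaultdict(list)
    all_known = []
    for r in rows:
        if r.get("age") is not None:
            known_by_group[r.get(group_key)].append(r["age"])
            all_known.append(r["age"])

    # Fallback used when a row's group has no known ages at all.
    overall = median(all_known) if all_known else None

    result = []
    for r in rows:
        new = dict(r)  # copy so the input rows are not mutated
        if new.get("age") is None:
            group_ages = known_by_group.get(new.get(group_key))
            new["age"] = median(group_ages) if group_ages else overall
            new["age_imputed"] = True  # keep the flag for later review
        else:
            new["age_imputed"] = False
        result.append(new)
    return result


customers = [
    {"id": 1, "location": "north", "age": 30},
    {"id": 2, "location": "north", "age": 40},
    {"id": 3, "location": "north", "age": None},
    {"id": 4, "location": "south", "age": 60},
    {"id": 5, "location": "south", "age": None},
    {"id": 6, "location": "west", "age": None},
]

for row in impute_age_by_group(customers):
    print(row)
```

Keeping the `age_imputed` flag combines imputation with the "flag for further investigation" option: analysts can still exclude or re-examine estimated ages later.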