You are working with a dataset in which city names have been entered in various formats (e.g., "NYC", "New York City", "New York"). To standardize these entries, which data cleaning technique is most appropriate?
- Data Imputation
- Data Normalization
- One-Hot Encoding
- String Matching
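When the same city appears in several formats, string matching is the most suitable data cleaning technique. It compares each entry against a set of reference values and replaces every variant with one canonical form. The comparison can be exact, after trimming whitespace and ignoring case, or through a lookup table of known abbreviations. It can also be approximate, using a similarity measure that tolerates typos. Once every variant of a city name has the same format, grouping, counting, and joining on that column work as expected. The other options solve different problems. Data imputation fills in missing values. Data normalization typically rescales numeric values to a common range. One-hot encoding turns categories into binary indicator columns. None of them reconciles different spellings of the same value.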
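A minimal Python sketch of string matching follows. It uses a hand-maintained list of canonical names (`CANONICAL`) and an alias table (`ALIASES`). Both are hypothetical examples, and "New York" is assumed to be the preferred canonical form. The standard library's `difflib.get_close_matches` handles the fuzzy step. The exact alias lookup catches abbreviations such as "NYC", which a similarity score alone would miss. The fuzzy step catches typos. The 0.8 cutoff is an illustrative choice, not a recommended value.

```python
import difflib
from typing import Optional

# Hypothetical reference data: canonical names and known aliases/abbreviations.
CANONICAL = ["New York", "Los Angeles", "San Francisco"]
ALIASES = {
    "nyc": "New York",
    "new york city": "New York",
    "la": "Los Angeles",
    "sf": "San Francisco",
}
# Lowercased canonical name -> canonical spelling, for case-insensitive matching.
LOWERED = {name.lower(): name for name in CANONICAL}


def standardize_city(raw: str, cutoff: float = 0.8) -> Optional[str]:
    """Map a raw city string to its canonical form, or None if nothing matches."""
    key = " ".join(raw.strip().lower().split())  # trim, lowercase, collapse inner spaces
    # 1. Exact lookup in the alias table (abbreviations such as "NYC").
    if key in ALIASES:
        return ALIASES[key]
    # 2. Exact, case-insensitive match against the canonical names.
    if key in LOWERED:
        return LOWERED[key]
    # 3. Fuzzy match to tolerate typos; cutoff is the minimum similarity ratio.
    matches = difflib.get_close_matches(key, list(LOWERED), n=1, cutoff=cutoff)
    return LOWERED[matches[0]] if matches else None


if __name__ == "__main__":
    raw = ["NYC", "New York City", "new york", "New Yrok", "Chicago"]
    print([standardize_city(c) for c in raw])
    # ['New York', 'New York', 'New York', 'New York', None]
```

Returning `None` for unmatched values, rather than guessing, leaves entries like "Chicago" visible for manual review. You can then add them to the reference list.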