For a Hadoop-based ETL process, how would you select the appropriate file format and compression codec for optimized data transfer?
- Avro with LZO
- ORC with Gzip
- SequenceFile with Bzip2
- TextFile with Snappy
In a Hadoop-based ETL process, the ORC (Optimized Row Columnar) file format with Gzip compression is the best choice for optimized data transfer. ORC's columnar layout stores data compactly and enables efficient reads, while Gzip offers a good balance between compression ratio and speed.
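The ratio-versus-speed trade-off mentioned above can be measured directly. Below is a minimal, illustrative Python sketch (not Hadoop-specific) that compares Gzip against Bzip2 on a repetitive log-like payload using only standard-library codecs; the sample data and sizes are made up for demonstration.

```python
import bz2
import gzip
import time

# Hypothetical ETL-style payload: repetitive log lines compress well.
data = b"2024-01-01 INFO etl job=ingest rows=1000 status=ok\n" * 20000

for name, compress in [("gzip", gzip.compress), ("bz2", bz2.compress)]:
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"{name}: ratio={ratio:.1f}x, time={elapsed * 1000:.1f} ms")
```

Typically Bzip2 compresses slightly harder but runs noticeably slower, which is why Gzip is often the pragmatic middle ground; in practice ORC also applies its own lightweight columnar encodings before the codec is invoked.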
Related Quiz
- To manage and optimize large-scale data warehousing, Hive integrates with ____ for workflow scheduling.
- For log file processing in Hadoop, the ____ InputFormat is typically used.
- The ____ tool in Hadoop is used for simulating cluster conditions on a single machine for testing.
- In Hadoop, the ____ compression codec is often used for its splittable property, allowing efficient parallel processing.
- ____ is the process by which Hadoop ensures that a user or service is actually who they claim to be.