For a Hadoop-based ETL process, how would you select the appropriate file format and compression codec for optimized data transfer?

  • Avro with LZO
  • ORC with Gzip
  • SequenceFile with Bzip2
  • TextFile with Snappy
In a Hadoop-based ETL process, the ORC (Optimized Row Columnar) format with Gzip compression is the right choice for optimized data transfer. ORC's columnar layout, built-in indexes, and lightweight encoding give efficient storage and fast reads, while Gzip delivers a high compression ratio that reduces the volume of data moved across the network, at the cost of somewhat slower compression than codecs like Snappy.
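As a minimal sketch, assuming a Spark-based ETL job and hypothetical HDFS paths, the write step might look like this. Note that Spark's ORC writer exposes the gzip/DEFLATE-family codec under the name "zlib":

    import org.apache.spark.sql.SparkSession

    // Hypothetical ETL step: read raw CSV input, transform, and write ORC
    // compressed with zlib (ORC's gzip/DEFLATE-family codec).
    val spark = SparkSession.builder()
      .appName("orc-gzip-etl")
      .getOrCreate()

    val raw = spark.read.option("header", "true").csv("hdfs:///data/raw/events")

    // ... transformations would go here; passed through unchanged in this sketch ...
    val transformed = raw

    transformed.write
      .format("orc")
      .option("compression", "zlib")  // ORC names its DEFLATE-based codec "zlib"
      .mode("overwrite")
      .save("hdfs:///data/curated/events_orc")

In Hive, the equivalent choice is made at table-creation time with STORED AS ORC and the table property "orc.compress"="ZLIB".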