For large-scale data processing in Hadoop, which file format is preferred for its efficiency and performance?

  • AVRO
  • ORC
  • Parquet
  • SequenceFile
Parquet is the preferred file format for large-scale data processing in Hadoop. Its columnar storage layout means analytical queries can read only the columns they need instead of scanning entire rows, and storing values of the same type together enables highly effective compression and encoding. Parquet also supports schema evolution, so columns can be added over time without rewriting existing files. These properties make it well suited to data-warehouse-style analytical workloads.
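
As a concrete illustration, here is a minimal sketch using PySpark, one common way to read and write Parquet on Hadoop. The session setup, file path, and column names are illustrative assumptions, not part of the question:

    # Minimal PySpark sketch; the path and column names are illustrative.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("parquet-demo").getOrCreate()

    # Small sample data standing in for a large dataset.
    df = spark.createDataFrame(
        [(1, "alice", 42.0), (2, "bob", 17.5)],
        ["id", "name", "amount"],
    )

    # Write as Parquet; Spark compresses with Snappy by default.
    df.write.mode("overwrite").parquet("/tmp/events.parquet")

    # Columnar payoff: selecting a subset of columns avoids reading the rest.
    spark.read.parquet("/tmp/events.parquet").select("id", "amount").show()

The final query demonstrates column pruning: because Parquet stores each column separately, the reader fetches only the id and amount columns from disk.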