For large-scale data processing in Hadoop, which file format is preferred for its efficiency and performance?
- AVRO
- ORC
- Parquet
- SequenceFile
Parquet is the preferred file format for large-scale data processing in Hadoop because of its columnar storage layout, efficient compression and encoding schemes, and support for schema evolution. Since analytical queries typically touch only a few columns, the columnar layout lets engines read just the data they need, which makes Parquet well suited to data-warehouse workloads.
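To make the columnar-storage argument concrete, here is a minimal sketch using only the Python standard library (not Parquet itself): it stores the same hypothetical records row-wise and column-wise, then compresses both. Grouping values of the same column together tends to place similar, repetitive values next to each other, which is one reason columnar formats like Parquet compress well.

```python
import json
import zlib

# Hypothetical sample data: 1,000 rows with a low-cardinality
# "country" column, mimicking typical warehouse records.
rows = [{"id": i, "country": "US" if i % 2 else "DE", "amount": i * 10}
        for i in range(1000)]

# Row-oriented layout (like a CSV/JSON record stream): whole records
# are serialized one after another.
row_bytes = json.dumps(rows).encode()

# Column-oriented layout (the idea behind Parquet): each column's
# values are stored contiguously.
columns = {key: [row[key] for row in rows] for key in rows[0]}
col_bytes = json.dumps(columns).encode()

# Compress both layouts; on repetitive data like this, the columnar
# layout generally compresses more effectively.
print("row-oriented compressed:   ", len(zlib.compress(row_bytes)))
print("column-oriented compressed:", len(zlib.compress(col_bytes)))
```

Real Parquet goes much further (dictionary and run-length encoding per column chunk, page-level statistics for predicate pushdown), but the size difference this sketch prints illustrates the same underlying effect.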