In Crunch, a ____ is used to represent a distributed dataset in Hadoop.

  • PCollection
  • PGroupedTable
  • PObject
  • PTable
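
In Crunch, a PCollection is used to represent a distributed dataset in Hadoop. A PCollection<T> is an immutable, parallel collection of elements of type T, and Crunch's high-level Java API builds data processing pipelines by transforming one PCollection into another. The other options are more specialized: a PTable<K, V> is a PCollection of key-value pairs, a PGroupedTable is what you get after grouping a PTable by key, and a PObject holds a single value computed by a pipeline.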
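The sketch below shows a PCollection being transformed with `parallelDo`. It assumes the Apache Crunch core library is on the classpath and uses `MemPipeline`, Crunch's in-memory pipeline intended for local testing, so no Hadoop cluster is needed. The class name `PCollectionExample` and the helper `wordLengths` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.crunch.MapFn;
import org.apache.crunch.PCollection;
import org.apache.crunch.impl.mem.MemPipeline;
import org.apache.crunch.types.writable.Writables;

// Hypothetical example class; assumes Apache Crunch (crunch-core) is available.
public class PCollectionExample {

    // Hypothetical helper: maps each input word to its length via a PCollection.
    public static List<Integer> wordLengths(String... input) {
        // An in-memory PCollection<String>; on a cluster this would usually
        // come from reading a file through a MRPipeline instead.
        PCollection<String> words =
            MemPipeline.typedCollectionOf(Writables.strings(), input);

        // parallelDo applies the function to every element and returns a new
        // PCollection; the PType argument says how to serialize the outputs.
        PCollection<Integer> lengths = words.parallelDo(
            new MapFn<String, Integer>() {
                @Override
                public Integer map(String word) {
                    return word.length();
                }
            },
            Writables.ints());

        // materialize() returns the collection's contents to the client.
        List<Integer> result = new ArrayList<>();
        for (Integer len : lengths.materialize()) {
            result.add(len);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordLengths("hadoop", "crunch", "pcollection"));
    }
}
```

Note that the source PCollection is never modified: each transformation produces a new PCollection, which is what lets Crunch plan the whole pipeline before running it.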