Which component of the Hadoop ecosystem is primarily used for distributed data storage?
- HDFS (Hadoop Distributed File System)
- Apache Spark
- MapReduce
- Hive
HDFS (Hadoop Distributed File System) is the primary component in the Hadoop ecosystem for distributed data storage. It is designed to store very large files as fixed-size blocks spread across multiple machines, replicating each block on several nodes to provide durability and fault tolerance.
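To make the idea concrete, here is a toy Python sketch of the block-and-replica scheme HDFS uses: a file is split into fixed-size blocks, and each block is placed on several nodes, so losing one node does not lose data. The constants, node names, and round-robin placement below are illustrative assumptions, not the real HDFS API (real HDFS uses 128 MB blocks, a default replication factor of 3, and rack-aware placement).

```python
# Conceptual sketch of HDFS-style block replication -- NOT the real HDFS API.
BLOCK_SIZE = 4          # toy value; real HDFS defaults to 128 MB
REPLICATION = 3         # HDFS's default replication factor
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster

def store(data: bytes) -> dict:
    """Split data into blocks and place each block on REPLICATION nodes."""
    placement = {}
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    for idx, block in enumerate(blocks):
        # Simple round-robin placement; real HDFS is rack-aware.
        replicas = [NODES[(idx + r) % len(NODES)] for r in range(REPLICATION)]
        placement[idx] = (block, replicas)
    return placement

def read(placement: dict, failed: set) -> bytes:
    """Reassemble the file, skipping replicas that live on failed nodes."""
    out = b""
    for idx in sorted(placement):
        block, replicas = placement[idx]
        if not any(node not in failed for node in replicas):
            raise IOError(f"block {idx} lost")  # all replicas unreachable
        out += block
    return out

placement = store(b"hello distributed world")
# Even with one node down, every block still has a live replica.
print(read(placement, failed={"node2"}))
```

Because every block lives on three of the four nodes, the read succeeds even after a node failure; this replication is what the explanation above means by durability and fault tolerance.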