Apache Spark leverages a distributed storage system called ________ for fault-tolerant storage of RDDs.
- Apache HBase
- Cassandra
- HDFS
- S3
Apache Spark utilizes HDFS (Hadoop Distributed File System) for fault-tolerant storage of Resilient Distributed Datasets (RDDs). HDFS provides the necessary durability and fault tolerance required for distributed processing in Spark.
Loading...
Related Quiz
- NoSQL databases are often used in scenarios where the volume of data is ________, and the data structure is subject to frequent changes.
- What is the role of a consumer group in Kafka?
- What are some common challenges in implementing a data governance framework?
- Scenario: A colleague is facing memory-related issues with their Apache Spark job. What strategies would you suggest to optimize memory usage and improve job performance?
- Metadata management tools often use ________ to track changes in data lineage over time.