Recall & Review
beginner
What is the main difference between Apache Spark and Hadoop MapReduce in terms of data processing?
Apache Spark processes data in-memory, making it faster, while Hadoop MapReduce reads and writes data to disk between each step, which is slower.
intermediate
How does Spark handle iterative algorithms compared to Hadoop MapReduce?
Spark keeps data in memory across iterations, speeding up iterative algorithms, whereas Hadoop MapReduce reloads data from disk each time, slowing down the process.
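The caching behavior described above can be sketched in PySpark. This is a minimal illustration, assuming a local PySpark installation; the helper names (`update`, `run_iterations`) and the toy update rule are invented for the example, not part of any framework API.

```python
def update(estimate, x, lr=0.1):
    """One toy iteration step: nudge a running estimate toward a value."""
    return estimate + lr * (x - estimate)

def run_iterations(n_iter=10):
    # Requires `pip install pyspark`; imported here so the file loads without it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[*]").appName("cache-demo").getOrCreate()
    data = spark.sparkContext.parallelize(range(1000))
    # cache() keeps the RDD's partitions in memory, so every later pass reuses
    # them directly. A chain of MapReduce jobs would instead write results to
    # disk after each job and read them back at the start of the next.
    data.cache()

    estimate = 0.0
    for _ in range(n_iter):
        mean = data.mean()          # served from the in-memory cache after pass 1
        estimate = update(estimate, mean)

    spark.stop()
    return estimate

# With Spark installed, calling run_iterations() runs all passes against the
# cached dataset; the pure-Python step works standalone:
# update(0.0, 10.0) -> 1.0
```

The `cache()` call is the whole point of the sketch: without it, Spark would recompute the RDD lineage on each pass, which is closer to the MapReduce behavior the card contrasts against.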
beginner
Which framework supports real-time stream processing: Spark or Hadoop MapReduce?
Apache Spark supports real-time stream processing through Spark Streaming, while Hadoop MapReduce is designed mainly for batch processing.
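The stream-processing difference can be sketched with Spark's Structured Streaming word count. This is an illustrative sketch assuming a local PySpark installation and a text source on `localhost:9999` (e.g. `nc -lk 9999`); `split_words` and `run_stream` are example names, not Spark APIs.

```python
def split_words(line):
    """Pure helper mirroring the split the streaming query performs."""
    return line.strip().split()

def run_stream():
    # Requires `pip install pyspark`; imported here so the file loads without it.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import explode, split

    spark = SparkSession.builder.master("local[*]").appName("stream-demo").getOrCreate()

    # Treat an unbounded socket stream of lines as a continuously growing table.
    lines = (spark.readStream.format("socket")
             .option("host", "localhost")
             .option("port", 9999)
             .load())

    # Running word count, updated as new lines arrive. MapReduce has no
    # equivalent: each MapReduce job processes a fixed batch and terminates.
    words = lines.select(explode(split(lines.value, " ")).alias("word"))
    counts = words.groupBy("word").count()

    query = counts.writeStream.outputMode("complete").format("console").start()
    query.awaitTermination()
```

With Spark installed, `run_stream()` keeps running and reprints updated counts as lines arrive on the socket, which is what "real-time stream processing" means in the card above.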
intermediate
What programming languages can you use with Apache Spark that are not natively supported by Hadoop MapReduce?
Apache Spark supports Scala, Python, Java, and R, while Hadoop MapReduce primarily supports Java.
advanced
Why might a company choose Hadoop MapReduce over Spark despite Spark's speed advantages?
Hadoop MapReduce can be more cost-effective for very large batch jobs on disk-based storage, has a mature ecosystem, and requires far less memory than Spark, which matters on clusters with limited RAM.
Which of the following best describes Apache Spark's data processing?
Apache Spark processes data in memory, which speeds up computation compared to disk-based methods.
Hadoop MapReduce is best suited for which type of processing?
Hadoop MapReduce is designed primarily for batch processing of large datasets.
Which language is NOT natively supported by Hadoop MapReduce but is supported by Spark?
Spark natively supports Scala, Python, and R, while Hadoop MapReduce mainly supports Java.
What feature allows Spark to speed up iterative algorithms?
Spark stores data in memory across iterations, reducing disk reads and speeding up iterative tasks.
Which framework would be better for a company with limited memory resources?
Hadoop MapReduce uses disk storage and requires less memory, making it better for limited memory environments.
Explain the main performance differences between Apache Spark and Hadoop MapReduce.
Think about how each framework handles data during processing.
Describe scenarios where Hadoop MapReduce might be preferred over Apache Spark.
Consider resource availability and job types.