Recall & Review
beginner
What is Hadoop primarily used for?
Hadoop is mainly used for storing and processing large data sets using a distributed file system called HDFS and batch processing with MapReduce.
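To make the map/shuffle/reduce flow concrete, here is a plain-Python sketch of a MapReduce-style word count. This simulates the model only, it does not use the Hadoop API; the function names are illustrative, and a real job would run each phase distributed across the cluster.

```python
from collections import defaultdict

# Map phase: emit (word, 1) pairs from each input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group values by key, as the framework does between map and reduce.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: sum the counts for each word.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big cluster", "data pipeline"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'pipeline': 1}
```

In real Hadoop, the output of each phase is written to disk (HDFS or local spill files) before the next phase reads it, which is the key cost Spark avoids.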
beginner
How does Spark process data differently from Hadoop MapReduce?
Spark processes data in memory, which makes it faster for iterative tasks and real-time processing than Hadoop MapReduce, which reads and writes data to disk between steps.
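The disk-versus-memory difference can be sketched in plain Python. This is a toy simulation, not either framework's API: the "Hadoop-style" pipeline serializes each intermediate result to a temp file and reads it back, while the "Spark-style" pipeline keeps intermediates in memory. Both compute the same answer; only the round-trips differ.

```python
import json
import os
import tempfile

# Hadoop-style: each step writes its result to disk, and the next step reads it back.
def disk_pipeline(values, steps):
    for _ in range(steps):
        values = [v + 1 for v in values]
        fd, path = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as f:
            json.dump(values, f)      # persist intermediate result to disk
        with open(path) as f:
            values = json.load(f)     # next step reads it back from disk
        os.remove(path)
    return values

# Spark-style: intermediate results stay in memory between steps.
def memory_pipeline(values, steps):
    for _ in range(steps):
        values = [v + 1 for v in values]  # intermediate kept in memory
    return values

data = list(range(5))
assert disk_pipeline(data, 3) == memory_pipeline(data, 3) == [3, 4, 5, 6, 7]
```

At cluster scale those disk round-trips also include serialization and replication, so the gap for multi-step jobs is much larger than this toy suggests.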
beginner
Which system supports real-time data processing better: Hadoop or Spark?
Spark supports real-time data processing better because it can handle streaming data and perform computations quickly using in-memory processing.
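Spark's streaming engines process a live stream as a sequence of small batches. The generator-based loop below is a minimal sketch of that micro-batch idea in plain Python (not Spark's API): it pulls fixed-size batches from a stream and aggregates each one as it arrives.

```python
from itertools import islice

# A stand-in for an unbounded event source (here just ten integers).
def event_stream():
    for i in range(10):
        yield i

# Mimic micro-batching: pull fixed-size batches from the stream
# and aggregate each batch as soon as it is complete.
def micro_batch_sums(stream, batch_size):
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            break
        yield sum(batch)  # per-batch aggregation, e.g. a running metric

totals = list(micro_batch_sums(event_stream(), batch_size=4))
print(totals)  # [6, 22, 17]
```

Classic Hadoop MapReduce has no equivalent loop: a job only starts once its input is fully available, which is why it suits batch rather than real-time workloads.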
beginner
What is the main storage system used by Hadoop?
Hadoop uses HDFS (Hadoop Distributed File System) to store data across many machines in a cluster.
intermediate
Why might Spark be preferred over Hadoop for machine learning tasks?
Spark is preferred because it can quickly process data in-memory and has built-in libraries like MLlib for machine learning, making it faster and easier to use for iterative algorithms.
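Iterative algorithms like those in MLlib make many passes over the same dataset, which is exactly the access pattern in-memory caching rewards. The sketch below fits a slope by gradient descent in plain Python (illustrative only, not MLlib code): every iteration re-reads the same data, so keeping it in memory instead of on disk pays off on each of the 200 passes.

```python
# Fit y ~ w * x by gradient descent on a tiny in-memory dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated with true slope 2.0

w = 0.0
lr = 0.01
for _ in range(200):
    # Each pass re-reads the full dataset -- the step Spark's caching keeps cheap.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # converges to 2.0
```

In Spark you would call `.cache()` (or `.persist()`) on the training data once, and every subsequent iteration reuses the in-memory copy; in MapReduce each pass would be a separate job re-reading the data from HDFS.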
Which of the following is a key feature of Spark compared to Hadoop?
Spark processes data in-memory, which makes it faster than Hadoop's disk-based MapReduce.
What does Hadoop primarily use to store data?
Hadoop stores data in HDFS, a distributed file system.
Which system is better suited for real-time data processing?
Spark supports real-time streaming and in-memory processing, making it better for real-time tasks.
Which processing model does Hadoop MapReduce follow?
Hadoop MapReduce processes data in batches, reading and writing to disk between steps.
Why is Spark faster than Hadoop MapReduce for iterative algorithms?
Spark keeps data in memory during processing, speeding up iterative tasks like machine learning.
Explain the main differences between Hadoop and Spark in terms of data processing and speed.
Think about how each system handles data during processing.
Describe scenarios where you would choose Hadoop over Spark and vice versa.
Consider the type of data processing and speed requirements.