Recall & Review
beginner
What is Hadoop primarily used for?
Hadoop is mainly used for storing and processing large data sets using a distributed file system called HDFS and batch processing with MapReduce.
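To make the map/shuffle/reduce flow concrete, here is a plain-Python sketch of a MapReduce-style word count. This simulates the model only, it does not use the Hadoop API; the function names are illustrative, and a real job would run each phase distributed across the cluster.

```python
from collections import defaultdict

# Map phase: emit (word, 1) pairs from each input line.
def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

# Shuffle phase: group values by key, as the framework does between map and reduce.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: sum the counts for each word.
def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big cluster", "data pipeline"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'pipeline': 1}
```

In real Hadoop, the output of each phase is written to disk (HDFS or local spill files) before the next phase reads it, which is the key cost Spark avoids.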
beginner
How does Spark process data differently from Hadoop MapReduce?
Spark processes data in memory, which makes it faster for iterative tasks and real-time processing than Hadoop MapReduce, which reads and writes data to disk between steps.
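The disk-versus-memory difference can be sketched in plain Python. This is a toy simulation, not either framework's API: the "Hadoop-style" pipeline serializes each intermediate result to a temp file and reads it back, while the "Spark-style" pipeline keeps intermediates in memory. Both compute the same answer; only the round-trips differ.

```python
import json
import os
import tempfile

# Hadoop-style: each step writes its result to disk, and the next step reads it back.
def disk_pipeline(values, steps):
    for _ in range(steps):
        values = [v + 1 for v in values]
        fd, path = tempfile.mkstemp(suffix=".json")
        with os.fdopen(fd, "w") as f:
            json.dump(values, f)      # persist intermediate result to disk
        with open(path) as f:
            values = json.load(f)     # next step reads it back from disk
        os.remove(path)
    return values

# Spark-style: intermediate results stay in memory between steps.
def memory_pipeline(values, steps):
    for _ in range(steps):
        values = [v + 1 for v in values]  # intermediate kept in memory
    return values

data = list(range(5))
assert disk_pipeline(data, 3) == memory_pipeline(data, 3) == [3, 4, 5, 6, 7]
```

At cluster scale those disk round-trips also include serialization and replication, so the gap for multi-step jobs is much larger than this toy suggests.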
beginner
Which system supports real-time data processing better: Hadoop or Spark?
Spark supports real-time data processing better because it can handle streaming data and perform computations quickly using in-memory processing.
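Spark's streaming engines process a live stream as a sequence of small batches. The generator-based loop below is a minimal sketch of that micro-batch idea in plain Python (not Spark's API): it pulls fixed-size batches from a stream and aggregates each one as it arrives.

```python
from itertools import islice

# A stand-in for an unbounded event source (here just ten integers).
def event_stream():
    for i in range(10):
        yield i

# Mimic micro-batching: pull fixed-size batches from the stream
# and aggregate each batch as soon as it is complete.
def micro_batch_sums(stream, batch_size):
    while True:
        batch = list(islice(stream, batch_size))
        if not batch:
            break
        yield sum(batch)  # per-batch aggregation, e.g. a running metric

totals = list(micro_batch_sums(event_stream(), batch_size=4))
print(totals)  # [6, 22, 17]
```

Classic Hadoop MapReduce has no equivalent loop: a job only starts once its input is fully available, which is why it suits batch rather than real-time workloads.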
beginner
What is the main storage system used by Hadoop?
Hadoop uses HDFS (Hadoop Distributed File System) to store data across many machines in a cluster.
intermediate
Why might Spark be preferred over Hadoop for machine learning tasks?
Spark is preferred because it can quickly process data in-memory and has built-in libraries like MLlib for machine learning, making it faster and easier to use for iterative algorithms.
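Iterative algorithms like those in MLlib make many passes over the same dataset, which is exactly the access pattern in-memory caching rewards. The sketch below fits a slope by gradient descent in plain Python (illustrative only, not MLlib code): every iteration re-reads the same data, so keeping it in memory instead of on disk pays off on each of the 200 passes.

```python
# Fit y ~ w * x by gradient descent on a tiny in-memory dataset.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated with true slope 2.0

w = 0.0
lr = 0.01
for _ in range(200):
    # Each pass re-reads the full dataset -- the step Spark's caching keeps cheap.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad

print(round(w, 3))  # converges to 2.0
```

In Spark you would call `.cache()` (or `.persist()`) on the training data once, and every subsequent iteration reuses the in-memory copy; in MapReduce each pass would be a separate job re-reading the data from HDFS.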
Which of the following is a key feature of Spark compared to Hadoop?
Spark processes data in-memory, which makes it faster than Hadoop's disk-based MapReduce.
What does Hadoop primarily use to store data?
Hadoop stores data in HDFS, a distributed file system.
Which system is better suited for real-time data processing?
Spark supports real-time streaming and in-memory processing, making it better for real-time tasks.
Which processing model does Hadoop MapReduce follow?
Hadoop MapReduce processes data in batches, reading and writing to disk between steps.
Why is Spark faster than Hadoop MapReduce for iterative algorithms?
Spark keeps data in memory during processing, speeding up iterative tasks like machine learning.
Explain the main differences between Hadoop and Spark in terms of data processing and speed.
Think about how each system handles data during processing.
Describe scenarios where you would choose Hadoop over Spark and vice versa.
Consider the type of data processing and speed requirements.