beginner

What is an input split in Hadoop?

An input split is a logical chunk of the input data that Hadoop carves out of the input files so the data can be processed in parallel. Each split is processed by exactly one map task.

beginner

Why is data locality important in Hadoop?

Data locality means running a task on the node that stores its data, or on a nearby node such as one in the same rack. It reduces network traffic and speeds up processing.

intermediate

How does Hadoop decide the size of an input split?

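By default, the split size equals the HDFS block size. The configured minimum and maximum split sizes can raise or lower it, and the file format matters too: a file that cannot be split (for example, one compressed with gzip) becomes a single split regardless of its size.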
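As a rough sketch of the rule that file-based input formats apply, the split size is the block size clamped between a configured minimum and maximum. The function name and the example values below are illustrative assumptions; the minimum and maximum correspond to the `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize` properties.

```python
MB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """Clamp the block size between the minimum and maximum split sizes."""
    return max(min_size, min(max_size, block_size))


# With default limits, the split size is simply the block size.
default_split = compute_split_size(128 * MB)

# Lowering the maximum produces smaller splits, and therefore more map tasks.
smaller_split = compute_split_size(128 * MB, max_size=64 * MB)

# Raising the minimum produces larger splits, and therefore fewer map tasks.
larger_split = compute_split_size(128 * MB, min_size=256 * MB)
```

Note the trade-off: a split larger than a block spans several blocks, so part of its data may have to be read from another node.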

intermediate

What happens if a map task runs on a node without the data it needs?

The task reads its data over the network from a node that holds a copy, which slows down processing and increases network load.
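The cost of a remote read depends on how far the task is from its data. The sketch below classifies a placement as node-local, rack-local, or off-rack, which is the usual order of preference when scheduling map tasks. The function, node names, and rack layout are hypothetical and only illustrate the idea; this is not Hadoop's scheduler code.

```python
def locality_level(task_node: str, replica_nodes: set, rack_of: dict) -> str:
    """Classify how close a task is to the data it must read."""
    if task_node in replica_nodes:
        return "node-local"   # data is read from local disk
    replica_racks = {rack_of[node] for node in replica_nodes}
    if rack_of[task_node] in replica_racks:
        return "rack-local"   # data crosses only the rack switch
    return "off-rack"         # data crosses the wider cluster network


# Hypothetical four-node cluster spread over two racks.
rack_of = {"node1": "rackA", "node2": "rackA",
           "node3": "rackB", "node4": "rackB"}
replicas = {"node1"}  # nodes holding a copy of the split's block

same_node = locality_level("node1", replicas, rack_of)
same_rack = locality_level("node2", replicas, rack_of)
other_rack = locality_level("node3", replicas, rack_of)
```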

beginner

Explain the relationship between input splits and map tasks.

Each input split is assigned to one map task. The map task processes the data in that split independently.
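Because each split gets one map task, the number of map tasks in a job can be estimated from the file sizes and the split size. The sketch below is a simplification that assumes non-empty, splittable files and uses plain ceiling division; the real split calculation lets the final split of a file be slightly larger than the split size, so actual counts can differ a little. The function name and file sizes are illustrative.

```python
MB = 1024 * 1024


def estimate_map_tasks(file_sizes: list, split_size: int) -> int:
    """Estimate map tasks as the total number of splits across all files."""
    total = 0
    for size in file_sizes:
        # Splits never span files, so each file is divided on its own.
        total += (size + split_size - 1) // split_size
    return total


# A 300 MB file yields 3 splits and a 100 MB file yields 1 split.
tasks = estimate_map_tasks([300 * MB, 100 * MB], 128 * MB)
```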

What does an input split represent in Hadoop?

A. A Hadoop configuration file

B. A network packet

C. A user request

D. A chunk of data processed by one map task

Why is data locality beneficial in Hadoop?

A. It reduces network traffic and speeds up processing

B. It increases disk usage

C. It slows down the job

D. It requires more memory

What guides the default size of an input split?

A. User's RAM size

B. CPU speed

C. HDFS block size

D. Number of map tasks

If a map task runs on a node without the data, what happens?

A. The task fails immediately

B. Data is read over the network, slowing processing

C. The task runs faster

D. The data is copied to the node automatically

How many map tasks are created per input split?

A. One map task per input split

B. Multiple map tasks per split

C. No map tasks

D. Depends on the reduce tasks

Describe what input splits are and why they matter in Hadoop processing.

Explain data locality and how it affects the speed of Hadoop jobs.