
Input splits and data locality in Hadoop - Cheat Sheet & Quick Revision

Recall & Review
beginner
What is an input split in Hadoop?
An input split is a logical chunk of the input data that Hadoop carves out of the input files so that they can be processed in parallel. Each split is processed by exactly one map task.
beginner
Why is data locality important in Hadoop?
Data locality means running a task on the node that stores its data, or as close to that node as possible (for example, in the same rack). It reduces network traffic and speeds up processing.
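The idea of "same machine or close to it" can be sketched as a small classifier. This is an illustrative model only, not Hadoop's scheduler code: the function `locality_level`, the `rack_of` mapping, and the three labels are hypothetical names chosen for this example.

```python
def locality_level(task_host, split_hosts, rack_of):
    """Classify how close a task is to the data of its split.

    task_host   -- node where the map task would run (hypothetical name)
    split_hosts -- nodes that hold a replica of the split's data
    rack_of     -- dict mapping node name to rack name (assumed topology)
    """
    # Best case: the task runs on a node that stores the data.
    if task_host in split_hosts:
        return "node-local"
    # Next best: a replica lives on another node in the same rack.
    task_rack = rack_of[task_host]
    if any(rack_of[host] == task_rack for host in split_hosts):
        return "rack-local"
    # Otherwise the data must cross racks over the network.
    return "off-rack"


if __name__ == "__main__":
    topology = {"n1": "r1", "n2": "r1", "n3": "r2"}
    print(locality_level("n1", ["n1", "n3"], topology))  # node-local
    print(locality_level("n2", ["n1"], topology))        # rack-local
    print(locality_level("n3", ["n1", "n2"], topology))  # off-rack
```

The further down this list a task lands, the more of its input has to travel over the network before the map function can start.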
intermediate
How does Hadoop decide the size of an input split?
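By default, the split size follows the HDFS block size. Configuration (a minimum and a maximum split size) and the file format can change it to balance load and efficiency. For example, a file compressed with a non-splittable codec such as gzip becomes a single split regardless of its size.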
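As a rough illustration of how the block size guides the split size, the sketch below mirrors the rule used by Hadoop's `FileInputFormat`: the split size is the block size clamped between a minimum and a maximum, and the last split may be up to 10% larger than the others. The Python function names and default values here are assumptions made for this example; the real logic lives in Java and handles more cases (non-splittable files, block locations, and so on).

```python
SPLIT_SLOP = 1.1  # tolerance that lets the final split run slightly oversize


def compute_split_size(block_size, min_size=1, max_size=2**63 - 1):
    """Clamp the block size between the configured min and max split size."""
    return max(min_size, min(max_size, block_size))


def split_lengths(file_length, split_size):
    """Return the byte length of each split for one splittable file."""
    lengths = []
    remaining = file_length
    # Cut full-size splits while clearly more than one split remains.
    while remaining / split_size > SPLIT_SLOP:
        lengths.append(split_size)
        remaining -= split_size
    # Whatever is left becomes the final (possibly smaller) split.
    if remaining > 0:
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    MB = 1024 * 1024
    size = compute_split_size(block_size=128 * MB)
    # A 300 MB file with 128 MB blocks yields splits of 128, 128 and 44 MB.
    print([n // MB for n in split_lengths(300 * MB, size)])  # [128, 128, 44]
    # A 130 MB file stays in one split because 130/128 is below the slop.
    print([n // MB for n in split_lengths(130 * MB, size)])  # [130]
```

Lowering the maximum split size below the block size produces more, smaller splits (and therefore more map tasks); raising the minimum above the block size does the opposite.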
intermediate
What happens if a map task runs on a node without the data it needs?
The task will read data over the network from another node, which slows down processing and increases network load.
beginner
Explain the relationship between input splits and map tasks.
Each input split is assigned to one map task. The map task processes the data in that split independently.
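The one-split-to-one-map-task relationship can be shown with a toy word count. This is a simplified model, not Hadoop code: real splits are byte ranges of files rather than lists of lines, and `make_splits`, `map_task`, and `run_map_phase` are hypothetical helpers written for this example.

```python
from collections import Counter


def make_splits(lines, lines_per_split):
    """Cut the input into fixed-size chunks (a stand-in for input splits)."""
    return [lines[i:i + lines_per_split]
            for i in range(0, len(lines), lines_per_split)]


def map_task(split):
    """Count words in one split only; it never looks at other splits."""
    counts = Counter()
    for line in split:
        counts.update(line.split())
    return counts


def run_map_phase(splits):
    """Launch exactly one map task per split and collect their outputs."""
    return [map_task(split) for split in splits]


if __name__ == "__main__":
    data = ["a b", "b c", "c d"]
    splits = make_splits(data, lines_per_split=2)
    outputs = run_map_phase(splits)
    print(len(splits), len(outputs))  # 2 2
```

Because each task depends only on its own split, the tasks could run on different machines at the same time, which is what makes the map phase parallel.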
What does an input split represent in Hadoop?
A. A Hadoop configuration file
B. A network packet
C. A user request
D. A chunk of data processed by one map task
Why is data locality beneficial in Hadoop?
A. It reduces network traffic and speeds up processing
B. It increases disk usage
C. It slows down the job
D. It requires more memory
What guides the default size of an input split?
A. User's RAM size
B. CPU speed
C. HDFS block size
D. Number of map tasks
If a map task runs on a node without the data, what happens?
A. The task fails immediately
B. Data is read over the network, slowing processing
C. The task runs faster
D. The data is copied to the node automatically
How many map tasks are created per input split?
A. One map task per input split
B. Multiple map tasks per split
C. No map tasks
D. Depends on the reduce tasks
Describe what input splits are and why they matter in Hadoop processing.
Think about how Hadoop breaks big data into smaller parts.
Explain data locality and how it affects the speed of Hadoop jobs.
Consider why running tasks near data is faster.