Concept Flow - Input splits and data locality
1. Start: Large Input File
2. Split Input into Chunks
3. Assign Splits to Nodes
4. Check Data Locality
5. Process Locally
6. Map Task
7. Reduce Task
8. Output Result
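The large input file is divided into chunks called input splits. Each split is assigned to a node, and the scheduler prefers a node that already stores that split's data (data locality). When the data is local, the map task reads it directly from that node and avoids a network transfer, so processing is faster; otherwise, the data must first be fetched over the network from a node that holds it.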
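The splitting and assignment steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular framework implements scheduling: the names `InputSplit`, `compute_splits`, and `assign_splits` are hypothetical, splits are modeled as simple byte ranges, and the fallback for non-local work (pick the node with the most free slots) is an assumption made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSplit:
    start: int   # byte offset where the split begins
    length: int  # number of bytes covered by the split


def compute_splits(file_size, split_size):
    """Cut a file of file_size bytes into consecutive splits of at most split_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [
        InputSplit(start, min(split_size, file_size - start))
        for start in range(0, file_size, split_size)
    ]


def assign_splits(splits, replica_hosts, free_slots):
    """Assign each split to a node, preferring a node that already stores its data.

    replica_hosts maps a split's start offset to the nodes holding that data
    (a hypothetical layout for this sketch); free_slots maps each node to the
    number of tasks it can still accept.
    Returns a list of (split, node, is_local) tuples.
    """
    slots = dict(free_slots)  # work on a copy so the caller's dict is unchanged
    plan = []
    for split in splits:
        # Nodes that hold this split's data and still have capacity.
        local = [n for n in replica_hosts.get(split.start, []) if slots.get(n, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # No local node is free: fall back to the least-loaded node,
            # which will have to fetch the data over the network.
            candidates = [n for n, free in slots.items() if free > 0]
            if not candidates:
                raise RuntimeError("no free task slots left")
            node, is_local = max(candidates, key=lambda n: slots[n]), False
        slots[node] -= 1
        plan.append((split, node, is_local))
    return plan
```

For example, calling `compute_splits(300, 128)` yields three splits, the last one shorter than the others. Passing those splits to `assign_splits` together with a replica map and per-node slot counts returns a plan in which the `is_local` flag shows which map tasks read their data locally and which need a network fetch.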