Pig Latin basics in Hadoop - Step-by-Step Execution

Concept Flow - Pig Latin basics

Load Data → Apply Transformations → Filter or Group Data → Generate Output → Store or Dump Results

Pig Latin scripts start by loading data, then apply transformations like filtering or grouping, and finally output the results.

Execution Sample

Pig Latin

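A = LOAD 'data.txt' USING PigStorage(',');   -- read comma-separated rows into relation A
B = FILTER A BY $0 > 10;                     -- keep rows whose first field is greater than 10
C = GROUP B BY $1;                           -- one group per distinct value of the second field
D = FOREACH C GENERATE group, COUNT(B);      -- emit (group key, number of rows in the group)
DUMP D;                                      -- print relation D to the console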

This script loads comma-separated data, keeps only the rows whose first column is greater than 10, groups those rows by the second column, counts the rows in each group, and prints the result.
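As a rough illustration of what each statement does to the data, the script can be mimicked in plain Python. This is only a sketch of the semantics, not how Pig executes on Hadoop; the helper names (`load`, `filter_rows`, `group_by`, `count_per_group`) and the inline `SAMPLE` text standing in for data.txt are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Assumed contents of data.txt: the two sample rows used in this walkthrough.
SAMPLE = "12,apple\n6,banana\n"

def load(text, delimiter=","):
    """Mimics LOAD ... USING PigStorage(','): one tuple of raw string fields per line."""
    return [tuple(row) for row in csv.reader(io.StringIO(text), delimiter=delimiter)]

def filter_rows(relation, predicate):
    """Mimics FILTER ... BY: keep only the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]

def group_by(relation, key):
    """Mimics GROUP ... BY: one (group, bag) pair per distinct key value."""
    bags = defaultdict(list)
    for row in relation:
        bags[key(row)].append(row)
    return list(bags.items())

def count_per_group(grouped):
    """Mimics FOREACH ... GENERATE group, COUNT(...).
    Simplification: counts every tuple in the bag."""
    return [(group, len(bag)) for group, bag in grouped]

A = load(SAMPLE)
B = filter_rows(A, lambda row: int(row[0]) > 10)  # $0 > 10
C = group_by(B, lambda row: row[1])               # GROUP B BY $1
D = count_per_group(C)
print(D)  # [('apple', 1)]
```

Each helper returns a new list and leaves its input untouched, which mirrors how every Pig Latin statement defines a new relation rather than modifying an existing one.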

Execution Table

Step	Action	Input Data Snapshot	Result	Notes
1	LOAD data.txt	Raw file with rows like (12,apple), (6,banana)	Relation A with all rows	Data loaded into relation A
2	FILTER A BY $0 > 10	Relation A with rows (12,apple), (6,banana)	Relation B with rows (12,apple)	Only rows with first column > 10 kept
3	GROUP B BY $1	Relation B with (12,apple)	Relation C with (apple,{(12,apple)})	Each distinct value of the second column becomes one group holding its rows
4	FOREACH C GENERATE group, COUNT(B)	Relation C with (apple,{(12,apple)})	Relation D with (apple,1)	Count of rows per group calculated
5	DUMP D	Relation D	Output: (apple,1)	Results displayed on screen

💡 All steps completed, final output dumped

Variable Tracker

Relation	Start	After Step 1	After Step 2	After Step 3	After Step 4	Final
A	empty	All rows from data.txt	All rows from data.txt	All rows from data.txt	All rows from data.txt	All rows from data.txt
B	empty	empty	Rows with $0 > 10	Rows with $0 > 10	Rows with $0 > 10	Rows with $0 > 10
C	empty	empty	empty	Grouped by $1	Grouped by $1	Grouped by $1
D	empty	empty	empty	empty	Group counts	Group counts
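The tracker above can be reproduced with a small Python sketch that records which relations exist after each step. It is a conceptual model only; the `steps` list, the `env` dictionary, and the `group_rows` helper are names invented for this example, and the rows are the two sample rows from the Execution Table.

```python
def group_rows(rows):
    """Collect rows into {key: [rows...]} using the second field as the key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[1], []).append(row)
    return groups

# Each step defines exactly one new relation from the relations defined before it.
steps = [
    ("A", lambda env: [(12, "apple"), (6, "banana")]),               # LOAD
    ("B", lambda env: [row for row in env["A"] if row[0] > 10]),     # FILTER
    ("C", lambda env: group_rows(env["B"])),                         # GROUP
    ("D", lambda env: [(k, len(v)) for k, v in env["C"].items()]),   # FOREACH
]

env = {}      # relations defined so far
history = []  # one snapshot of env after each step
for name, build in steps:
    env[name] = build(env)
    history.append(dict(env))

for step_number, snapshot in enumerate(history, start=1):
    print(f"After step {step_number}: {', '.join(snapshot)}")
```

In every snapshot, relation A still holds both original rows: filtering produces B without removing anything from A, which is why the A row of the tracker never changes.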

Key Moments - 3 Insights

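Why does FILTER keep only some rows and not all?
FILTER evaluates its condition on every row and keeps only the rows where it is true. In step 2, (12,apple) passes $0 > 10 and (6,banana) does not.

What does GROUP BY do to the data?
It collects the rows that share the same key value into one group. In step 3, the rows of B are grouped by their second column.

Why do we use FOREACH after GROUP BY?
Grouping only collects rows; FOREACH then processes each group to produce one output row, here the group key and the count of rows in that group.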
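To make the grouping and FOREACH questions concrete, here is a small Python sketch of the shape of a grouped relation. The three input rows are hypothetical (they are not in the tutorial's sample data) and were chosen so that one key repeats; the code only models the idea and is not Pig's implementation.

```python
from collections import defaultdict

# Hypothetical filtered rows (relation B) in which the key 'apple' appears twice.
B = [(12, "apple"), (15, "apple"), (20, "cherry")]

# GROUP B BY $1: every distinct key gets one bag containing its full rows.
bags = defaultdict(list)
for row in B:
    bags[row[1]].append(row)
C = sorted(bags.items())

# FOREACH C GENERATE group, COUNT(B): reduce each bag to a single number.
D = [(group, len(bag)) for group, bag in C]

print(C)  # [('apple', [(12, 'apple'), (15, 'apple')]), ('cherry', [(20, 'cherry')])]
print(D)  # [('apple', 2), ('cherry', 1)]
```

After grouping, each entry of C still carries all of its original rows inside the bag; it is the FOREACH step that turns each bag into a summary value.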

Visual Quiz - 3 Questions

Test your understanding

Look at the Execution Table at step 2. What rows does relation B contain?

A. Rows where second column is 'apple'
B. Rows where first column is greater than 10
C. All rows from the file
D. Empty relation

Concept Snapshot

Pig Latin basics:
- LOAD reads data into a relation
- FILTER keeps rows matching a condition
- GROUP BY collects rows by key
- FOREACH processes each group
- DUMP shows results
Simple steps to transform big data

Full Transcript

This visual trace shows how a Pig Latin script runs step by step. First, data is loaded from a file into a relation named A. Then, FILTER keeps only the rows where the first column is greater than 10, creating relation B. Next, the rows in B are grouped by the second column, forming relation C. After grouping, FOREACH generates a new relation D containing each group key and the count of rows in that group. Finally, DUMP prints the results. The Variable Tracker shows each relation being filled in at its own step and staying unchanged afterward. Beginners often wonder why FILTER removes rows or how GROUP BY works; the trace answers both by showing the exact data at each step. The quiz tests understanding of these steps and their effects on the data.