
Pig vs Hive comparison in Hadoop - Visual Side-by-Side Comparison

Concept Flow - Pig vs Hive comparison

Start: Data in Hadoop

↓

Pig Latin Script

↓

Pig Execution Engine

↓

MapReduce Jobs

↓

Data Processed

↓

Results Returned

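The flow above shows the Pig path: a Pig Latin script is handed to the Pig execution engine, which converts it to MapReduce jobs that process the data in Hadoop. Hive follows the same path, with a HiveQL query and the Hive compiler in place of the script and the Pig engine.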

Execution Sample


-- Pig Latin: build the result step by step
A = LOAD 'data' AS (name:chararray, age:int);
B = FILTER A BY age > 30;
DUMP B;

-- HiveQL: describe the result in one query
SELECT name, age FROM data WHERE age > 30;

The Pig script loads the data, keeps the rows where age > 30, and prints the result. The Hive query returns the same rows with a single SELECT ... WHERE statement.
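To make the procedural versus declarative contrast concrete, here is a minimal Python sketch of the same filter written both ways. It is only an analogy and uses no Pig or Hive API: the sample rows and the `select_where` helper are assumptions made up for illustration.

```python
# Hypothetical sample rows standing in for the 'data' file (not from the source).
rows = [
    {"name": "Ana", "age": 42},
    {"name": "Ben", "age": 27},
    {"name": "Caz", "age": 35},
]

# Pig style: name each intermediate relation, one step at a time.
a = list(rows)                                    # A = LOAD 'data' AS (name, age);
b = [r for r in a if r["age"] > 30]               # B = FILTER A BY age > 30;
pig_result = [(r["name"], r["age"]) for r in b]   # DUMP B;


# Hive style: state the wanted result in one expression.
def select_where(table, columns, predicate):
    """Hypothetical helper: SELECT columns FROM table WHERE predicate."""
    return [tuple(r[c] for c in columns) for r in table if predicate(r)]


hive_result = select_where(rows, ["name", "age"], lambda r: r["age"] > 30)

print(pig_result)
print(hive_result)
```

Both variables end up holding the same rows. The difference is in how the work is described: the first version spells out each step, while the second only states the condition.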

Execution Table

Step	Tool	Action	Input	Output	Notes
1	Pig	LOAD data	'data' file	Relation A with (name, age)	Reads raw data into Pig relation
2	Pig	FILTER A BY age > 30	Relation A	Relation B with filtered rows	Filters rows where age > 30
3	Pig	DUMP B	Relation B	Print filtered rows	Outputs filtered data to console
4	Hive	Parse Query	SELECT name, age FROM data WHERE age > 30	Logical plan	Hive parses SQL-like query
5	Hive	Compile to MapReduce	Logical plan	MapReduce jobs	Hive converts query to MapReduce
6	Hive	Execute MapReduce	MapReduce jobs	Filtered data output	Runs jobs and returns results
7	Both	MapReduce Execution	Jobs from Pig or Hive	Processed data	Both use MapReduce under the hood
8	End	Process complete	Processed data	Results returned	Data ready for user

💡 Both Pig and Hive finish after MapReduce jobs process data and return results.
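As a rough illustration of what "MapReduce under the hood" means for this example, the sketch below expresses the age filter as a map function. A filter needs no grouping, so conceptually a mapper alone is enough. The `run_map_only_job` runner and the sample records are hypothetical stand-ins, not Hadoop code or the jobs Pig and Hive actually generate.

```python
def filter_mapper(record):
    """Map function: emit the record only when age > 30."""
    name, age = record
    if age > 30:
        yield (name, age)


def run_map_only_job(mapper, records):
    """Hypothetical job runner: apply the mapper to every input record."""
    output = []
    for record in records:
        output.extend(mapper(record))
    return output


# Hypothetical input records (not from the source).
records = [("Ana", 42), ("Ben", 27), ("Caz", 35)]
result = run_map_only_job(filter_mapper, records)
print(result)
```

In this picture, the Pig script and the Hive query from the execution sample would both reduce to the same mapper, which is why steps 3 and 6 of the table produce the same filtered output.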

Variable Tracker

Variable	Start	After Step 1	After Step 2	After Step 3	After Step 4	After Step 5	After Step 6	Final
Pig Relation A	empty	Loaded data	Loaded data	Loaded data	N/A	N/A	N/A	N/A
Pig Relation B	N/A	N/A	Filtered data (age > 30)	Filtered data	N/A	N/A	N/A	N/A
Hive Query	Raw SQL	Raw SQL	Raw SQL	Raw SQL	Logical plan	MapReduce jobs	Results	Results
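The tracker lists what each relation logically contains. In practice Pig defers the actual work until a DUMP or STORE statement, so LOAD and FILTER only extend the plan. The sketch below imitates that behaviour with Python generators. It is an analogy under assumed sample data, not how Pig is implemented, and the `log` list exists only to show when rows are actually read.

```python
log = []  # records each row at the moment it is actually read


def load(rows):
    """Like LOAD: a generator, so nothing is read until it is iterated."""
    for row in rows:
        log.append(row)
        yield row


def filter_age(relation, min_age):
    """Like FILTER ... BY age > min_age: also lazy."""
    for name, age in relation:
        if age > min_age:
            yield (name, age)


# Hypothetical sample rows (not from the source).
rows = [("Ana", 42), ("Ben", 27), ("Caz", 35)]

a = load(rows)                 # A is only a plan so far
b = filter_age(a, 30)          # B is only a plan so far
reads_before_dump = len(log)   # no rows have been read yet

dumped = list(b)               # like DUMP B: triggers the whole pipeline
reads_after_dump = len(log)    # every input row has now been read
```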

Key Moments - 3 Insights

Why does Pig use a script while Hive uses SQL-like queries?

Do Pig and Hive run on different engines?

Which tool is better for complex data transformations?

Visual Quiz - 3 Questions

Test your understanding

Look at the execution table. What is the output after Pig's FILTER step?

A. Relation A with all data

B. Relation B with filtered rows where age > 30

C. MapReduce jobs

D. Parsed Hive query

Concept Snapshot

Pig vs Hive Comparison:
- Pig uses Pig Latin scripts (procedural).
- Hive uses HiveQL queries (declarative, SQL-like).
- Both convert code to MapReduce jobs.
- Pig is good for complex data flows.
- Hive is good for SQL users and simple queries.
- Both run on Hadoop's MapReduce engine.

Full Transcript

This visual compares Pig and Hive in Hadoop. Pig uses scripts written in Pig Latin to load, filter, and output data step by step. Hive uses SQL-like HiveQL queries to select data. Both tools convert their code into MapReduce jobs that run on Hadoop to process the data. Pig is procedural and well suited to complex data flows. Hive is declarative and suits SQL users. The execution table shows each step Pig and Hive take, from loading data to returning results. The variable tracker shows how the Pig relations and the Hive query change from step to step. The key moments clarify why Pig uses scripts and Hive uses SQL-like queries, and that both run on the same engine. The quiz tests understanding of the outputs and steps in the process.