Pig vs Hive comparison in Hadoop - Visual Side-by-Side Comparison

Concept Flow - Pig vs Hive comparison
Start: Data in Hadoop -> Pig Latin Script -> Pig Execution Engine -> MapReduce Jobs -> Data Processed -> Results Returned
Shows how Pig takes a Pig Latin script, converts it to MapReduce jobs, and processes data in Hadoop. Hive follows the same path, starting from a HiveQL query instead of a script.
Execution Sample
Pig:
    A = LOAD 'data' AS (name, age);
    B = FILTER A BY age > 30;
    DUMP B;

Hive:
    SELECT name, age FROM data WHERE age > 30;
The Pig script loads the data, keeps the rows where age > 30, and prints the result; the Hive query selects the same rows with a WHERE condition.
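To make the three Pig statements concrete, here is a minimal Python sketch that mirrors LOAD, FILTER, and DUMP on a tiny in-memory input. The sample rows and the helper names (`load`, `filter_by_age`, `dump`) are assumptions made for illustration; this is not how Pig itself is implemented, and it prints Python tuples rather than Pig's own output format.

```python
import csv
import io

# Hypothetical sample input; Pig's default loader (PigStorage) reads tab-separated text.
SAMPLE = "alice\t34\nbob\t25\ncarol\t41\n"

def load(text):
    """Mirror of: A = LOAD 'data' AS (name, age);"""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(name, int(age)) for name, age in reader]

def filter_by_age(rows, min_age=30):
    """Mirror of: B = FILTER A BY age > 30;"""
    return [row for row in rows if row[1] > min_age]

def dump(rows):
    """Mirror of: DUMP B; (prints each tuple to the console)"""
    for row in rows:
        print(row)

A = load(SAMPLE)        # relation A: all rows
B = filter_by_age(A)    # relation B: rows with age > 30
dump(B)
```

Each Pig statement maps to one named intermediate result, which is the step-by-step style the Hive query collapses into a single SELECT.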
Execution Table
| Step | Tool | Action | Input | Output | Notes |
|---|---|---|---|---|---|
| 1 | Pig | LOAD data | 'data' file | Relation A with (name, age) | Reads raw data into Pig relation |
| 2 | Pig | FILTER A BY age > 30 | Relation A | Relation B with filtered rows | Filters rows where age > 30 |
| 3 | Pig | DUMP B | Relation B | Print filtered rows | Outputs filtered data to console |
| 4 | Hive | Parse Query | SELECT name, age FROM data WHERE age > 30 | Logical plan | Hive parses SQL-like query |
| 5 | Hive | Compile to MapReduce | Logical plan | MapReduce jobs | Hive converts query to MapReduce |
| 6 | Hive | Execute MapReduce | MapReduce jobs | Filtered data output | Runs jobs and returns results |
| 7 | Both | MapReduce Execution | Jobs from Pig or Hive | Processed data | Both use MapReduce under the hood |
| 8 | End | Process complete | Processed data | Results returned | Data ready for user |
💡 Both Pig and Hive finish after MapReduce jobs process data and return results.
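Step 7 says both tools use MapReduce under the hood. As a rough illustration, the Python sketch below shows what the age filter might look like as a map-only job, assuming a simple filter needs no reduce phase. The function names, the split layout, and the sample records are hypothetical; the real jobs are generated by Pig and Hive and are not shown here.

```python
def filter_mapper(record):
    """Map phase: emit the record only if age > 30 (no reduce phase needed)."""
    name, age = record
    if age > 30:
        yield (name, age)

def run_map_only_job(splits, mapper):
    """Run the mapper over every record of every input split and collect the output."""
    output = []
    for split in splits:
        for record in split:
            output.extend(mapper(record))
    return output

# Hypothetical input, already divided into two splits the way HDFS blocks might be.
splits = [
    [("alice", 34), ("bob", 25)],
    [("carol", 41), ("dave", 30)],
]

result = run_map_only_job(splits, filter_mapper)
print(result)
```

Whether the filter was written in Pig Latin or HiveQL, the work that reaches the cluster has this same shape: a function applied independently to each record of each split.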
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 | After Step 5 | After Step 6 | Final |
|---|---|---|---|---|---|---|---|---|
| Pig Relation A | empty | Loaded data | Loaded data | Loaded data | N/A | N/A | N/A | N/A |
| Pig Relation B | N/A | N/A | Filtered data (age > 30) | Filtered data | N/A | N/A | N/A | N/A |
| Hive Query | Raw SQL | Raw SQL | Raw SQL | Raw SQL | Logical plan | MapReduce jobs | Results | Results |
Key Moments - 3 Insights
Why does Pig use a script while Hive uses SQL-like queries?
Pig uses Pig Latin, a procedural script language good for step-by-step data flow (see execution_table steps 1-3). Hive uses HiveQL, a declarative SQL-like language that describes what data to get (see steps 4-6).
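The procedural versus declarative difference can be sketched in Python. In the block below, plain Python steps stand in for the Pig-style data flow, and SQLite (from the standard library) stands in for Hive purely for illustration; the sample rows are hypothetical, and the `ORDER BY` is added only to make the query result deterministic.

```python
import sqlite3

rows = [("alice", 34), ("bob", 25), ("carol", 41)]  # hypothetical sample data

# Procedural style (Pig-like): each step names an intermediate result.
loaded = list(rows)
filtered = [r for r in loaded if r[1] > 30]

# Declarative style (Hive-like): one statement describes the wanted result.
# SQLite is only a stand-in for Hive here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO data VALUES (?, ?)", rows)
queried = conn.execute(
    "SELECT name, age FROM data WHERE age > 30 ORDER BY name"
).fetchall()
conn.close()

print(sorted(filtered) == queried)
```

Both styles arrive at the same rows; the difference is whether you spell out the steps yourself or let the engine plan them from the query.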
Do Pig and Hive run on different engines?
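No, in the classic setup both Pig and Hive convert their scripts or queries into MapReduce jobs that run on the same Hadoop cluster (see execution_table step 7). Newer versions of both tools can also target other engines such as Tez or Spark.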
Which tool is better for complex data transformations?
Pig is often better for complex procedural data flows because you write step-by-step scripts (see Pig steps 1-3). Hive is better for SQL users and simpler queries (see Hive steps 4-6).
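A longer flow shows why the step-by-step style can be easier to follow. The Python sketch below chains a filter, a group, and a count, with comments naming the Pig Latin operator each step resembles. The sample records and variable names are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical sample records: (name, age)
records = [("alice", 34), ("bob", 25), ("carol", 41), ("dan", 34)]

# Step 1: keep rows with age > 30 (like FILTER ... BY age > 30)
adults = [r for r in records if r[1] > 30]

# Step 2: group the remaining rows by age (like GROUP ... BY age)
grouped = defaultdict(list)
for name, age in adults:
    grouped[age].append(name)

# Step 3: count the rows in each group (like FOREACH ... GENERATE group, COUNT(...))
counts = {age: len(names) for age, names in grouped.items()}

print(counts)
```

Every intermediate result has a name and can be inspected on its own, which is the property that makes procedural flows convenient when a transformation has many stages.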
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table, what is the output after Pig's FILTER step?
A. Relation A with all data
B. Relation B with filtered rows where age > 30
C. MapReduce jobs
D. Parsed Hive query
💡 Hint
Check execution_table row 2 under Output column.
At which step does Hive convert the query into MapReduce jobs?
A. Step 5
B. Step 4
C. Step 6
D. Step 3
💡 Hint
Look at execution_table row 5 under Action column.
If Pig did not filter data, what would be the output after step 2?
A. Filtered data with age > 30
B. Empty relation
C. Relation A with all loaded data
D. MapReduce jobs
💡 Hint
Refer to variable_tracker for Pig Relation B after step 2.
Concept Snapshot
Pig vs Hive Comparison:
- Pig uses Pig Latin scripts (procedural).
- Hive uses HiveQL queries (declarative, SQL-like).
- Both convert code to MapReduce jobs.
- Pig is good for complex data flows.
- Hive is good for SQL users and simple queries.
- Both run on Hadoop's MapReduce engine.
Full Transcript
This visual compares Pig and Hive in Hadoop. Pig uses scripts written in Pig Latin to load, filter, and output data step by step. Hive uses SQL-like HiveQL queries to select data. Both tools convert their code into MapReduce jobs that run on Hadoop to process data. Pig is procedural and good for complex flows. Hive is declarative and suits SQL users. The execution table shows each step Pig and Hive take, from loading data to returning results. The variable tracker shows how the Pig relations and the Hive query change from step to step. Key moments clarify why Pig uses scripts and Hive uses SQL-like queries, and that both run on the same engine. The quiz tests understanding of the outputs and steps in the process.