Why Hive enables SQL on Hadoop - Visual Breakdown

Concept Flow - Why Hive enables SQL on Hadoop

User writes SQL query

↓

Hive translates SQL to MapReduce jobs

↓

MapReduce jobs run on Hadoop cluster

↓

Jobs read and process data stored in HDFS

↓

Results collected and returned to user

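Hive lets users write SQL queries (in Hive's dialect, HiveQL), converts them into Hadoop MapReduce jobs that process big data stored in HDFS, and returns the results. Classic Hive used MapReduce as its execution engine; newer releases can also run the same queries on Tez or Spark, selected with the hive.execution.engine setting.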

Execution Sample

SELECT name, age FROM users WHERE age > 30;

This SQL query asks Hive to find names and ages of users older than 30 from data stored in Hadoop.
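The query above is a pure filter, so a job compiled from it only needs a map phase: each map task reads one chunk (input split) of the table's file in HDFS, keeps the matching rows, and Hive collects the outputs. The sketch below simulates that in plain Python under loudly stated assumptions: the file contents, the tab-delimited layout, and the helper names make_splits, map_task, and run_map_only_job are hypothetical, and threads merely stand in for cluster nodes. It is not Hive's generated code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical contents of the text file backing the `users` table in HDFS:
# one record per line, fields separated by a tab (name, age).
USERS_FILE = ["alice\t34", "bob\t27", "carol\t41", "dave\t30", "erin\t55"]

def make_splits(lines, split_size):
    """Divide the file into chunks, standing in for HDFS blocks / input splits."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]

def map_task(split):
    """One map task for: SELECT name, age FROM users WHERE age > 30;"""
    output = []
    for line in split:
        name, age = line.split("\t")          # apply the table's columns to the raw line
        if int(age) > 30:                     # WHERE age > 30
            output.append((name, int(age)))   # SELECT name, age
    return output

def run_map_only_job(lines, split_size=2):
    splits = make_splits(lines, split_size)
    # Threads stand in for cluster nodes running map tasks in parallel.
    with ThreadPoolExecutor() as pool:
        per_task_output = list(pool.map(map_task, splits))
    # Collect: concatenate task outputs in split order (pool.map preserves order).
    return [row for chunk in per_task_output for row in chunk]

print(run_map_only_job(USERS_FILE))
# prints [('alice', 34), ('carol', 41), ('erin', 55)]
```

Note that dave (age exactly 30) is excluded because the condition is strictly greater than 30, and that no task modifies the input lines: the job only reads them.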

Execution Table

Step	Action	Input	Output	Notes
1	User submits SQL query	SELECT name, age FROM users WHERE age > 30;	SQL query string	User writes SQL to get data
2	Hive parses SQL	SQL query string	Query plan	Hive understands query structure
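3	Hive compiles to MapReduce jobs	Query plan	MapReduce job(s)	Transforms SQL into Hadoop jobs; a filter-only query like this needs only a map phase
4	MapReduce jobs run on Hadoop	MapReduce job(s)	Processed data	Map tasks read the users data from HDFS, keep rows with age > 30, and output name and age
5	Results collected	Processed data	Query results	Output of all tasks is gathered into one result set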
6	Results returned to user	Query results	Final output	User gets requested data
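The six steps in the table above can be mirrored in a toy Python pipeline. This is a minimal sketch, assuming a drastically simplified "Hive" that only understands queries of the form SELECT cols FROM table WHERE col > number; the regex parser, the plan dictionary, and the names parse, compile_to_mapper, and run_query are all hypothetical. Real Hive builds a full syntax tree, optimizes a logical plan, and emits a physical job plan, none of which is shown here.

```python
import re

# Hypothetical in-memory stand-in for the rows stored behind the `users` table.
USERS_ROWS = [
    {"name": "alice", "age": 34},
    {"name": "bob", "age": 27},
    {"name": "carol", "age": 41},
]

# Step 2: a toy parser that accepts only "SELECT cols FROM table WHERE col > number".
QUERY_RE = re.compile(
    r"SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+(?P<col>\w+)\s*>\s*(?P<value>\d+)\s*;?\s*$",
    re.IGNORECASE,
)

def parse(sql):
    """Turn the SQL string into a query plan (here just a dict)."""
    m = QUERY_RE.match(sql.strip())
    if m is None:
        raise ValueError("toy parser only supports SELECT ... FROM ... WHERE col > n")
    return {
        "columns": [c.strip() for c in m.group("cols").split(",")],
        "table": m.group("table"),
        "filter": (m.group("col"), int(m.group("value"))),
    }

# Step 3: compile the plan into a map function (a filter needs no reduce phase).
def compile_to_mapper(plan):
    col, threshold = plan["filter"]
    def mapper(row):
        if row[col] > threshold:
            yield tuple(row[c] for c in plan["columns"])
    return mapper

# Steps 1 and 4-6: accept the query, run the mapper over every row,
# collect its output, and return the results.
def run_query(sql, tables):
    plan = parse(sql)
    mapper = compile_to_mapper(plan)
    results = []
    for row in tables[plan["table"]]:
        results.extend(mapper(row))
    return results

print(run_query("SELECT name, age FROM users WHERE age > 30;", {"users": USERS_ROWS}))
# prints [('alice', 34), ('carol', 41)]
```

The point of the separation is the same as in the table: the user only supplies the SQL string, and the mapper code is generated from the plan rather than written by hand.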

💡 All MapReduce jobs complete and results returned to user

Variable Tracker

Variable	Start	After Step 2	After Step 3	After Step 4	After Step 5	Final
SQL Query	User input	Parsed	Compiled to jobs	Jobs running	Data processed	Results ready
Query Plan	None	Created	Used to create jobs	N/A	N/A	N/A
MapReduce Jobs	None	None	Created	Running	Completed	N/A
Data in HDFS	Stored data	Unchanged	Unchanged	Read by map tasks (unchanged)	Unchanged	Unchanged
Results	None	None	None	None	Collected	Returned

Key Moments - 3 Insights

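Why does Hive convert SQL queries into MapReduce jobs?
Hadoop executes MapReduce jobs, not SQL. Translating the query lets users work in familiar SQL while Hadoop does the distributed processing.

Does Hive store data itself or use Hadoop storage?
Hive uses Hadoop storage. Table data stays in files in HDFS; Hive keeps only metadata, such as each table's schema and location, in its metastore.

How does Hive return results to the user?
When the jobs finish, Hive collects their output and returns it to the user as the query result.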
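Hive's answer to the storage question can be sketched as two separate pieces: a metastore that records only a table's location and schema, and HDFS files that hold the rows. The Python below is a simulation under stated assumptions: the dictionaries HDFS and METASTORE, the file names part-00000 and part-00001, and the helper read_table are hypothetical, and the paths are only modeled on Hive's default warehouse directory, /user/hive/warehouse. The schema is applied to the raw lines when they are read, which is why the files themselves never need to change.

```python
# Hypothetical stand-in for HDFS: file path -> lines of text.
HDFS = {
    "/user/hive/warehouse/users/part-00000": ["alice\t34", "bob\t27"],
    "/user/hive/warehouse/users/part-00001": ["carol\t41"],
}

# Hypothetical stand-in for the Hive metastore: it records the table's
# location and schema, never the rows themselves.
METASTORE = {
    "users": {
        "location": "/user/hive/warehouse/users",
        "columns": [("name", str), ("age", int)],
        "delimiter": "\t",
    }
}

def read_table(table):
    """Find the table's files via the metastore and apply its schema at read time."""
    meta = METASTORE[table]
    prefix = meta["location"] + "/"
    rows = []
    for path in sorted(HDFS):
        if not path.startswith(prefix):
            continue  # file belongs to some other table
        for line in HDFS[path]:
            fields = line.split(meta["delimiter"])
            rows.append({name: cast(value) for (name, cast), value in zip(meta["columns"], fields)})
    return rows

print([r["name"] for r in read_table("users") if r["age"] > 30])
# prints ['alice', 'carol']
```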

Visual Quiz - 3 Questions

Test your understanding

Look at the execution_table, at which step does Hive convert SQL into MapReduce jobs?

A. Step 2

B. Step 4

C. Step 3

D. Step 5

Answer: C (Step 3, "Hive compiles to MapReduce jobs")

Concept Snapshot

Hive lets you use SQL to query big data on Hadoop.
It translates SQL queries into MapReduce jobs.
These jobs run on the Hadoop cluster and read data stored in HDFS.
Results are collected and sent back to the user.
This makes big data easier to query with familiar SQL.

Full Transcript

Hive enables SQL on Hadoop by letting users write SQL queries. Hive then parses these queries and converts them into MapReduce jobs that run on the Hadoop cluster. The data is stored in Hadoop's file system (HDFS) and processed by these jobs. After processing, Hive collects the results and returns them to the user. This process allows users to work with big data using simple SQL commands without needing to write complex MapReduce code.