
Hive architecture in Hadoop - Step-by-Step Execution

Concept Flow - Hive architecture
User submits Hive query
Hive Driver parses query
Compiler converts query to execution plan
Metastore provides metadata
Execution Engine runs plan
Hadoop executes MapReduce or Tez jobs
Results returned to User
This flow shows how a Hive query moves from user input through parsing, planning, and execution on Hadoop components until the results are returned.
Execution Sample
User submits: SELECT * FROM sales WHERE year=2023;
Driver parses query
Compiler creates execution plan
Metastore provides table info
Execution Engine runs plan
Hadoop runs MapReduce job
Results returned
This example traces a simple Hive query execution from submission to result.
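The trace above can be mimicked with a toy Python pipeline. This is only a sketch under stated assumptions: the function names (`parse`, `compile_plan`, `fetch_metadata`, `execute`, `run`), the `METASTORE` dictionary, and the in-memory `sales` rows are hypothetical stand-ins, not Hive APIs, and the parser accepts only the exact query shape used in this example.

```python
import re

# Hypothetical in-memory stand-ins for the Metastore and the table data.
METASTORE = {"sales": {"columns": ["id", "amount", "year"]}}
DATA = {
    "sales": [
        {"id": 1, "amount": 100, "year": 2022},
        {"id": 2, "amount": 250, "year": 2023},
        {"id": 3, "amount": 75, "year": 2023},
    ]
}

def parse(query):
    """Driver step: accept only 'SELECT * FROM <table> WHERE <col>=<int>;'."""
    m = re.fullmatch(r"SELECT \* FROM (\w+) WHERE (\w+)=(\d+);", query.strip())
    if m is None:
        raise ValueError("syntax error")
    return {"table": m.group(1), "column": m.group(2), "value": int(m.group(3))}

def compile_plan(tree):
    """Compiler step: turn the parsed tree into a scan-and-filter plan."""
    return {"scan": tree["table"], "filter": (tree["column"], tree["value"])}

def fetch_metadata(plan):
    """Metastore step: look up the schema of the scanned table."""
    return METASTORE[plan["scan"]]

def execute(plan, metadata):
    """Execution step: scan the rows and keep those matching the filter."""
    column, value = plan["filter"]
    if column not in metadata["columns"]:
        raise KeyError(column)
    return [row for row in DATA[plan["scan"]] if row[column] == value]

def run(query):
    """Run every stage in order and record the state after each one."""
    trace = ["Query string"]
    tree = parse(query)
    trace.append("Parsed query tree")
    plan = compile_plan(tree)
    trace.append("Execution plan")
    metadata = fetch_metadata(plan)
    trace.append("Metadata info")
    rows = execute(plan, metadata)
    trace.append("Results collected")
    return rows, trace
```

Calling `run("SELECT * FROM sales WHERE year=2023;")` returns the two 2023 rows together with a trace whose labels match the stage outputs named in this walkthrough.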
Execution Table
| Step | Action | Component | Details | Output |
|---|---|---|---|---|
| 1 | Submit query | User | SELECT * FROM sales WHERE year=2023; | Query string |
| 2 | Parse query | Driver | Check syntax and semantics | Parsed query tree |
| 3 | Compile query | Compiler | Create execution plan (logical and physical) | Execution plan |
| 4 | Fetch metadata | Metastore | Get table schema and partitions | Metadata info |
| 5 | Execute plan | Execution Engine | Run MapReduce or Tez jobs on Hadoop | Job execution |
| 6 | Process data | Hadoop | Map and Reduce tasks process data | Intermediate results |
| 7 | Return results | Execution Engine | Collect and send results to user | Query results |
| 8 | End | System | Query completed | Results delivered |
💡 Query execution ends after results are returned to the user.
Variable Tracker
| Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 5 | After Step 7 | Final |
|---|---|---|---|---|---|---|---|
| query | SELECT * FROM sales WHERE year=2023; | Parsed query tree | Execution plan | Metadata info | Job execution started | Results collected | Results delivered |
Key Moments - 3 Insights
Why does Hive need the Metastore during query execution?
The Metastore provides the schema and partition information needed to plan and run the query correctly, as shown in step 4 of the Execution Table.
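As a rough illustration of why partition information matters, the sketch below picks which partition directories a filter needs to read. It assumes that `sales` is partitioned by `year`, which the walkthrough does not state; the `METASTORE` dictionary, the paths, and `partitions_to_scan` are hypothetical and only imitate the `column=value` directory layout Hive uses for partitions.

```python
# Hypothetical metastore entry; the partitioning of `sales` is an assumption.
METASTORE = {
    "sales": {
        "columns": {"id": "int", "amount": "double"},
        "partition_column": "year",
        "partitions": {
            2021: "/user/hive/warehouse/sales/year=2021",
            2022: "/user/hive/warehouse/sales/year=2022",
            2023: "/user/hive/warehouse/sales/year=2023",
        },
    }
}

def partitions_to_scan(table, column, value):
    """Return the partition paths a 'column = value' filter has to read."""
    entry = METASTORE[table]
    if column != entry["partition_column"]:
        # The filter is not on the partition column: read every partition.
        return sorted(entry["partitions"].values())
    path = entry["partitions"].get(value)
    return [path] if path is not None else []
```

With this metadata, a filter on `year=2023` touches one directory, while a filter on a regular column such as `amount` has to read all three.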
What happens if the query syntax is wrong?
The Driver fails at step 2 during parsing, so the query does not proceed to compilation or execution.
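The early stop can be sketched in a few lines of Python. This is a toy model, not Hive code: `ParseError`, `parse`, and `run` are hypothetical names, and the regular expression only recognizes a very small subset of `SELECT` statements.

```python
import re

class ParseError(Exception):
    """Raised by the toy parser; a hypothetical name, not a Hive class."""

def parse(query):
    """Accept only 'SELECT ... FROM <table> [WHERE ...];' and reject the rest."""
    pattern = r"\s*SELECT\s+.+\s+FROM\s+\w+(\s+WHERE\s+.+)?;\s*"
    if re.fullmatch(pattern, query, re.IGNORECASE) is None:
        raise ParseError(f"cannot parse: {query!r}")
    return query

def run(query):
    """Record which stages were reached before the query finished or failed."""
    stages_run = []
    try:
        stages_run.append("parse")
        parse(query)
        stages_run.append("compile")
        stages_run.append("execute")
    except ParseError:
        return stages_run, "failed at parse"
    return stages_run, "ok"
```

A misspelled keyword such as `SELEC` makes `run` return after the `parse` stage only, so the compile and execute stages are never reached.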
How does Hive execute the query on data?
Hive converts the query into MapReduce or Tez jobs run by Hadoop, as shown in steps 5 and 6.
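The data-processing step can be pictured with a small Python sketch. It assumes that a filter-only query such as the example needs just a map phase, with no reduce step; the `SPLITS` data, `map_task`, and `run_job` are hypothetical and run in one process rather than on a cluster.

```python
# Hypothetical input splits; on Hadoop each split would be part of the table's files.
SPLITS = [
    [{"id": 1, "year": 2022}, {"id": 2, "year": 2023}],
    [{"id": 3, "year": 2023}, {"id": 4, "year": 2021}],
]

def map_task(split, year):
    """One map task: emit only the rows of its split that pass the filter."""
    return [row for row in split if row["year"] == year]

def run_job(splits, year):
    """Run one map task per split and concatenate their outputs."""
    results = []
    for split in splits:
        results.extend(map_task(split, year))
    return results
```

Each map task sees only its own split, which is what lets Hadoop run the tasks in parallel on different nodes.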
Visual Quiz - 3 Questions
Test your understanding
According to the Execution Table, at which step does Hive get the table schema?
A. Step 4
B. Step 2
C. Step 5
D. Step 7
💡 Hint
Check the 'Fetch metadata' action at step 4 of the Execution Table.
According to the Variable Tracker, what is the state of 'query' after step 3?
A. Raw query string
B. Execution plan
C. Parsed query tree
D. Results delivered
💡 Hint
Look at the 'After Step 3' column for 'query' in the Variable Tracker.
If the query syntax is incorrect, which step fails according to the Execution Table?
A. Step 1
B. Step 5
C. Step 2
D. Step 7
💡 Hint
Parsing happens at step 2; syntax errors stop execution there.
Concept Snapshot
Hive architecture flow:
User submits query -> Driver parses -> Compiler plans -> Metastore provides metadata -> Execution Engine runs jobs on Hadoop -> Results returned.
Metastore is key for schema info.
Execution uses MapReduce or Tez.
Driver stops on syntax errors.
Full Transcript
Hive query execution starts when a user submits a query. The Driver parses the query to check syntax and semantics. The Compiler then creates an execution plan. The Metastore provides metadata such as table schema and partitions, which the Compiler needs while it builds the plan. The Execution Engine runs the plan by launching MapReduce or Tez jobs on Hadoop. These jobs process the data and produce results. Finally, the results are returned to the user. If the query has syntax errors, parsing fails and execution stops early. The Metastore is essential because it supplies the table information needed to run queries correctly.