
Why Hive enables SQL on Hadoop - See It in Action

📖 Scenario: Imagine you work in a company that collects a huge amount of data every day. This data is stored in Hadoop, a system designed to handle big data across many computers. But you want to analyze this data using SQL, a language you already know well.
🎯 Goal: In this project, you will create a simple example to understand how Hive allows you to use SQL queries on data stored in Hadoop. You will set up a small dataset, configure a Hive table, write a SQL query, and see the output.
📋 What You'll Learn
Create a dataset representing sales data
Configure a Hive table to read the dataset
Write a SQL query to select specific data
Display the query result
💡 Why This Matters
🌍 Real World
Companies use Hive to analyze large datasets stored in Hadoop with familiar SQL-style queries (HiveQL). Hive compiles each query into distributed jobs, so analysts do not have to write MapReduce programs by hand.
💼 Career
Knowing Hive and SQL on Hadoop is valuable for data analysts and engineers working with big data platforms.
1
DATA SETUP: Create a sales data file
Create a text file named sales_data.txt with these exact lines:
1,2024-01-01,100
2,2024-01-02,150
3,2024-01-03,200
Need a hint?

Use a text editor or command line to create sales_data.txt with the exact lines.
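From the command line, the file can be created in one step. This is a minimal sketch that assumes a POSIX shell and writes sales_data.txt into the current directory:

```shell
# Write the three sample rows (id,date,amount) to sales_data.txt.
printf '%s\n' \
  '1,2024-01-01,100' \
  '2,2024-01-02,150' \
  '3,2024-01-03,200' > sales_data.txt

# Print the file to confirm its contents.
cat sales_data.txt
```

Each line holds one sale, with the fields separated by commas and no header row.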

2
CONFIGURATION: Create a Hive table for the sales data
Write a HiveQL statement to create a table named sales with columns id INT, `date` STRING, and amount INT, using FIELDS TERMINATED BY ','. Because date is a reserved keyword in recent Hive versions (1.2.0 and later), quote that column name with backticks.
Need a hint?

Use CREATE TABLE sales (id INT, `date` STRING, amount INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;
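One way to run this statement is to save it in a script file and pass it to the Hive command-line client. The sketch below assumes the hive CLI is on your PATH. The file name create_sales.hql is an arbitrary choice, not something the exercise prescribes. The date column is backtick-quoted on the assumption that your Hive version treats date as a reserved keyword.

```shell
# Save the table definition to a script file (the name create_sales.hql is arbitrary).
# The quoted 'EOF' stops the shell from interpreting the backticks around date.
cat > create_sales.hql <<'EOF'
CREATE TABLE sales (
  id INT,
  `date` STRING,
  amount INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
EOF

# Run the script only if the hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f create_sales.hql
else
  echo "hive CLI not found; create_sales.hql was written but not run"
fi
```

The ROW FORMAT clause tells Hive how to split each line of the text file into columns. Hive applies the schema when the file is read, so the file itself is not converted.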

3
CORE LOGIC: Load data and query the sales table
Write Hive commands to load data from sales_data.txt into the sales table and then write a SQL query to select all rows where amount > 120.
Need a hint?

Use LOAD DATA LOCAL INPATH 'sales_data.txt' INTO TABLE sales; and then SELECT * FROM sales WHERE amount > 120;
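The two statements can be placed in a script file and run together. This sketch assumes the hive CLI is on your PATH, that the sales table already exists, and that sales_data.txt is in the directory you launch hive from. The file name load_and_query.hql is arbitrary.

```shell
# Save the load and query statements to a script file (arbitrary file name).
cat > load_and_query.hql <<'EOF'
-- Copy the local file into the storage location of the sales table.
LOAD DATA LOCAL INPATH 'sales_data.txt' INTO TABLE sales;

-- Return only the sales whose amount exceeds 120.
SELECT * FROM sales WHERE amount > 120;
EOF

# Run the script only if the hive CLI is available on this machine.
if command -v hive >/dev/null 2>&1; then
  hive -f load_and_query.hql
else
  echo "hive CLI not found; load_and_query.hql was written but not run"
fi
```

The LOCAL keyword makes Hive read the file from the local file system rather than from HDFS. If you connect through a remote server instead of the local CLI, the path may be resolved on the server side, so check where the file needs to live in your setup.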

4
OUTPUT: Display the query result
Print the result of the query, which should show only the rows with amount greater than 120.
Need a hint?

The output should list only the rows where amount is greater than 120: the sales with id 2 and id 3, whose amounts are 150 and 200.
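To check the expected rows without a Hive installation, you can apply the same filter to the raw file with awk. This is a local stand-in for the query, not Hive itself; it only confirms which lines satisfy amount > 120.

```shell
# Recreate the sample data so this check is self-contained.
printf '%s\n' \
  '1,2024-01-01,100' \
  '2,2024-01-02,150' \
  '3,2024-01-03,200' > sales_data.txt

# Keep the lines whose third comma-separated field (amount) is greater than 120.
awk -F',' '$3 > 120' sales_data.txt
# prints:
# 2,2024-01-02,150
# 3,2024-01-03,200
```

The Hive query should return the same two rows. The Hive CLI typically separates the column values with tabs rather than commas.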