
How to Query Data in Hive in Hadoop: Simple Guide

To query data in Hive on Hadoop, use HiveQL, Hive's SQL-like query language. You write queries with clauses such as SELECT, FROM, and WHERE to retrieve data from Hive tables, whose underlying files are stored in Hadoop (usually HDFS).

Syntax

Hive queries use a SQL-like syntax called HiveQL. The basic query structure is:

  • SELECT: Choose columns to display.
  • FROM: Specify the table to query.
  • WHERE: Filter rows based on conditions.
  • LIMIT: Cap the number of rows returned.
sql
SELECT column1, column2 FROM table_name WHERE condition LIMIT 10;

Example

This example queries a Hive table named employees for the names and salaries of employees who earn more than 50000, returning at most five rows.

sql
SELECT name, salary FROM employees WHERE salary > 50000 LIMIT 5;
Output

| name  | salary |
|-------|--------|
| Alice | 70000  |
| Bob   | 65000  |
| Carol | 72000  |
| David | 80000  |
| Eve   | 90000  |
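Without ORDER BY, Hive does not guarantee which rows LIMIT returns: any five rows that pass the WHERE filter are valid, and the set can change between runs. The sketch below, assuming Hive 0.14 or later (needed for TEMPORARY tables and INSERT ... VALUES), builds a hypothetical scratch table named employees_demo with illustrative rows, then sorts before limiting so the result is repeatable.

```sql
-- Hypothetical scratch table; TEMPORARY tables are dropped when the session ends (Hive 0.14+)
CREATE TEMPORARY TABLE employees_demo (name STRING, salary INT);

-- Hypothetical sample rows for illustration only (INSERT ... VALUES needs Hive 0.14+)
INSERT INTO employees_demo VALUES
  ('Alice', 70000), ('Bob', 65000), ('Carol', 72000),
  ('David', 80000), ('Eve', 90000), ('Frank', 48000), ('Grace', 45000);

-- ORDER BY fixes which five rows LIMIT keeps: the highest salaries first
SELECT name, salary
FROM employees_demo
WHERE salary > 50000
ORDER BY salary DESC
LIMIT 5;
```

With these rows the query returns Eve, David, Carol, Alice, and Bob in that order; Frank and Grace are removed by the WHERE clause. Hive performs a global ORDER BY through a single reducer, so on large tables keep the LIMIT; Hive's strict mode rejects ORDER BY without one.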

Common Pitfalls

Common mistakes when querying Hive include:

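  • Assuming identifiers are case-sensitive. Hive treats table and column names as case-insensitive, but string comparisons in WHERE are case-sensitive, so name = 'alice' does not match 'Alice'.
  • Forgetting to end statements with a semicolon (;). The Hive CLI and Beeline run a statement only when they reach the semicolon; without it, the shell keeps waiting for more input.
  • Querying a table before any data has been loaded into it, which returns no rows.
  • Using SQL functions or syntax that HiveQL, or your Hive version, does not support. SHOW FUNCTIONS; lists the functions available in your session.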
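The case-sensitivity pitfall shows up in two different places, identifiers and string values. A minimal sketch, assuming Hive 0.14 or later and a hypothetical scratch table named case_demo:

```sql
-- Hypothetical scratch table with two illustrative rows (Hive 0.14+)
CREATE TEMPORARY TABLE case_demo (name STRING);
INSERT INTO case_demo VALUES ('Alice'), ('Bob');

-- Identifiers are case-insensitive: both statements read the same table and column
SELECT NAME FROM CASE_DEMO;
SELECT name FROM case_demo;

-- String comparison is case-sensitive: 'alice' does not equal 'Alice', so no rows match
SELECT name FROM case_demo WHERE name = 'alice';

-- lower() normalizes the stored value first, so this matches Alice
SELECT name FROM case_demo WHERE lower(name) = 'alice';
```

Normalizing with lower() (or upper()) is a simple option when the stored case is unknown, at the cost of applying the function to every row scanned.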

Always check that your table exists and data is loaded before running queries.
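In the Hive shell, three statements cover that check. This sketch assumes the employees table from the example above; substitute your own table name.

```sql
-- 1. The table should appear in the list for the current database
SHOW TABLES;

-- 2. Confirm the column names and types match what your queries expect
DESCRIBE employees;

-- 3. A count of 0 usually means the table exists but no data has been loaded yet
SELECT COUNT(*) FROM employees;
```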

sql
-- Wrong: no terminating semicolon, so the Hive shell keeps waiting for more input
SELECT * FROM employees

-- Right: the semicolon ends the statement and the query runs
SELECT * FROM employees;

Quick Reference

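| Clause   | Description                          |
|----------|--------------------------------------|
| SELECT   | Choose columns to display            |
| FROM     | Specify the table to query           |
| WHERE    | Filter rows by condition             |
| LIMIT    | Cap the number of rows returned      |
| ORDER BY | Sort results by one or more columns  |
| GROUP BY | Group rows for aggregation           |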
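ORDER BY and GROUP BY appear in the reference table but not in the earlier examples. The sketch below combines them; the staff_demo table, its department column, and its rows are hypothetical, and it assumes Hive 0.14 or later for TEMPORARY tables and INSERT ... VALUES.

```sql
-- Hypothetical scratch table with a department column (Hive 0.14+)
CREATE TEMPORARY TABLE staff_demo (name STRING, department STRING, salary INT);

-- Hypothetical sample rows for illustration only
INSERT INTO staff_demo VALUES
  ('Alice', 'Sales', 70000), ('Bob', 'Sales', 65000),
  ('Carol', 'Engineering', 72000), ('David', 'Engineering', 80000),
  ('Eve', 'Engineering', 90000);

-- GROUP BY collapses rows per department; in Hive, every non-aggregated
-- SELECT column must also appear in GROUP BY.
-- ORDER BY then sorts the grouped rows, and LIMIT caps the output.
SELECT department, COUNT(*) AS headcount, AVG(salary) AS avg_salary
FROM staff_demo
GROUP BY department
ORDER BY avg_salary DESC
LIMIT 10;
```

With these rows, Engineering (3 employees, average about 80,667) sorts ahead of Sales (2 employees, average 67,500). AVG returns a DOUBLE, so expect a fractional value for Engineering.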

Key Takeaways

Use HiveQL, a SQL-like language, to query data in Hive on Hadoop.
Always specify the table with FROM and columns with SELECT.
Use WHERE to filter data and LIMIT to restrict output size.
End each statement with a semicolon so the Hive shell knows where it ends and runs it.
Ensure data is loaded into Hive tables before querying.