How to Query Data in Hive on Hadoop: Simple Guide
To query data in Hive on Hadoop, use HiveQL, a SQL-like language. You write queries with clauses such as SELECT, FROM, and WHERE to retrieve data from Hive tables stored in Hadoop.
Syntax
A basic HiveQL query is built from these clauses:
- SELECT: choose the columns to return.
- FROM: specify the table to query.
- WHERE: filter rows based on a condition.
- LIMIT: cap the number of rows returned.
```sql
SELECT column1, column2
FROM table_name
WHERE condition
LIMIT 10;
```
Example
This example shows how to query a Hive table named employees to get the names and salaries of employees who earn more than 50000.
```sql
SELECT name, salary
FROM employees
WHERE salary > 50000
LIMIT 5;
```
Output
```
name    salary
Alice   70000
Bob     65000
Carol   72000
David   80000
Eve     90000
```
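Without ORDER BY, Hive does not guarantee which matching rows LIMIT returns, so the five rows above may differ between runs. Here is a minimal sketch, assuming the same employees table. It sorts before limiting, so the five highest salaries come back first:

```sql
-- Sort matching rows by salary, highest first, then keep the top five.
SELECT name, salary
FROM employees
WHERE salary > 50000
ORDER BY salary DESC
LIMIT 5;
```

Rows with tied salaries may still appear in any order relative to each other.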
Common Pitfalls
Common mistakes when querying Hive include:
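- Assuming table or column names are case-sensitive. Hive identifiers are case-insensitive, so Employees and employees refer to the same table.
- Forgetting to end a statement with a semicolon (;). The Hive CLI and Beeline then wait for more input instead of running the query.
- Querying a table before any data has been loaded into it, which returns no rows.
- Using SQL functions or syntax that HiveQL does not support.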
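To avoid the last pitfall, you can ask Hive which functions your installation supports before using one. SHOW FUNCTIONS and DESCRIBE FUNCTION are built-in HiveQL statements. The function upper is used here only as an example:

```sql
-- List every function available in this Hive installation
SHOW FUNCTIONS;

-- Show a short usage description for one function (upper is just an example)
DESCRIBE FUNCTION upper;

-- Show extended details, including examples where Hive provides them
DESCRIBE FUNCTION EXTENDED upper;
```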
Always check that your table exists and data is loaded before running queries.
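Here is a minimal check, assuming the employees table from the example above and that you are connected to the database that contains it:

```sql
-- Confirm the table exists in the current database
SHOW TABLES 'employees';

-- Check column names and types before writing the query
DESCRIBE employees;

-- A count of 0 suggests no data has been loaded yet
SELECT COUNT(*) FROM employees;
```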
```sql
-- Wrong: missing semicolon, so the CLI keeps waiting for more input
-- SELECT * FROM employees

-- Right: the semicolon ends the statement and runs it
SELECT * FROM employees;
```
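If a table exists but returns no rows, its data may not have been loaded yet. Below is a minimal sketch of creating and loading a table. It assumes a comma-separated text file with no header row. The HDFS path /user/hive/staging/employees.csv and the two-column layout are hypothetical, chosen to match the example query:

```sql
-- Create a table whose columns match the (assumed) layout of the source file
CREATE TABLE IF NOT EXISTS employees (
  name   STRING,
  salary INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Move the hypothetical HDFS file into the table's storage location
LOAD DATA INPATH '/user/hive/staging/employees.csv'
INTO TABLE employees;
```

LOAD DATA INPATH moves the file within HDFS rather than copying it. LOAD DATA LOCAL INPATH copies a file from the local filesystem instead.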
Quick Reference
| Command | Description |
|---|---|
| SELECT | Choose columns to display |
| FROM | Specify the table to query |
| WHERE | Filter rows by condition |
| LIMIT | Limit number of rows returned |
| ORDER BY | Sort results by column |
| GROUP BY | Group rows for aggregation |
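GROUP BY and ORDER BY from the table above often appear together. The following sketch assumes the employees table also has a department column; that column is hypothetical and does not appear in the earlier example. The query counts employees and averages salaries per department, then sorts the groups:

```sql
-- Aggregate per department, then sort groups by average salary, highest first
SELECT department,
       COUNT(*)    AS headcount,
       AVG(salary) AS avg_salary
FROM employees
GROUP BY department
ORDER BY avg_salary DESC
LIMIT 10;
```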
Key Takeaways
Use HiveQL, a SQL-like language, to query data in Hive on Hadoop.
Always specify the table with FROM and columns with SELECT.
Use WHERE to filter data and LIMIT to restrict output size.
End each statement with a semicolon so the Hive CLI or Beeline runs it.
Ensure data is loaded into Hive tables before querying.