We create databases and tables to organize and store data in a structured way. This helps us find and use data easily later.
Creating databases and tables in Hadoop
Introduction
When you start a new project and need a place to store your data.
When you want to separate data for different teams or purposes.
When you want to store data in a format that is easy to analyze.
When you need to manage large amounts of data efficiently.
When you want to control who can see or change the data.
Syntax
Hadoop
CREATE DATABASE database_name;

CREATE TABLE table_name (
  column1_name column1_type,
  column2_name column2_type,
  ...
)
STORED AS file_format;
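The CREATE DATABASE command makes a new database.
The CREATE TABLE command makes a new table inside a database. Its STORED AS clause sets the file format used for the table's data.
These are SQL statements run through a SQL engine on top of Hadoop, such as Apache Hive. They are not commands of Hadoop itself.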
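When a setup script may run more than once, a plain CREATE statement typically fails the second time because the name already exists. The following is a minimal sketch of a re-runnable variant, assuming Apache Hive (HiveQL) as the engine. The names `database_name` and `table_name` are the same placeholders as in the syntax, and the column types and PARQUET format are only illustrative choices.

```sql
-- Create the database only if it does not already exist.
CREATE DATABASE IF NOT EXISTS database_name;

-- Create the table only if it does not already exist.
-- Qualifying the name with the database makes the target explicit.
CREATE TABLE IF NOT EXISTS database_name.table_name (
  column1_name INT,
  column2_name STRING
)
STORED AS PARQUET;
```

With IF NOT EXISTS, an existing database or table is left untouched, so the statement does not change the definition of something that is already there.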
Examples
This creates a new database called sales_data.
Hadoop
CREATE DATABASE sales_data;
This creates a table customers in the sales_data database with three columns and stores data in Parquet format.
Hadoop
CREATE TABLE sales_data.customers (
  id INT,
  name STRING,
  age INT
)
STORED AS PARQUET;
This creates a table employees with three columns and stores data in ORC format. No database is named, so the table is created in the current database.
Hadoop
CREATE TABLE employees (
  emp_id INT,
  emp_name STRING,
  salary FLOAT
)
STORED AS ORC;
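To control which database an unqualified table name lands in, a session can switch databases first. The following is a sketch, assuming HiveQL and the `sales_data` database from the first example. `IF NOT EXISTS` is added here only so the snippet can be re-run safely.

```sql
-- Make sales_data the current database for this session.
USE sales_data;

-- The unqualified name now resolves to sales_data.employees.
CREATE TABLE IF NOT EXISTS employees (
  emp_id INT,
  emp_name STRING,
  salary FLOAT
)
STORED AS ORC;

-- List the tables in the current database to confirm.
SHOW TABLES;
```

Writing the qualified name, as in sales_data.customers above, gives the same result without depending on the session state.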
Sample Program
This code creates a database called company. Then it creates a table employees inside it with four columns. Finally, it lists all databases and then the tables in the company database.
Hadoop
CREATE DATABASE company;

CREATE TABLE company.employees (
  emp_id INT,
  emp_name STRING,
  department STRING,
  salary FLOAT
)
STORED AS PARQUET;

SHOW DATABASES;
SHOW TABLES IN company;
Important Notes
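Database names must be unique, and table names must be unique within their database. Choose names that describe the data they hold.
Choose the file format based on your data size and query needs. PARQUET and ORC are both columnar formats suited to analytical queries.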
Use SHOW DATABASES; and SHOW TABLES IN database_name; to check what you created.
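Beyond listing names, it can help to check a table's columns and storage format after creating it. The following is a sketch, assuming HiveQL and the `company.employees` table from the sample program. The exact layout of the output varies by engine and version, so none is shown here.

```sql
-- List the tables in the company database.
SHOW TABLES IN company;

-- Show the column names and types of the table.
DESCRIBE company.employees;

-- Show detailed metadata, including the storage format and location.
DESCRIBE FORMATTED company.employees;
```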
Summary
Databases help organize data into separate groups.
Tables store data in rows and columns inside databases.
Creating databases and tables is the first step to working with data in Hadoop.