External vs managed tables in Hadoop

Introduction

In Hadoop, tables (defined through Hive) organize data stored in HDFS. Whether a table is managed or external determines who controls the underlying data files and what happens to them when the table is dropped.

Use a managed table when you want Hadoop to fully manage your data files and clean them up automatically.
Use an external table when you want to keep the data files even after dropping the table.
Use an external table when sharing data files across different Hadoop tools or clusters.
Use an external table when you want to store data files outside Hadoop's default location.
Syntax
Hadoop
CREATE [EXTERNAL] TABLE table_name (column1 TYPE, column2 TYPE, ...)
STORED AS file_format
[LOCATION 'path_to_data'];
-- EXTERNAL marks the table as external; LOCATION points it at the data directory.

Managed tables are usually created without a LOCATION clause, so Hadoop stores their data in its default warehouse folder.

External tables are created with the EXTERNAL keyword, and LOCATION points to a data directory that Hadoop does not own.
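To check which kind of table you have, one option is to ask the metastore. The sketch below assumes the tables are defined through Hive and reuses the placeholder name table_name from the syntax above. DESCRIBE FORMATTED prints a Table Type field that reads MANAGED_TABLE or EXTERNAL_TABLE, along with a Location field showing where the data files live.

```sql
-- Show detailed metadata for a table (table_name is a placeholder).
-- In the output, look for:
--   Table Type:  MANAGED_TABLE or EXTERNAL_TABLE
--   Location:    the directory that holds the table's data files
DESCRIBE FORMATTED table_name;
```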

Examples
This creates a managed table. Hadoop stores and manages the data files.
Hadoop
CREATE TABLE managed_table (
  id INT,
  name STRING
)
STORED AS PARQUET;
This creates an external table. Data files stay where you put them, outside Hadoop's default folder.
Hadoop
CREATE EXTERNAL TABLE external_table (
  id INT,
  name STRING
)
STORED AS PARQUET
LOCATION '/user/data/external_table/';
Sample Program

This example creates one managed and one external table, inserts a row into each, and queries both. The managed table's data is stored in Hadoop's default warehouse folder, under Hadoop's control. The external table's data is stored at the specified location.

Hadoop
CREATE TABLE managed_employees (
  emp_id INT,
  emp_name STRING
)
STORED AS TEXTFILE;

CREATE EXTERNAL TABLE external_employees (
  emp_id INT,
  emp_name STRING
)
STORED AS TEXTFILE
LOCATION '/user/hadoop/external_employees/';

-- Insert one row into each table
INSERT INTO managed_employees VALUES (1, 'Alice');
INSERT INTO external_employees VALUES (2, 'Bob');

-- Query both tables
SELECT * FROM managed_employees;
SELECT * FROM external_employees;
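To see where the rows actually landed, you can list the files from the same session. This is a minimal sketch that assumes the statements are run in the Hive command line or Beeline, where the dfs command is available, and that the tables live in the default database. The managed-table path shown is only the common default for hive.metastore.warehouse.dir and may differ on your cluster.

```sql
-- Files written for the external table, at the LOCATION given above
dfs -ls /user/hadoop/external_employees/;

-- Files written for the managed table; /user/hive/warehouse is an assumed
-- default warehouse folder, so adjust the path to your cluster's setting
dfs -ls /user/hive/warehouse/managed_employees/;
```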
Important Notes

Dropping a managed table (DROP TABLE) removes both the table metadata and its data files.

Dropping an external table removes only the table metadata; the data files stay in place.

Use external tables to share data across different systems without moving files.
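The difference shows up when the tables are dropped. The sketch below reuses the managed_employees and external_employees tables from the sample program and assumes they still exist with default table properties. The path is the LOCATION used there.

```sql
-- Dropping the managed table removes its metadata and its data files
DROP TABLE managed_employees;

-- Dropping the external table removes only the metadata;
-- the files under /user/hadoop/external_employees/ are left in place
DROP TABLE external_employees;

-- Because the files survive, the table can be defined again over the
-- same location and the earlier rows become queryable once more
CREATE EXTERNAL TABLE external_employees (
  emp_id INT,
  emp_name STRING
)
STORED AS TEXTFILE
LOCATION '/user/hadoop/external_employees/';

SELECT * FROM external_employees;
```

Recreating the table this way only works when the column definitions and file format match the files already in the directory.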

Summary

Managed tables let Hadoop control data storage and cleanup.

External tables keep data files where you choose and only manage metadata.

Choose based on whether you want Hadoop to manage your data files or not.
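The choice does not have to be final. As a hedged sketch, Hive lets you switch a table's type through the EXTERNAL table property. Here some_table is a hypothetical name, and some versions reject this change for transactional managed tables, so it is worth trying on a test table first.

```sql
-- Convert a managed table to external: a later DROP TABLE keeps the files
ALTER TABLE some_table SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');

-- Convert it back to managed: a later DROP TABLE deletes the files
ALTER TABLE some_table SET TBLPROPERTIES ('EXTERNAL' = 'FALSE');
```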