
Internal vs External Table in Hive: Key Differences and Usage

In Hive, an internal (managed) table stores its data in Hive's warehouse directory by default, and dropping it deletes the data along with the metadata. An external table points to data at a location you specify, and dropping it removes only the metadata, so the files remain. Hive controls the full data lifecycle of an internal table, whereas an external table leaves the data available to other tools.

Quick Comparison

This table summarizes the main differences between internal and external tables in Hive.

| Feature | Internal Table | External Table |
| --- | --- | --- |
| Data Storage Location | Hive's warehouse directory (default) | User-specified external location |
| Data Deletion on Drop | Deletes both table and data | Deletes only table metadata, data remains |
| Use Case | Hive manages full data lifecycle | Data shared with other systems or tools |
| Creation Syntax | CREATE TABLE ... | CREATE EXTERNAL TABLE ... |
| Data Ownership | Hive owns data | User owns data |
| Data Backup | Backup needed before drop | Data safe after drop |

Key Differences

Internal tables store data inside Hive's warehouse directory, which is set by hive.metastore.warehouse.dir and defaults to /user/hive/warehouse. When you drop an internal table, Hive deletes both the table metadata and the underlying data files. In other words, Hive fully controls the data lifecycle.

In contrast, external tables point to data at a location you specify, typically a directory on HDFS or another supported file system outside Hive's warehouse. Dropping an external table removes only the table metadata from Hive and leaves the data files untouched. This lets multiple tools or users work with the same data without the risk of Hive deleting it.

Because of this, internal tables are best when Hive is the sole data manager, while external tables are ideal for sharing data or when data is managed outside Hive.
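To see which kind of table you are dealing with, you can inspect its metadata. The sketch below assumes the employees and employees_ext tables used as examples in this article already exist. In the DESCRIBE FORMATTED output, look for the Table Type row (MANAGED_TABLE or EXTERNAL_TABLE) and the Location row.

```sql
-- Print the warehouse directory configured for the current session
SET hive.metastore.warehouse.dir;

-- Table Type should read MANAGED_TABLE; Location should sit under the warehouse directory
DESCRIBE FORMATTED employees;

-- Table Type should read EXTERNAL_TABLE; Location should be the path given at creation
DESCRIBE FORMATTED employees_ext;
```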


Code Comparison

Here is how you create and drop an internal table in Hive:

sql
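-- Managed (internal) table: no LOCATION clause, so Hive stores the data
-- under its warehouse directory
CREATE TABLE employees (
  id INT,
  name STRING,
  salary FLOAT
);

-- Insert sample data
INSERT INTO TABLE employees VALUES (1, 'Alice', 50000), (2, 'Bob', 60000);

-- Drop the table: removes the metadata and the data files
DROP TABLE employees;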
Output
Table employees created. 2 rows inserted. Table employees dropped and data deleted.
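One detail worth knowing when dropping an internal table: if HDFS trash is enabled on the cluster, a plain DROP TABLE moves the data files to the user's trash directory instead of deleting them immediately, while the PURGE option skips the trash. The statements below are a minimal sketch of the two alternatives, reusing the employees table from the example above. Whether trash is enabled depends on your cluster configuration.

```sql
-- Option 1: drop the table; with HDFS trash enabled, the data files
-- are moved to trash and can still be recovered for a while
DROP TABLE IF EXISTS employees;

-- Option 2: drop the table and bypass the trash, so the data files
-- cannot be recovered afterwards
DROP TABLE IF EXISTS employees PURGE;
```

IF EXISTS keeps the statement from failing when the table is already gone, which is convenient in scripts that may be re-run.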

External Table Equivalent

Here is how you create and drop an external table in Hive pointing to existing data:

sql
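-- External table: Hive records only the schema and the location
CREATE EXTERNAL TABLE employees_ext (
  id INT,
  name STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
-- LOCATION is a directory that holds the comma-delimited data files
LOCATION '/user/data/employees';

-- Drop the table: removes only the metadata, the files stay in place
DROP TABLE employees_ext;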
Output
External table employees_ext created. Table employees_ext dropped but data at /user/data/employees remains intact.
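Because the files survive the drop, you can check this yourself by defining the table again over the same directory and querying it. This is a sketch that assumes /user/data/employees still holds comma-delimited files matching the schema above. The dfs command is available in the Hive CLI and Beeline, but it may be restricted in some deployments.

```sql
-- List the files that remain after the external table was dropped
dfs -ls /user/data/employees;

-- Define the table again over the same directory
CREATE EXTERNAL TABLE employees_ext (
  id INT,
  name STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/data/employees';

-- The existing rows are readable right away; no reload is needed
SELECT * FROM employees_ext LIMIT 10;
```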

When to Use Which

Choose internal tables when Hive should fully manage the data lifecycle, including storage and deletion. This suits temporary datasets and data that only Hive reads and writes.

Choose external tables when data is shared across multiple tools or users, or when data already exists outside Hive. External tables reduce the risk of accidental data loss because dropping the table leaves the data files in place.
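If you picked the wrong type, an existing table can often be switched through the EXTERNAL table property rather than recreated. Treat the following as a sketch, not a guaranteed recipe: it reuses the employees table from this article, and the behavior varies across Hive versions and distributions. In particular, newer releases may restrict the conversion for transactional managed tables.

```sql
-- Mark a managed table as external, so a later DROP TABLE keeps the data files
ALTER TABLE employees SET TBLPROPERTIES ('EXTERNAL'='TRUE');

-- Confirm the change: Table Type should now read EXTERNAL_TABLE
DESCRIBE FORMATTED employees;

-- Switch back to managed, so Hive deletes the data on drop again
ALTER TABLE employees SET TBLPROPERTIES ('EXTERNAL'='FALSE');
```

The property value is written in upper case here because some Hive versions treat it as case-sensitive.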

Key Takeaways

Internal tables store data in Hive's warehouse directory and delete the data when dropped.
External tables link to external data and keep data after table drop.
Use internal tables for Hive-managed data lifecycle.
Use external tables to share data or protect data from deletion.
Dropping external tables only removes metadata, not data files.