Internal vs External Table in Hive: Key Differences and Usage
An internal table stores its data in Hive's warehouse directory, and dropping the table deletes that data. An external table points to data stored outside the warehouse, and dropping it leaves the data in place. Internal tables let Hive manage the full data lifecycle, whereas external tables make it possible to share data with other tools.
Quick Comparison
This table summarizes the main differences between internal and external tables in Hive.
| Feature | Internal Table | External Table |
|---|---|---|
| Data Storage Location | Hive's warehouse directory (default) | User-specified external location |
| Data Deletion on Drop | Deletes both table and data | Deletes only table metadata, data remains |
| Use Case | Hive manages full data lifecycle | Data shared with other systems or tools |
| Creation Syntax | CREATE TABLE ... | CREATE EXTERNAL TABLE ... |
| Data Ownership | Hive owns data | User owns data |
| Data Backup | Back up before dropping if the data is still needed | Data files remain after drop |
Key Differences
Internal tables (also called managed tables) store data inside Hive's warehouse directory, which is set by hive.metastore.warehouse.dir and is usually /user/hive/warehouse. When you drop an internal table, Hive deletes both the table metadata and the actual data files. This means Hive fully controls the data lifecycle.
In contrast, external tables point to data stored outside Hive's warehouse, at a path you specify elsewhere on HDFS or on another storage system. Dropping an external table removes only the table's metadata from Hive and leaves the data files untouched. This allows multiple tools or users to access the same data without the risk of Hive deleting it.
Because of this, internal tables are best when Hive is the sole data manager, while external tables are ideal for sharing data or when data is managed outside Hive.
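To check which kind of table you are dealing with, you can inspect its metadata. This is a minimal sketch that assumes a table named employees already exists in the current database; the exact output layout varies between Hive versions.

```sql
-- Assumption: a table called `employees` exists in the current database.
-- In the output, look for the "Table Type" row:
--   MANAGED_TABLE  -> internal table
--   EXTERNAL_TABLE -> external table
-- The "Location" row shows where the data files live.
DESCRIBE FORMATTED employees;
```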
Code Comparison
Here is how you create and drop an internal table in Hive:
    CREATE TABLE employees (
      id INT,
      name STRING,
      salary FLOAT
    );

    -- Insert sample data
    INSERT INTO TABLE employees VALUES
      (1, 'Alice', 50000),
      (2, 'Bob', 60000);

    -- Drop the table
    DROP TABLE employees;
External Table Equivalent
Here is how you create and drop an external table in Hive pointing to existing data:
    CREATE EXTERNAL TABLE employees_ext (
      id INT,
      name STRING,
      salary FLOAT
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LOCATION '/user/data/employees';

    -- Drop the table
    DROP TABLE employees_ext;
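Because the DROP TABLE above removes only metadata, the files under /user/data/employees should still be there afterwards. One way to confirm this, sketched here on the assumption that nothing else has moved or deleted those files, is to recreate the table over the same location and query it:

```sql
-- Recreate the external table over the same directory.
-- Assumption: the comma-delimited files are still present under /user/data/employees.
CREATE EXTERNAL TABLE employees_ext (
  id INT,
  name STRING,
  salary FLOAT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/user/data/employees';

-- The rows from the existing files should be readable again.
SELECT * FROM employees_ext LIMIT 10;
```

Doing the same with an internal table would return no rows, since its data files are deleted together with the table.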
When to Use Which
Choose internal tables when Hive should fully manage the data lifecycle, including storage and deletion. This suits intermediate or Hive-exclusive datasets that no other tool needs to read.
Choose external tables when data is shared across multiple tools or users, or when data already exists outside Hive. External tables prevent accidental data loss by preserving data on drop.
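If a table was created as internal and later needs to be protected from deletion on drop, a commonly used approach is to change its type through a table property. The sketch below assumes an existing, non-transactional table named employees; newer Hive releases restrict this conversion for transactional (ACID) managed tables, so check the behavior on your version before relying on it.

```sql
-- Assumption: `employees` is an existing, non-transactional internal table.
-- Mark the table as external so that DROP TABLE leaves its data files in place.
ALTER TABLE employees SET TBLPROPERTIES ('EXTERNAL' = 'TRUE');

-- Verify the change: "Table Type" should now read EXTERNAL_TABLE.
DESCRIBE FORMATTED employees;
```

Note that this changes only how Hive treats the table on drop; the data files stay where they were, which may still be inside the warehouse directory.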