
External vs managed tables in Hadoop - Visual Side-by-Side Comparison

Concept Flow - External vs managed tables
Create Table Command → Specify Table Type?
Managed → Data stored in warehouse → Drop table → Data deleted
External → Data stays at external location → Drop table → Data kept
This flow shows the decision made when creating a table: managed tables store data inside the warehouse and delete it on drop; external tables point to data at an external location and keep it after drop.
Execution Sample
HiveQL
CREATE TABLE sales_managed (id INT, amount FLOAT);
CREATE EXTERNAL TABLE sales_external (id INT, amount FLOAT)
LOCATION '/data/sales';
DROP TABLE sales_managed;
DROP TABLE sales_external;
Creates a managed table and an external table, then drops both to show the difference in what happens to the data.
Execution Table
| Step | Command | Table Type | Data Location | Action on Drop | Result |
|------|---------|------------|---------------|----------------|--------|
| 1 | CREATE TABLE sales_managed (id INT, amount FLOAT); | Managed | Warehouse default | Delete data | Table created, data stored in warehouse |
| 2 | CREATE EXTERNAL TABLE sales_external (id INT, amount FLOAT) LOCATION '/data/sales'; | External | /data/sales | Keep data | Table created, data location external |
| 3 | DROP TABLE sales_managed; | Managed | Warehouse default | Delete data | Table dropped, data deleted from warehouse |
| 4 | DROP TABLE sales_external; | External | /data/sales | Keep data | Table dropped, data remains at external location |
💡 All commands executed; difference in data deletion on drop shown
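One way to confirm the Table Type and Data Location columns on a live cluster is to inspect each table's metadata between Steps 2 and 3, before either table is dropped. The sketch below assumes the two tables from the execution sample exist; the exact layout of the output varies by Hive version.

```sql
-- Run after Step 2 and before Step 3, while both tables still exist.
-- In the output, look for the "Table Type:" and "Location:" rows.
DESCRIBE FORMATTED sales_managed;   -- expect Table Type: MANAGED_TABLE
DESCRIBE FORMATTED sales_external;  -- expect Table Type: EXTERNAL_TABLE, Location ending in /data/sales
```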
Variable Tracker
| Variable | Start | After Step 1 | After Step 2 | After Step 3 | After Step 4 |
|----------|-------|--------------|--------------|--------------|--------------|
| sales_managed_table | None | Exists with data in warehouse | Exists with data in warehouse | Dropped, data deleted | Dropped |
| sales_external_table | None | None | Exists pointing to /data/sales | Exists pointing to /data/sales | Dropped, data remains at /data/sales |
Key Moments - 2 Insights
Why does dropping a managed table delete the data but dropping an external table does not?
Because the system owns both the metadata and the data of a managed table, which lives inside the warehouse, dropping the table deletes both (see Step 3). For an external table it owns only the metadata: the table merely references data stored elsewhere, so dropping it removes the table definition but leaves the data intact (see Step 4).
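A minimal sketch of what "leaves data intact" means in practice, assuming files readable with the table's default storage format already sit under /data/sales: after Step 4, the same location can be attached to a new table definition and queried again. The row count depends on whatever files are there, so none is shown.

```sql
-- After Step 4 the metadata is gone but the files under /data/sales remain.
-- Re-declaring the table over the same location makes them queryable again.
CREATE EXTERNAL TABLE sales_external (id INT, amount FLOAT)
LOCATION '/data/sales';

-- Reads the files that survived the earlier DROP TABLE.
SELECT COUNT(*) FROM sales_external;
```

The same is not possible for sales_managed after Step 3, because its files were removed along with the metadata.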
What happens if you create an external table without specifying LOCATION?
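Hive accepts the statement. The table is still external, but its directory is created under a default warehouse path instead of a location you chose, and dropping it still removes only the metadata. Specifying LOCATION, as in Step 2, is what lets the table point at data that already exists elsewhere.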
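To see how a given Hive installation handles a missing LOCATION clause, one option is to create a throwaway external table without one and inspect its metadata. The table name sales_external_default is hypothetical and used only for illustration; the default directory reported depends on the Hive version and configuration.

```sql
-- Hypothetical table name, for illustration only.
CREATE EXTERNAL TABLE sales_external_default (id INT, amount FLOAT);

-- Check the "Table Type:" and "Location:" rows in the output.
DESCRIBE FORMATTED sales_external_default;

-- Clean up the metadata entry when done.
DROP TABLE sales_external_default;
```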
Visual Quiz - 3 Questions
Test your understanding
Look at the execution table. What happens to the data of sales_managed after DROP TABLE?
A. Data remains in warehouse
B. Data moves to external location
C. Data is deleted from warehouse
D. Data is archived automatically
💡 Hint
Check Step 3 in the execution table under 'Result' and 'Action on Drop'
At which step does the external table get created with a specified data location?
A. Step 1
B. Step 2
C. Step 3
D. Step 4
💡 Hint
Look at Step 2 command and 'Data Location' column
If you drop the external table, what happens to the data at '/data/sales'?
A. Data remains untouched
B. Data is moved to warehouse
C. Data is deleted
D. Data is backed up automatically
💡 Hint
See Step 4 'Result' and 'Action on Drop' columns
Concept Snapshot
External vs Managed Tables in Hadoop:
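- Managed tables store data inside the warehouse.
- External tables point to data outside the warehouse.
- Dropping a managed table deletes its data.
- Dropping an external table keeps its data intact.
- External tables normally use a LOCATION clause to point at existing data; the clause is optional, and omitting it does not make the table managed.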
Full Transcript
This visual execution shows how managed and external tables behave differently in Hadoop. When a managed table is created, its data is stored inside the warehouse, and dropping the table deletes both metadata and data. For an external table, the data lives outside the warehouse at a specified location, and dropping the table removes only the metadata, leaving the data untouched. The difference determines whether data outlives its table definition, which matters when planning data lifecycle and storage control.