
Data warehouse vs data lake in HLD - Architecture Patterns Compared

System Overview - Data warehouse vs data lake

This design compares two widely used data storage approaches: data warehouses and data lakes. The goal is to understand the architecture of each, how data flows through them, and how they differ in serving analytics and business intelligence workloads.

Architecture Diagram
User
  |
  v
Load Balancer
  |
  v
API Gateway
  |
  +---------------------+---------------------+
  |                     |                     |
Data Warehouse       Data Lake             Metadata Store
  |                     |                     |
  v                     v                     v
Relational DB        Object Storage       Catalog DB
  |                     |                     |
  +---------------------+---------------------+
                        |
                        v
                    Analytics Engine
                        |
                        v
                    Visualization
Components
User (user): End user or analyst requesting data or reports
Load Balancer (load_balancer): Distributes incoming requests evenly to API Gateway instances
API Gateway (api_gateway): Routes requests to either Data Warehouse or Data Lake services
Data Warehouse (service): Structured storage optimized for fast SQL queries and business reporting
Data Lake (service): Stores raw, unstructured or semi-structured data in large volumes
Metadata Store (service): Manages metadata and schema information for data lake contents
Relational DB (database): Stores cleaned, structured data in tables for the data warehouse
Object Storage (storage): Stores raw files like logs, images, and JSON in the data lake
Catalog DB (database): Stores metadata and schema details for data lake files
Analytics Engine (service): Processes data from warehouse or lake for analysis and machine learning
Visualization (service): Generates reports and dashboards for users
Request Flow - 14 Hops
User -> Load Balancer
Load Balancer -> API Gateway
API Gateway -> Data Warehouse
Data Warehouse -> Relational DB
Relational DB -> Data Warehouse
Data Warehouse -> API Gateway
API Gateway -> Visualization
Visualization -> User
API Gateway -> Data Lake
Data Lake -> Metadata Store
Data Lake -> Object Storage
Data Lake -> API Gateway
API Gateway -> Analytics Engine
Analytics Engine -> Visualization
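
The hops above can be traced in a few lines of Python. This is only a sketch: the request kinds "report" and "raw", the names route and trace, and the assumption that lake requests share the first two entry hops with warehouse requests are all illustrative and not part of the original design.

```python
from typing import List, Tuple

Hop = Tuple[str, str]

# Entry hops, assumed to be shared by both kinds of request.
ENTRY_HOPS: List[Hop] = [
    ("User", "Load Balancer"),
    ("Load Balancer", "API Gateway"),
]

# Hops taken when the gateway routes to the Data Warehouse.
WAREHOUSE_HOPS: List[Hop] = [
    ("API Gateway", "Data Warehouse"),
    ("Data Warehouse", "Relational DB"),
    ("Relational DB", "Data Warehouse"),
    ("Data Warehouse", "API Gateway"),
    ("API Gateway", "Visualization"),
    ("Visualization", "User"),
]

# Hops taken when the gateway routes to the Data Lake.
LAKE_HOPS: List[Hop] = [
    ("API Gateway", "Data Lake"),
    ("Data Lake", "Metadata Store"),
    ("Data Lake", "Object Storage"),
    ("Data Lake", "API Gateway"),
    ("API Gateway", "Analytics Engine"),
    ("Analytics Engine", "Visualization"),
]


def route(request_kind: str) -> str:
    """Hypothetical gateway rule: reports go to the warehouse, raw data to the lake."""
    routes = {"report": "Data Warehouse", "raw": "Data Lake"}
    if request_kind not in routes:
        raise ValueError(f"unknown request kind: {request_kind}")
    return routes[request_kind]


def trace(request_kind: str) -> List[str]:
    """Return the hop sequence for one request as 'source -> destination' strings."""
    backend = route(request_kind)
    hops = WAREHOUSE_HOPS if backend == "Data Warehouse" else LAKE_HOPS
    return [f"{src} -> {dst}" for src, dst in ENTRY_HOPS + hops]


if __name__ == "__main__":
    for kind in ("report", "raw"):
        print(kind, "->", route(kind))
        for hop in trace(kind):
            print("   ", hop)
```

Keeping the routing rule in one small function mirrors the role the API Gateway plays in the diagram: callers never need to know which storage system answers a request.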
Failure Scenario
Component Fails: Relational DB
Impact: Data Warehouse queries fail, reports cannot be generated from structured data
Mitigation: Use database replication and failover to standby DB; cache recent query results for read availability
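
The mitigation can be sketched as a small read path that tries the primary, then the standby, and finally a cache of recent results. Everything named here (DatabaseDown, query_with_failover, failing_primary, healthy_standby, and the sample row) is hypothetical; a real deployment would rely on the failover mechanism of its database rather than application code like this.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Tuple[str, float]
QueryFn = Callable[[str], List[Row]]


class DatabaseDown(Exception):
    """Raised by a (hypothetical) database client when the node is unreachable."""


def query_with_failover(
    sql: str,
    databases: Sequence[QueryFn],
    cache: Dict[str, List[Row]],
) -> List[Row]:
    """Try each database in order; fall back to cached rows if all are down."""
    for run_query in databases:
        try:
            rows = run_query(sql)
        except DatabaseDown:
            continue  # this node failed, try the next replica
        cache[sql] = rows  # remember the latest good result for later reads
        return rows
    if sql in cache:
        return cache[sql]  # possibly stale, but keeps reports readable
    raise DatabaseDown("all databases unavailable and no cached result")


def failing_primary(sql: str) -> List[Row]:
    """Stand-in for a primary Relational DB that is currently down."""
    raise DatabaseDown("primary unreachable")


def healthy_standby(sql: str) -> List[Row]:
    """Stand-in for a replica that answers with sample data."""
    return [("east", 150.0)]


if __name__ == "__main__":
    cache: Dict[str, List[Row]] = {}
    sql = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    print(query_with_failover(sql, [failing_primary, healthy_standby], cache))
    # With every database down, the cached rows are served instead.
    print(query_with_failover(sql, [failing_primary], cache))
```

Serving cached rows trades freshness for availability, so this fallback suits dashboards and reports better than queries that must reflect the latest writes.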
Architecture Quiz - 3 Questions
Test your understanding
Which component stores raw, unstructured data in this architecture?
A. Relational DB
B. Object Storage
C. Metadata Store
D. Data Warehouse
Design Principle
This architecture highlights the difference between structured data storage optimized for fast queries (Data Warehouse) and flexible storage for raw data (Data Lake). It uses an API Gateway to route requests appropriately and metadata management to handle schema in the data lake, enabling scalable and efficient analytics.
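
A minimal sketch of this contrast, using only the Python standard library: an in-memory SQLite table stands in for the Relational DB, a dictionary stands in for Object Storage, and a second dictionary stands in for the Catalog DB. The table name, file key, field names, and sample values are all made up for illustration. The warehouse side fixes its schema when data is written, while the lake side stores raw JSON lines and applies the catalog's schema only when data is read.

```python
import json
import sqlite3
from typing import Dict, List

# Warehouse side: the schema is declared up front and enforced on write.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 30.0)],
)
warehouse_total = conn.execute(
    "SELECT SUM(amount) FROM sales WHERE region = 'east'"
).fetchone()[0]

# Lake side: raw records are stored as-is, extra fields and all.
object_storage: Dict[str, str] = {
    "raw/events.jsonl": "\n".join(
        [
            '{"region": "east", "amount": 120.0, "device": "ios"}',
            '{"region": "west", "amount": 80.0}',
            '{"region": "east", "amount": 30.0, "coupon": "X1"}',
        ]
    )
}

# Catalog entry (hypothetical layout): which fields to project when reading.
catalog: Dict[str, dict] = {
    "raw/events.jsonl": {"format": "jsonl", "fields": ["region", "amount"]}
}


def read_with_catalog(key: str) -> List[dict]:
    """Parse a raw file and keep only the fields listed in its catalog entry."""
    fields = catalog[key]["fields"]
    rows = []
    for line in object_storage[key].splitlines():
        record = json.loads(line)
        rows.append({name: record.get(name) for name in fields})
    return rows


lake_total = sum(
    row["amount"]
    for row in read_with_catalog("raw/events.jsonl")
    if row["region"] == "east"
)

if __name__ == "__main__":
    print("warehouse total for east:", warehouse_total)
    print("lake total for east:", lake_total)
```

Both paths answer the same question, but the lake keeps the extra fields (device, coupon) available for later analysis, whereas the warehouse table would need a schema change before it could store them.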