
Centralized logging (ELK stack) in Microservices - Scalability & System Analysis

Scalability Analysis - Centralized logging (ELK stack)
Growth Table: Centralized Logging with ELK Stack
| Users / Services | Log Volume | Infrastructure Changes | Challenges |
| --- | --- | --- | --- |
| 100 users / 10 services | ~10K logs/day | Single ELK stack instance; basic log shipping | Minimal latency; easy to manage |
| 10K users / 100 services | ~1M logs/day | Scale Elasticsearch cluster; add Logstash nodes; use Kafka for buffering | Indexing delays; storage growth; query slowdowns |
| 1M users / 1000 services | ~100M logs/day | Multi-node Elasticsearch clusters with sharding; dedicated Kafka clusters; Elasticsearch cross-cluster search | Storage cost; query performance; cluster management complexity |
| 100M users / 10K services | ~10B logs/day | Multiple ELK clusters per region; heavy data tiering and archival; advanced indexing strategies; cloud storage for cold data | High operational cost; data retention policies; disaster recovery |
First Bottleneck

The first bottleneck is usually the Elasticsearch cluster. As log volume grows, Elasticsearch struggles with indexing speed and query latency due to disk I/O and CPU limits.

Scaling Solutions
  • Horizontal Scaling: Add more Elasticsearch nodes and shard indices to distribute load.
  • Buffering: Use Kafka or similar message queues to decouple log producers from Elasticsearch ingestion.
  • Caching: Use Elasticsearch query caching and Kibana dashboards caching to reduce repeated query load.
  • Data Tiering: Move older logs to cheaper storage tiers or cold storage to reduce hot cluster load.
  • Index Lifecycle Management: Automate index rollover and deletion to manage storage efficiently.
  • Load Balancing: Distribute incoming log traffic evenly across Logstash or Beats agents.
  • Compression: Compress logs during transport and storage to save bandwidth and disk space.
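The buffering idea above, decoupling log producers from Elasticsearch ingestion, can be sketched as a minimal batching generator. This is an illustrative sketch, not a real Beats/Logstash implementation; the batch size and flush interval are hypothetical tuning knobs.

```python
import time

def batch_logs(log_stream, max_batch=500, max_wait_s=1.0):
    """Group incoming log lines into batches for bulk indexing,
    so producers never block on the Elasticsearch ingest rate."""
    batch, deadline = [], time.monotonic() + max_wait_s
    for line in log_stream:
        batch.append(line)
        # Flush when the batch is full or the wait deadline passes.
        if len(batch) >= max_batch or time.monotonic() >= deadline:
            yield batch
            batch, deadline = [], time.monotonic() + max_wait_s
    if batch:
        yield batch  # flush the remainder on shutdown

# 1,200 logs arrive quickly -> batches of 500, 500, 200
sizes = [len(b) for b in batch_logs(f"log {i}" for i in range(1200))]
```

In production this role is played by Kafka (durable buffering) plus the Elasticsearch `_bulk` API on the consumer side; the same full-or-timeout flush logic applies.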
Back-of-Envelope Cost Analysis
  • At 1M logs/day (~11.6 logs/sec at 1 KB each), raw ingest is only ~12 KB/s, but indexing amplification (replicas, segment merges, indexing multiple fields) can multiply disk writes several-fold, so SSD-backed nodes are still advisable.
  • Storage needed: Assuming 1 KB per log, 1M logs/day = ~1 GB/day; 1 year = ~365 GB.
  • Network bandwidth: 1M logs/day at 1 KB each averages only ~12 KB/s of log-shipping traffic; provision for peak bursts well above this average.
  • CPU: Elasticsearch nodes need multiple cores (8+) for indexing and query processing at medium scale.
  • Memory: Elasticsearch benefits from large heap sizes (16-32 GB) for caching and indexing.
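The estimates above follow from a few lines of arithmetic. A small helper makes the back-of-envelope numbers reproducible for any scale row in the growth table (the 1 KB/log figure is the same assumption used above):

```python
def logging_estimates(logs_per_day, bytes_per_log=1_000):
    """Back-of-envelope sizing for a centralized logging pipeline."""
    logs_per_sec = logs_per_day / 86_400          # seconds per day
    gb_per_day = logs_per_day * bytes_per_log / 1e9
    return {
        "logs_per_sec": round(logs_per_sec, 1),
        "gb_per_day": round(gb_per_day, 1),
        "gb_per_year": round(gb_per_day * 365, 1),
        "bandwidth_kb_per_sec": round(logs_per_sec * bytes_per_log / 1_000, 1),
    }

est = logging_estimates(1_000_000)
# ~11.6 logs/sec, ~1 GB/day, ~365 GB/year, ~11.6 KB/s shipping bandwidth
```

Note these are averages before replication and indexing overhead; a replica factor of 1 roughly doubles the storage figure.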
Interview Tip

Start by explaining the data flow: microservices generate logs → logs are shipped via agents (Beats) → buffered by Kafka → processed by Logstash → stored in Elasticsearch → visualized in Kibana.

Discuss bottlenecks focusing on Elasticsearch indexing and query performance. Then propose scaling solutions like sharding, buffering, and data tiering. Mention cost trade-offs and operational complexity.

Self Check Question

Your Elasticsearch cluster handles 1000 queries per second (QPS). Traffic grows 10x to 10,000 QPS. What do you do first and why?

Answer: Add more Elasticsearch nodes and distribute query load horizontally across more shard copies. Because the primary shard count of an existing index is fixed at creation, raise it via index rollover (new indices with more primaries) and add replicas, which also serve reads. This prevents CPU and disk I/O bottlenecks and maintains query latency.
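A rough capacity check for the answer above, assuming each shard copy (primary or replica) can serve on the order of 100 searches/sec, a made-up planning figure; benchmark your own cluster for the real number:

```python
import math

def shards_needed(target_qps, per_shard_qps=100, replicas=1):
    """Rough primary-shard count to absorb a search load.

    Assumes each shard copy serves ~per_shard_qps searches and that
    replicas also serve reads (so each primary contributes
    1 + replicas copies of query capacity).
    """
    copies_per_primary = 1 + replicas
    return math.ceil(target_qps / (per_shard_qps * copies_per_primary))

# Growing from 1,000 QPS to 10,000 QPS with 1 replica per primary:
# shards_needed(1_000) -> 5 primaries, shards_needed(10_000) -> 50
```

Since primary counts are fixed per index, the extra shards arrive through rollover: new indices are created with the higher count while old ones age out.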

Key Result
Elasticsearch indexing and query performance is the first bottleneck as log volume grows; horizontal scaling with sharding and buffering with Kafka are key to scaling the ELK stack for centralized logging.