
Centralized logging (ELK stack) in Microservices - Deep Dive

Overview - Centralized logging (ELK stack)
What is it?
Centralized logging with the ELK stack means collecting logs from many services into one place. ELK stands for Elasticsearch, Logstash, and Kibana. Elasticsearch stores and searches logs, Logstash collects and processes them, and Kibana shows them in easy-to-understand dashboards. This helps teams see what is happening across all parts of a system quickly.
Why it matters
Without centralized logging, developers and operators must check logs on each server or service separately, which is slow and error-prone. Problems can go unnoticed or take too long to fix. Centralized logging makes troubleshooting faster, improves system reliability, and helps understand user behavior across many services.
Where it fits
Before learning centralized logging, you should understand basic logging and microservices architecture. After this, you can explore alerting systems, monitoring tools, and distributed tracing to get a full picture of system health and performance.
Mental Model
Core Idea
Centralized logging collects all logs from many services into one searchable place to simplify monitoring and troubleshooting.
Think of it like...
Imagine a large office building where every room has a notebook for notes. Instead of checking each notebook separately, all notes are copied into one big book at the front desk, so anyone can quickly find what they need.
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│ Microservice 1│─────▶│               │      │               │
│ Microservice 2│─────▶│   Logstash    │─────▶│ Elasticsearch │
│ Microservice 3│─────▶│ (Collector &  │      │ (Storage &    │
│ ...           │      │  Processor)   │      │  Search)      │
└───────────────┘      └───────────────┘      └───────────────┘
                                               │
                                               ▼
                                         ┌───────────┐
                                         │ Kibana UI │
                                         │ (Dashboard│
                                         │  & Search)│
                                         └───────────┘
Build-Up - 7 Steps
1. Foundation: What is logging and why it matters
Concept: Logging means recording events or messages from software to understand what happened.
Nearly all software writes logs to record actions, errors, and other important events. Logs let developers see what the software did and track down problems; without them, fixing bugs or understanding system behavior is very hard.
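As a minimal sketch of what a single log line looks like, here the Python standard library's logging module writes to an in-memory buffer (a real service would write to a file or stdout for an agent to collect; the service name "payment-service" is hypothetical):

```python
import io
import logging

# Write log output to an in-memory buffer so the line can be inspected;
# a real service would write to a file or stdout instead.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))

log = logging.getLogger("payment-service")  # hypothetical service name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.error("payment gateway timed out")  # record an error event

line = buffer.getvalue().strip()
print(line)  # ERROR payment-service payment gateway timed out
```

The formatter decides which fields (level, logger name, message, usually also a timestamp) end up in each line — exactly the fields a centralized system later searches on.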
Result
You understand that logs are essential records that tell the story of software activity.
Knowing that logs are the primary source of truth for software behavior sets the stage for why collecting and managing them matters.
2. Foundation: Challenges of logging in microservices
Concept: Microservices split an application into many small services, each with its own logs.
In microservices, each service runs separately and writes logs independently. This means logs are scattered across many places. Finding related logs for a single user request or error requires checking multiple services, which is slow and confusing.
Result
You see why scattered logs make troubleshooting complex and inefficient.
Understanding the distributed nature of logs in microservices highlights the need for a unified logging approach.
3. Intermediate: Centralized logging concept and benefits
🤔 Before reading on: do you think centralized logging stores logs on each service or in one place? Commit to your answer.
Concept: Centralized logging gathers all logs from different services into one system for easy search and analysis.
Instead of checking logs on each server, centralized logging sends all logs to a central system. This system indexes logs so you can search by time, service, or error type. It also allows creating dashboards to monitor system health.
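Once logs are indexed centrally, one query can slice them by service, severity, and time window at once. A sketch of such a search in Elasticsearch's query DSL — the index pattern and the field names (service, level, @timestamp) are assumptions about how the logs were indexed:

```json
GET /logs-*/_search
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "service": "checkout" } },
        { "term":  { "level": "ERROR" } },
        { "range": { "@timestamp": { "gte": "now-1h" } } }
      ]
    }
  }
}
```

This finds every error logged by one service in the last hour, regardless of which host it ran on — the step that is slow and manual without centralization.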
Result
You understand how centralized logging simplifies finding and analyzing logs across many services.
Knowing that centralizing logs reduces time to detect and fix issues shows why it is a key practice in modern systems.
4. Intermediate: ELK stack components and roles
🤔 Before reading on: which ELK component do you think stores logs, and which one shows them? Commit to your answer.
Concept: ELK stack is a popular set of tools for centralized logging: Elasticsearch stores logs, Logstash collects and processes them, and Kibana visualizes logs.
Logstash receives logs from services, transforms or filters them, and sends them to Elasticsearch. Elasticsearch stores logs in a way that makes searching fast. Kibana connects to Elasticsearch and shows logs in dashboards and search interfaces.
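A minimal Logstash pipeline wiring these roles together might look like this (hostnames and ports are placeholders):

```conf
input {
  beats {
    port => 5044        # receive logs shipped by Beats agents
  }
}
output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]   # placeholder cluster address
  }
}
```

Kibana is not configured here; it sits on top, querying the same Elasticsearch cluster that this pipeline writes into.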
Result
You know the purpose of each ELK component and how they work together.
Understanding each tool's role helps you design and troubleshoot centralized logging systems effectively.
5. Intermediate: Log shipping and parsing with Logstash
🤔 Before reading on: do you think Logstash only forwards logs or also changes them? Commit to your answer.
Concept: Logstash can modify logs by parsing, filtering, or enriching them before storing.
Logstash can parse raw logs to extract fields like timestamps or error codes. It can also drop unneeded logs or add extra info like service names. This makes logs easier to search and analyze later.
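For example, a filter block along these lines parses fields out of the raw message, drops noisy entries, and enriches what remains — the log format and the service name are assumptions for illustration:

```conf
filter {
  grok {
    # extract timestamp, level, and message text from the raw line
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:msg}" }
  }
  if [level] == "DEBUG" {
    drop { }                                  # discard noisy debug entries
  }
  mutate {
    add_field => { "service" => "checkout" }  # enrich with a (hypothetical) service name
  }
}
```

After this stage, Elasticsearch stores structured fields (level, service, timestamp) rather than one opaque text blob, which is what makes the later searches fast and precise.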
Result
You see how preprocessing logs improves their usefulness and reduces noise.
Knowing that logs can be transformed before storage helps maintain clean, searchable data and reduces storage costs.
6. Advanced: Scaling ELK for high-volume logs
🤔 Before reading on: do you think a single Elasticsearch node can handle millions of logs per day? Commit to your answer.
Concept: ELK stack can be scaled horizontally to handle large log volumes by adding nodes and partitioning data.
Elasticsearch clusters have multiple nodes that share data and queries. Logs are split into shards across nodes. Logstash can run on many servers to collect logs in parallel. Kibana connects to the cluster to query data efficiently.
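Sharding is configured per index. Settings like the following split an index into three primary shards spread across nodes, with one replica copy of each for redundancy — the index name and numbers are illustrative, not a sizing recommendation:

```json
PUT /logs-000001
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
```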
Result
You understand how ELK can grow with system size and log volume.
Knowing ELK's distributed design prevents bottlenecks and ensures reliable log storage and search at scale.
7. Expert: Handling log retention and cost tradeoffs
🤔 Before reading on: do you think keeping all logs forever is always best? Commit to your answer.
Concept: Log retention policies balance storage cost and the need to keep logs for troubleshooting or compliance.
Storing logs indefinitely is expensive and slows queries. Teams set retention times (e.g., 30 days) and archive or delete older logs. Some logs are compressed or moved to cheaper storage. Choosing retention depends on business needs and regulations.
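In Elasticsearch this is typically automated with an index lifecycle management (ILM) policy. A sketch that rolls indices over daily (or at a size limit) and deletes them after 30 days — the policy name and thresholds are illustrative:

```json
PUT _ilm/policy/logs-30d
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_age": "1d", "max_size": "50gb" }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": { "delete": {} }
      }
    }
  }
}
```

A fuller policy might add a warm or cold phase that moves aging indices to cheaper hardware before deletion.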
Result
You appreciate the tradeoffs between log availability and cost.
Understanding retention policies helps design sustainable logging systems that meet operational and legal requirements.
Under the Hood
Logstash runs on individual servers or as a centralized collector, receiving logs from shippers such as Beats or over protocols like syslog. It parses and transforms logs using configurable pipelines. Logs are sent to Elasticsearch, which stores them in indexes split into shards across nodes. Elasticsearch uses inverted indexes for fast full-text search. Kibana queries Elasticsearch via APIs to display logs in dashboards and search views.
Why designed this way?
ELK was designed to handle large, diverse log data efficiently. Elasticsearch's distributed search allows scaling. Logstash's flexible pipelines support many log formats and transformations. Kibana provides user-friendly visualization without coding. Alternatives like Splunk are proprietary and costly, while ELK is open-source and customizable.
┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│ Log Sources   │─────▶│ Logstash      │─────▶│ Elasticsearch │
│ (Apps, Beats) │      │ (Parsing &    │      │ (Distributed  │
│               │      │  Filtering)   │      │  Storage &    │
└───────────────┘      └───────────────┘      │  Search)      │
                                               └───────────────┘
                                                      │
                                                      ▼
                                               ┌───────────┐
                                               │ Kibana UI │
                                               │ (Search & │
                                               │  Visuals) │
                                               └───────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does centralized logging mean logs are stored only temporarily? Commit to yes or no.
Common Belief: Centralized logging just collects logs temporarily and then deletes them quickly.
Reality: Centralized logging stores logs persistently for days or months to allow historical analysis and troubleshooting.
Why it matters: If logs are deleted too soon, teams lose the ability to investigate past incidents or meet compliance requirements.
Quick: Do you think Logstash only forwards logs without changing them? Commit to yes or no.
Common Belief: Logstash is just a simple forwarder that sends logs as-is to storage.
Reality: Logstash can parse, filter, and enrich logs before sending, improving log quality and searchability.
Why it matters: Ignoring Logstash's processing power leads to messy logs that are hard to analyze and increases storage costs.
Quick: Can a single Elasticsearch node handle all logs for a large system? Commit to yes or no.
Common Belief: One Elasticsearch server is enough for any log volume.
Reality: Large systems require Elasticsearch clusters with multiple nodes to distribute data and queries for performance and reliability.
Why it matters: Using a single node causes slow searches, data loss risk, and system crashes under heavy load.
Quick: Is it best to keep all logs forever? Commit to yes or no.
Common Belief: Keeping all logs forever is always best for troubleshooting.
Reality: Storing all logs indefinitely is costly and slows down queries; retention policies balance cost and usefulness.
Why it matters: Without retention policies, logging systems become expensive and inefficient, hurting operations.
Expert Zone
1. Logstash pipelines can be optimized with conditionals and multiple filters to reduce processing time and storage needs.
2. Elasticsearch index lifecycle management automates rollover, retention, and deletion to maintain cluster health without manual intervention.
3. Kibana supports alerting and machine learning features that detect anomalies in logs automatically, going beyond simple search.
When NOT to use
Centralized logging with ELK is not ideal for extremely high-throughput systems without proper scaling or for logs requiring strict real-time processing. Alternatives like Kafka-based pipelines or specialized log analytics platforms may be better for those cases.
Production Patterns
In production, ELK is often combined with Beats agents for lightweight log shipping, secured with TLS and authentication. Logs are tagged with metadata like environment and service name. Index templates and ILM policies automate data management. Teams build dashboards for error rates, latency, and user activity to monitor system health.
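A Filebeat configuration in this spirit might tag each log with its service and environment before shipping over TLS — the paths, hostnames, and field values below are all placeholders:

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/checkout/*.log        # placeholder log path
    fields:
      service: checkout                # placeholder metadata tags
      environment: production
    fields_under_root: true

output.logstash:
  hosts: ["logstash.internal:5044"]    # placeholder Logstash host
  ssl:
    certificate_authorities: ["/etc/pki/ca.pem"]   # TLS for log shipping
```

With fields_under_root, the tags become top-level fields, so dashboards can filter directly on service and environment.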
Connections
Distributed Tracing
Builds on
Centralized logging provides raw event data, while distributed tracing adds context by linking logs across services for a full request journey.
Data Warehousing
Similar pattern
Both centralize data from many sources for analysis, but logging focuses on time-series event data, while warehousing handles structured business data.
Library Cataloging Systems
Analogous process
Just like libraries index books for quick search, Elasticsearch indexes logs to enable fast retrieval across vast data.
Common Pitfalls
#1: Sending raw logs without parsing or filtering.
Wrong approach: Logstash configuration: input { beats { port => 5044 } } output { elasticsearch { hosts => ["localhost:9200"] } }
Correct approach: Logstash configuration: input { beats { port => 5044 } } filter { grok { match => { "message" => "%{COMMONAPACHELOG}" } } } output { elasticsearch { hosts => ["localhost:9200"] } }
Root cause:Not using filters leads to unstructured logs that are hard to search and analyze.
#2: Using a single Elasticsearch node for large log volumes.
Wrong approach:Deploying Elasticsearch on one server without clustering or sharding.
Correct approach:Deploying Elasticsearch as a cluster with multiple nodes and configured shards and replicas.
Root cause:Underestimating log volume and query load causes performance bottlenecks and data loss risk.
#3: Keeping all logs forever without a retention policy.
Wrong approach:No index lifecycle management; all logs stored indefinitely.
Correct approach:Implementing ILM policies to delete or archive logs after a set period, e.g., 30 days.
Root cause:Ignoring storage costs and query performance degradation over time.
Key Takeaways
Centralized logging collects logs from many services into one place to simplify monitoring and troubleshooting.
The ELK stack uses Logstash to collect and process logs, Elasticsearch to store and search them, and Kibana to visualize logs.
Parsing and filtering logs before storage improves searchability and reduces noise and storage costs.
Scaling ELK with clusters and retention policies ensures performance and cost-effectiveness for large systems.
Understanding centralized logging is essential for managing complex microservices and maintaining system reliability.