Centralized logging (EFK stack) in Kubernetes - Deep Dive

Overview - Centralized logging (EFK stack)

What is it?

Centralized logging with the EFK stack means collecting logs from many machines or containers into one place. EFK stands for Elasticsearch, Fluentd, and Kibana: Fluentd gathers and forwards logs, Elasticsearch stores and indexes them for search, and Kibana lets you search and visualize them in a web interface. Together they let teams see what is happening across the whole system from a single place.

Why it matters

Without centralized logging, logs are scattered across many machines or containers, making it hard to find problems or understand system behavior. This wastes time and can delay fixing issues. Centralized logging with EFK lets teams quickly search and analyze logs from everywhere, improving reliability and speed of troubleshooting. It also helps with security and compliance by keeping logs safe and organized.

Where it fits

Before learning EFK, you should understand basic Kubernetes concepts like pods and containers, and know what logs are. After mastering EFK, you can explore advanced monitoring tools, alerting systems, and log analysis techniques to improve system health and performance.

Mental Model

Core Idea

Centralized logging with EFK collects logs from many sources, stores them efficiently, and presents them visually to make troubleshooting simple and fast.

Think of it like...

Imagine a big office building where every room has a notebook logging what happens inside. Instead of checking each room's notebook, a helper collects all notes into one big book in the lobby, organizes them, and shows summaries on a screen for everyone to see.

┌─────────────┐     ┌─────────────┐     ┌───────────────┐
│ Kubernetes  │     │   Fluentd   │     │ Elasticsearch │
│ Containers  │────▶│ (Log Agent) │────▶│  (Log Store)  │
└─────────────┘     └─────────────┘     └───────┬───────┘
                                                │
                                                ▼
                                        ┌───────────────┐
                                        │    Kibana     │
                                        │ (Log Viewer)  │
                                        └───────────────┘

Build-Up - 7 Steps

Foundation: What is logging and why it matters

Concept: Introduce the idea of logs as records of events in software systems.

Logs are like diaries for computers and applications. They record what happened, when, and sometimes why. For example, a web server logs every request it gets. These logs help us understand if things are working or if there are problems.

Result

You understand that logs are essential for knowing what your system is doing and for finding problems.

Knowing that logs are the system’s memory helps you see why collecting and reading them is crucial for managing software.

Foundation: Challenges of scattered logs in Kubernetes

Intermediate: Role of Fluentd as log collector

Intermediate: Elasticsearch stores and indexes logs

Intermediate: Kibana visualizes and explores logs

Advanced: Deploying EFK stack on Kubernetes

Expert: Handling log volume and retention in EFK

Under the Hood

Fluentd runs as an agent on each Kubernetes node and tails the log files that the container runtime writes for each container's stdout and stderr. It buffers and processes the logs, then sends them to Elasticsearch over HTTP. Elasticsearch stores logs in indexes, data structures optimized for fast search. Kibana queries Elasticsearch and renders the results as dashboards in the browser. Elasticsearch keeps its indexes on persistent storage, so logs stay durable and searchable.
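
To make the hop from the collector to Elasticsearch concrete, here is a minimal Python sketch of the kind of request body a collector could send to Elasticsearch's Bulk API: one action line plus one document line per log record, newline-delimited. The record fields and the index name logstash-2024.01.15 are illustrative assumptions, not values from a real cluster, and the sketch only builds the payload without contacting Elasticsearch.

```python
import json
from typing import Dict, Iterable


def build_bulk_body(records: Iterable[Dict], index: str) -> str:
    """Serialize log records as a newline-delimited Bulk API body."""
    lines = []
    for record in records:
        # Action line: asks Elasticsearch to index the next line into `index`.
        lines.append(json.dumps({"index": {"_index": index}}))
        # Source line: the log record itself.
        lines.append(json.dumps(record))
    if not lines:
        return ""
    # The Bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


# Hypothetical records, shaped like enriched container logs.
records = [
    {"@timestamp": "2024-01-15T10:00:00Z", "log": "GET /health 200", "pod": "web-1"},
    {"@timestamp": "2024-01-15T10:00:01Z", "log": "GET /cart 500", "pod": "web-2"},
]

body = build_bulk_body(records, index="logstash-2024.01.15")
print(body)
```

A collector would POST a body like this to the _bulk endpoint with the Content-Type application/x-ndjson. Sending many records per request, rather than one request per log line, is what keeps indexing efficient.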

Why designed this way?

EFK was designed to handle large, distributed systems where logs come from many sources. Fluentd’s pluggable architecture allows flexible log collection and transformation. Elasticsearch’s indexing enables fast search over huge data. Kibana provides a user-friendly way to explore logs without coding. Alternatives like direct log storage or manual collection were too slow or complex for modern cloud-native environments.

┌──────────────┐      ┌─────────────┐      ┌───────────────┐      ┌──────────────┐
│ Kubernetes   │      │   Fluentd   │      │ Elasticsearch │      │    Kibana    │
│ Containers   │─────▶│ (Collector) │─────▶│  (Indexer &   │─────▶│ (Visualizer) │
│ (Log Source) │      │             │      │   Storage)    │      │              │
└──────────────┘      └─────────────┘      └───────────────┘      └──────────────┘

Myth Busters - 4 Common Misconceptions

Quick: Does Fluentd store logs permanently or just forward them? Commit to your answer.

Common Belief: Fluentd stores all logs permanently on each node.

Reality: Fluentd only buffers logs temporarily and forwards them; long-term storage happens in Elasticsearch.

Quick: Is Kibana a log storage system or just a visualization tool? Commit to your answer.

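Common Belief: Kibana stores logs and manages their retention.

Reality: Kibana only queries Elasticsearch and displays the results; storage and retention are handled in Elasticsearch.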

Quick: Can you keep all logs forever without issues? Commit to your answer.

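Common Belief: Storing all logs forever is always best for troubleshooting.

Reality: Unbounded retention slows Elasticsearch down and eventually exhausts its disk; a retention policy keeps the cluster usable.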

Quick: Does centralized logging solve all monitoring problems alone? Commit to your answer.

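Common Belief: Centralized logging replaces the need for other monitoring tools.

Reality: Logs show what happened, but complementary tools such as distributed tracing and alerting cover what logs alone do not.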

Expert Zone

Fluentd’s buffering and retry mechanisms prevent log loss during network or Elasticsearch outages, but misconfiguration can cause delays or duplicates.

Elasticsearch index sharding and replication settings greatly affect performance and fault tolerance; tuning these is key for large clusters.

Kibana’s saved searches and dashboards can be shared and versioned, enabling team collaboration and consistent incident response.
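
The buffering and retry behavior mentioned above can be illustrated with a small Python sketch. It is not Fluentd's implementation; the function names, the delay values, and the fake sender are all assumptions chosen for the demonstration. It shows the general pattern: keep the chunk, retry with growing delays, and report failure if retries run out.

```python
import time
from typing import Callable, List, Sequence


def backoff_delays(base: float, factor: float, retries: int, cap: float) -> List[float]:
    """Delays before each retry: base, base*factor, ... capped at `cap`."""
    return [min(base * factor ** attempt, cap) for attempt in range(retries)]


def flush_with_retry(
    chunk: Sequence[str],
    send: Callable[[Sequence[str]], None],
    delays: Sequence[float],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Send a buffered chunk, retrying after each delay on connection errors."""
    try:
        send(chunk)
        return True
    except ConnectionError:
        pass
    for delay in delays:
        sleep(delay)
        try:
            send(chunk)
            return True
        except ConnectionError:
            continue
    # Retries exhausted: the caller decides whether to keep or drop the chunk.
    return False


# Demo with a fake sender that fails twice, then succeeds.
attempts = []


def flaky_send(chunk: Sequence[str]) -> None:
    attempts.append(len(chunk))
    if len(attempts) < 3:
        raise ConnectionError("Elasticsearch unavailable")


waits: List[float] = []
ok = flush_with_retry(
    ["line 1", "line 2"],
    flaky_send,
    backoff_delays(1.0, 2.0, 5, 10.0),
    sleep=waits.append,  # record the delays instead of really sleeping
)
print(ok, waits)  # True [1.0, 2.0]
```

The sketch also hints at where duplicates come from: if a request times out after Elasticsearch has already accepted it, the retry sends the whole chunk again.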

When NOT to use

EFK is not ideal for extremely high log volumes without scaling Elasticsearch properly; alternatives like Loki or Splunk may be better. For simple or small setups, lightweight log collectors or cloud-managed logging services might be easier.

Production Patterns

In production, teams deploy Fluentd as a DaemonSet with custom filters to reduce noise. Elasticsearch uses index lifecycle management to automate retention. Kibana dashboards are customized per team needs. Logs are secured with role-based access control and encrypted transport.
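
As a rough illustration of the DaemonSet pattern, the Python sketch below builds a minimal manifest as a dictionary and prints it as JSON, which kubectl accepts in place of YAML. The namespace, labels, and image name are placeholders, and the manifest deliberately omits RBAC, resource limits, Fluentd configuration, and any runtime-specific mounts, so treat it as a starting shape rather than a production manifest.

```python
import json


def fluentd_daemonset(namespace: str, image: str) -> dict:
    """Build a minimal DaemonSet manifest that runs one Fluentd pod per node."""
    labels = {"app": "fluentd"}
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "fluentd", "namespace": namespace, "labels": labels},
        "spec": {
            # The selector must match the pod template labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": "fluentd",
                            "image": image,
                            # Expose the node's log directory to the agent.
                            "volumeMounts": [
                                {"name": "varlog", "mountPath": "/var/log"}
                            ],
                        }
                    ],
                    "volumes": [
                        {"name": "varlog", "hostPath": {"path": "/var/log"}}
                    ],
                },
            },
        },
    }


# "fluentd-image:tag" is a placeholder, not a real image reference.
manifest = fluentd_daemonset(namespace="logging", image="fluentd-image:tag")
print(json.dumps(manifest, indent=2))
```

Because a DaemonSet schedules one pod on every node, each node's container log files are read by a local agent, which matches the per-node collection model described earlier.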

Connections

Distributed Tracing

Complementary tool for understanding request flows alongside logs

Knowing centralized logging helps you appreciate how tracing fills gaps by showing how requests move through services, not just what happened.

Database Indexing

Similar concept of indexing data for fast search

Understanding Elasticsearch indexing is easier if you know how databases create indexes to speed up queries.

Library Cataloging Systems

Both organize large amounts of information for quick retrieval

Centralized logging’s indexing and search is like a library catalog helping find books quickly among thousands.
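
The indexing idea behind the database and library comparisons above can be sketched in a few lines of Python as a toy inverted index: a map from each word to the set of log entries containing it. This is only an illustration of the concept; Elasticsearch's real analysis, storage, and scoring are far more involved, and the sample log lines are made up.

```python
from collections import defaultdict
from typing import Dict, Set


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase token to the IDs of the documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index: Dict[str, Set[int]], query: str) -> Set[int]:
    """Return IDs of documents that contain every token in the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical log lines keyed by document ID.
logs = {
    1: "error connecting to database",
    2: "user login ok",
    3: "database error timeout",
}

index = build_inverted_index(logs)
print(search(index, "database error"))  # {1, 3}
```

A lookup touches only the entries for the query words instead of scanning every log line, which is why indexed search stays fast as the volume of logs grows.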

Common Pitfalls

#1 Not configuring Fluentd to handle log rotation causes missing logs.

Wrong approach: fluentd.conf without proper file input settings:

  <source>
    @type tail
    path /var/log/containers/*.log
    pos_file /var/log/fluentd.pos
    tag kubernetes.*
  </source>

Correct approach: fluentd.conf with log rotation handling:

  <source>
    @type tail
    path /var/log/containers/*.log
    pos_file /var/log/fluentd.pos
    tag kubernetes.*
    # Read newly discovered files from the beginning
    read_from_head true
    # Rescan the path pattern for new files every 5 seconds
    refresh_interval 5
  </source>

Root cause:Ignoring how container logs rotate leads Fluentd to miss new logs or read old ones repeatedly.
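
To see why rotation matters to a tailing agent, consider this simplified Python sketch of position tracking. It is not how Fluentd's tail input is implemented; the function name and the in-memory position record are assumptions for illustration. The reader remembers the file's inode and byte offset, and starts again from the beginning when the inode changes or the file shrinks.

```python
import os
import tempfile
from typing import Dict, List


def read_new_lines(path: str, pos: Dict[str, int]) -> List[str]:
    """Return lines appended since the last call, restarting on rotation."""
    stat = os.stat(path)
    rotated = pos.get("inode") != stat.st_ino        # a different file has this name now
    truncated = stat.st_size < pos.get("offset", 0)  # same file, but it shrank
    if rotated or truncated:
        pos["inode"], pos["offset"] = stat.st_ino, 0
    with open(path, "rb") as handle:
        handle.seek(pos["offset"])
        data = handle.read()
        pos["offset"] = handle.tell()
    return data.decode("utf-8").splitlines()


with tempfile.TemporaryDirectory() as tmp:
    log = os.path.join(tmp, "app.log")
    pos: Dict[str, int] = {}

    with open(log, "w") as f:
        f.write("first\nsecond\n")
    before = read_new_lines(log, pos)

    # Rotation: the old file is renamed and a new file takes its name.
    os.rename(log, log + ".1")
    with open(log, "w") as f:
        f.write("third\n")
    after = read_new_lines(log, pos)

print(before, after)  # ['first', 'second'] ['third']
```

Without the inode and size checks, the reader would seek to the old offset in the new file and either skip its first lines or read nothing, which is the missing-logs symptom described above. The sketch still ignores lines written to the old file just before rotation, a case a real agent has to handle as well.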

#2 Storing all logs forever without retention causes Elasticsearch to slow down and run out of disk.

Wrong approach: No index lifecycle management configured; all indexes kept indefinitely.

Correct approach: Set an index lifecycle policy to delete or archive logs older than 30 days.

Root cause:Not planning log retention leads to resource exhaustion and degraded cluster performance.
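
A 30-day retention rule of this kind can be expressed as an index lifecycle management (ILM) policy. The Python sketch below builds such a policy body as a dictionary and prints it as JSON. The daily rollover setting and the policy name used in the note afterwards are assumptions made for the example, and the sketch does not send anything to a cluster.

```python
import json


def retention_policy(delete_after_days: int) -> dict:
    """Build an ILM policy body that rolls indexes over daily and deletes old ones."""
    return {
        "policy": {
            "phases": {
                # Hot phase: start a new index once the current one is a day old.
                "hot": {"actions": {"rollover": {"max_age": "1d"}}},
                # Delete phase: remove indexes once they reach the age limit.
                "delete": {
                    "min_age": f"{delete_after_days}d",
                    "actions": {"delete": {}},
                },
            }
        }
    }


policy = retention_policy(30)
print(json.dumps(policy, indent=2))
```

A body like this would be sent with a PUT request to _ilm/policy/logs-30d (the name logs-30d is hypothetical) and then referenced from the index template used for log indexes, so that new indexes pick up the policy automatically.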

#3 Running Kibana without securing access exposes sensitive logs to anyone.

Wrong approach: Kibana deployed with default settings and no authentication.

Correct approach: Enable authentication and role-based access control in Kibana and Elasticsearch.

Root cause:Overlooking security risks exposes logs containing sensitive information.

Key Takeaways

Centralized logging with the EFK stack collects logs from many containers into one searchable place.

Fluentd gathers and forwards logs, Elasticsearch stores and indexes them, and Kibana visualizes logs for easy analysis.

Without centralized logging, troubleshooting distributed systems is slow and error-prone.

Proper deployment and management of EFK components ensure reliable, scalable, and secure logging.

Managing log volume and retention is essential to keep the system performant and cost-effective.

Practice

(1/5)

1. What is the main purpose of the EFK stack in Kubernetes?

easy

A. To collect, store, and visualize logs from all pods centrally

B. To manage Kubernetes cluster networking

C. To automate deployment of applications

D. To monitor CPU and memory usage only

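Solution

Step 1: Understand EFK components

Fluentd gathers and forwards logs, Elasticsearch stores and indexes them, and Kibana visualizes them.

Step 2: Identify the main goal

Together, the three components collect logs from all pods into one searchable place; they do not manage networking, deployments, or resource metrics.

Final Answer: A. To collect, store, and visualize logs from all pods centrally

Quick Check: EFK = Elasticsearch (store) + Fluentd (collect) + Kibana (visualize)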
