Overview - Log-based metrics

What is it?

Log-based metrics are measurements created from log entries in cloud systems. They count or summarize specific events recorded in logs to help monitor and understand system behavior. Instead of relying only on predefined metrics, log-based metrics let you create custom insights from detailed logs. This helps track unique or complex events that standard metrics might miss.

Why it matters

Without log-based metrics, you would only see general system data and miss important details hidden in logs. This makes it hard to detect unusual problems or track specific user actions. Log-based metrics turn raw logs into numbers you can chart and alert on, so teams catch issues earlier, respond faster, and understand system health in more depth.

Where it fits

Before learning log-based metrics, you should understand basic logging and standard metrics in cloud monitoring. After mastering log-based metrics, you can explore alerting based on these metrics and advanced monitoring dashboards. This topic fits in the monitoring and observability part of cloud infrastructure management.

Mental Model

Core Idea

Log-based metrics transform detailed log messages into meaningful numbers that reveal system behavior and issues.

Think of it like...

Imagine logs as a diary full of detailed daily notes, and log-based metrics as the summary charts that count how many times certain events happened, like how many times you went jogging or cooked dinner.

┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│   Raw Logs    │─────▶│ Log-based     │─────▶│ Metrics       │
│ (detailed     │      │ Metrics       │      │ (counts, sums)│
│  messages)    │      │ (filters &    │      │               │
│               │      │  aggregations)│      │               │
└───────────────┘      └───────────────┘      └───────────────┘

Build-Up - 7 Steps

1

Foundation: Understanding Logs and Their Role

Concept: Logs are detailed records of events happening in a system, capturing what occurred and when.

Logs are like a diary for your system. They record every action, error, or event with details like time, source, and message. For example, a web server log might record each page visit or error. Logs help you understand exactly what happened inside your system.

Result

You know that logs are detailed event records, but they can be very large and hard to analyze directly.

Understanding logs as detailed event records is essential because it shows why raw logs alone are too complex for quick monitoring.
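
To make the fields described in this step concrete (time, source, message), here is a minimal Python sketch that parses one raw log line into a structured record. The line layout, the severity field, and the names LINE_PATTERN and parse_line are assumptions made for this example, not a format the text prescribes.

```python
import re
from datetime import datetime

# Hypothetical line layout: "<ISO timestamp> <source> <SEVERITY> <message>"
LINE_PATTERN = re.compile(
    r"^(?P<time>\S+)\s+(?P<source>\S+)\s+(?P<severity>[A-Z]+)\s+(?P<message>.*)$"
)


def parse_line(line):
    """Turn one raw log line into a dict, or return None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the timestamp text into a datetime so it can be compared and grouped.
    entry["time"] = datetime.fromisoformat(entry["time"])
    return entry


sample = "2024-05-01T12:00:03 web-1 ERROR GET /checkout failed with status 500"
entry = parse_line(sample)
print(entry["source"], entry["severity"], entry["message"])
```

Even this single record shows why raw logs are hard to monitor directly: the useful signal (an error on one endpoint) is buried in free text until it is parsed and counted.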

2

Foundation: What Are Metrics in Cloud Monitoring

3

Intermediate: Creating Custom Metrics from Logs

4

Intermediate: Types of Log-based Metrics

5

Intermediate: Filtering and Aggregating Logs for Metrics

6

Advanced: Using Log-based Metrics for Alerting and Dashboards

7

Expert: Performance and Cost Considerations of Log-based Metrics

Under the Hood

Log-based metrics work by continuously scanning incoming log entries and applying user-defined filters to select relevant logs. These selected logs are then aggregated over time windows using counting, summing, or distribution calculations. The results are stored as time series data, which monitoring systems query like standard metrics. This process happens in near real-time using scalable cloud infrastructure.

Why designed this way?

This design allows flexible, custom insights without changing application code or relying only on predefined metrics. It leverages existing logs, which are rich in detail, to create metrics tailored to specific monitoring needs. Alternatives like instrumenting code for every metric are costly and less flexible, so log-based metrics provide a scalable, adaptable solution.

┌───────────────┐      ┌───────────────┐      ┌───────────────┐      ┌───────────────┐
│   Incoming    │─────▶│   Log Filter  │─────▶│ Aggregation   │─────▶│ Time Series   │
│     Logs      │      │ (select logs) │      │ (count, sum)  │      │   Storage     │
└───────────────┘      └───────────────┘      └───────────────┘      └───────────────┘
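
The filter, aggregation, and time series stages in the diagram above can be sketched in a few lines of Python. This is a simplified in-memory illustration, not how any particular cloud service implements it; the entry fields, the function name counter_metric, and the 60-second window are assumptions chosen for the example.

```python
from collections import Counter


def counter_metric(entries, log_filter, window_seconds=60):
    """Count entries matching log_filter in fixed time windows.

    Returns a sorted list of (window_start, count) points: a tiny time series.
    """
    counts = Counter()
    for entry in entries:
        # Filter stage: skip entries the metric is not interested in.
        if not log_filter(entry):
            continue
        # Aggregation stage: assign the entry to the window that contains it.
        window_start = (entry["time"] // window_seconds) * window_seconds
        counts[window_start] += 1
    # Storage stage (simplified): one point per window, ordered by time.
    return sorted(counts.items())


# Hypothetical entries; "time" is seconds since an arbitrary start.
entries = [
    {"time": 0, "severity": "ERROR"},
    {"time": 15, "severity": "INFO"},
    {"time": 42, "severity": "ERROR"},
    {"time": 75, "severity": "ERROR"},
    {"time": 130, "severity": "WARNING"},
]

error_series = counter_metric(entries, lambda e: e["severity"] == "ERROR")
print(error_series)  # [(0, 2), (60, 1)]
```

Windows with no matching entries produce no point in this sketch; a real monitoring system may report them differently.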

Myth Busters - 4 Common Misconceptions

Quick: do you think log-based metrics automatically include all logs without filtering? Commit to yes or no.

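Common Belief: Log-based metrics use all logs by default and show complete system data.

Reality: A log-based metric only covers the log entries that match its user-defined filter, so it reflects a chosen slice of the logs rather than complete system data.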

Quick: do you think log-based metrics are free and have no impact on cloud costs? Commit to yes or no.

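Common Belief: Creating many log-based metrics has no cost or performance impact.

Reality: Every log-based metric adds processing work, and complex filters can slow metric processing, so the number and complexity of metrics affect both cost and performance.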

Quick: do you think log-based metrics can replace all standard metrics? Commit to yes or no.

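Common Belief: Log-based metrics can fully replace standard system metrics.

Reality: Standard metrics remain the better choice for high-frequency, low-latency core measurements such as CPU usage; log-based metrics complement them with custom, event-driven insights.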

Quick: do you think log-based metrics can only count events, not measure values like latency? Commit to yes or no.

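Common Belief: Log-based metrics only count how many times something happens.

Reality: Besides counting matching entries, log-based metrics can sum values or build distributions from them, which is how a measurement such as latency can be tracked.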
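
Counting is only one option: the aggregation step can also work on values extracted from the log text. The Python sketch below builds a small latency distribution that way. The latency_ms=... message field, the bucket bounds, and the function name are all hypothetical choices for illustration.

```python
import re
from bisect import bisect_left

LATENCY_PATTERN = re.compile(r"latency_ms=(\d+)")  # hypothetical field in the message
BUCKET_BOUNDS_MS = [100, 250, 500, 1000]  # assumed upper bounds; one extra overflow bucket


def latency_distribution(messages):
    """Extract latency values from messages and tally them into buckets."""
    buckets = [0] * (len(BUCKET_BOUNDS_MS) + 1)
    total_ms = 0
    for message in messages:
        match = LATENCY_PATTERN.search(message)
        if match is None:
            continue  # entry carries no latency value, so it is not part of the metric
        value = int(match.group(1))
        # bisect_left places a value equal to a bound in that bound's bucket.
        buckets[bisect_left(BUCKET_BOUNDS_MS, value)] += 1
        total_ms += value
    return {"buckets": buckets, "count": sum(buckets), "sum_ms": total_ms}


messages = [
    "GET /home latency_ms=80",
    "GET /cart latency_ms=240",
    "GET /cart latency_ms=260",
    "health check ok",
    "POST /pay latency_ms=1200",
]
dist = latency_distribution(messages)
print(dist)
```

From the bucket counts and the running sum, a dashboard can derive averages or approximate percentiles without storing every raw value.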

Expert Zone

1

The latency of log-based metrics varies with log ingestion delays; accounting for that delay helps set realistic alert thresholds.

2

Complex filters with regex or multiple conditions can slow metric processing; balancing complexity and performance is key.

3

Combining log-based metrics with trace data provides richer observability but requires careful correlation strategies.

When NOT to use

Avoid log-based metrics for high-frequency, low-latency core system metrics where standard metrics are optimized. Use log-based metrics mainly for custom, event-driven insights not covered by standard metrics.

Production Patterns

In production, teams create log-based metrics for error tracking, user behavior counts, and custom SLA monitoring. They integrate these metrics into alerting policies and dashboards, often combining them with standard metrics for comprehensive observability.
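
As a rough illustration of the error-tracking pattern, the Python sketch below evaluates a simple alert rule against an error-count series. The rule (fire once the count has exceeded a threshold for two consecutive windows), the function name, and the sample data are assumptions for this example; real alerting policies offer their own condition types.

```python
def firing_windows(series, threshold, min_consecutive=2):
    """Return the window starts at which the alert would fire.

    The alert fires once the value has been above threshold for
    min_consecutive windows in a row, which dampens one-off spikes.
    """
    fired = []
    streak = 0
    for window_start, value in series:
        streak = streak + 1 if value > threshold else 0
        if streak >= min_consecutive:
            fired.append(window_start)
    return fired


# Hypothetical error counts per 60-second window: (window_start, count)
error_counts = [(0, 1), (60, 6), (120, 7), (180, 2), (240, 9)]
alerts = firing_windows(error_counts, threshold=5)
print(alerts)  # [120]
```

Requiring consecutive breaches trades a little detection speed for fewer false alarms, which matters for metrics derived from bursty logs.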

Connections

Event-driven architecture

Builds-on

Log-based metrics transform event logs into measurable data, similar to how event-driven systems react to events; understanding one helps grasp how systems respond to and measure events.

Data aggregation in databases

Same pattern

Both log-based metrics and database aggregation summarize large data sets into meaningful summaries, showing a common pattern of filtering and summarizing data for insight.

Human memory summarization

Analogy in cognition

Just as humans remember key points from many details, log-based metrics summarize vast logs into key numbers, illustrating how summarization aids understanding across fields.

Common Pitfalls

#1 Creating too many detailed log-based metrics without filtering.

Wrong approach: Create a log-based metric that counts every log entry without any filter.

Correct approach: Create log-based metrics with specific filters targeting relevant log entries only.

Root cause: Misunderstanding that filtering is necessary to focus metrics and avoid overload.

#2 Expecting instant availability of log-based metrics after creation.

Wrong approach: Immediately setting alerts on a newly created log-based metric without waiting for data to accumulate.

Correct approach: Wait for sufficient log data to be processed before relying on the metric for alerts.

Root cause: Not realizing log processing and metric aggregation have ingestion delays.
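
One way to act on this pitfall is to ignore windows that may still be receiving late log entries and to require a minimum number of complete points before trusting the metric. The Python sketch below shows the idea; the 120-second delay, the 5-point minimum, and both function names are assumed values for illustration, not documented service behavior.

```python
def complete_windows(series, now, window_seconds=60, ingestion_delay_seconds=120):
    """Keep only windows that ended at least ingestion_delay_seconds before now."""
    cutoff = now - ingestion_delay_seconds
    return [
        (start, value)
        for start, value in series
        if start + window_seconds <= cutoff
    ]


def ready_for_alerting(series, now, min_points=5, **window_options):
    """Report whether enough complete windows exist to rely on the metric."""
    return len(complete_windows(series, now, **window_options)) >= min_points


# Hypothetical series for a newly created metric: (window_start, count)
series = [(0, 3), (60, 4), (120, 2), (180, 0), (240, 1)]

print(ready_for_alerting(series, now=300))  # False: only 3 windows are complete
print(ready_for_alerting(series, now=420))  # True: all 5 windows are complete
```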

#3 Using log-based metrics for high-frequency system metrics like CPU usage.

Wrong approach: Replacing standard CPU usage metrics with log-based metrics counting CPU logs.

Correct approach: Use standard metrics for CPU usage and reserve log-based metrics for custom event counts.

Root cause: Confusing the purpose and efficiency of standard vs. log-based metrics.

Key Takeaways

Log-based metrics turn detailed logs into meaningful numbers that reveal system events and behaviors.

They allow custom monitoring beyond standard metrics by filtering and aggregating specific log entries.

Choosing the right metric type and filters is crucial for effective and efficient monitoring.

Log-based metrics integrate with alerting and dashboards to enable proactive system management.

Understanding their cost and performance impact helps design scalable, cost-effective monitoring solutions.