
Bucket pattern for time-series data in MongoDB - Deep Dive

Overview - Bucket pattern for time-series data
What is it?
The bucket pattern is a way to store time-series data by grouping multiple data points into a single document called a bucket. Instead of saving each measurement separately, many measurements close in time are stored together. This helps reduce the number of documents and improves query speed for time-based data.
Why it matters
Time-series data can grow very fast, with thousands or millions of measurements per second. Without an efficient way to store and query this data, databases become slow and expensive. The bucket pattern solves this by organizing data into manageable chunks, making storage and retrieval faster and cheaper. Without it, analyzing trends or monitoring systems in real time would be much harder.
Where it fits
Before learning the bucket pattern, you should understand basic MongoDB document structure and how time-series data is usually stored. After mastering this pattern, you can explore MongoDB's native time-series collections and advanced aggregation techniques for analytics.
Mental Model
Core Idea
Group many time-stamped measurements into one document to store and query time-series data efficiently.
Think of it like...
Imagine a photo album where instead of keeping every single photo loose, you put many photos from the same event into one album page. This makes it easier to find and handle photos from that event all at once.
┌───────────────┐
│   Bucket Doc  │
│ ┌───────────┐ │
│ │ timestamps│ │
│ │ [t1,t2..] │ │
│ ├───────────┤ │
│ │ values    │ │
│ │ [v1,v2..] │ │
│ └───────────┘ │
└───────────────┘
Each bucket holds arrays of timestamps and values grouped together.
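Rendered as an actual document, the bucket in the diagram might look like this. This is a minimal sketch: field names such as sensorId and startTime are illustrative, not a fixed schema.

```javascript
// A hypothetical bucket document; all field names are assumptions.
const bucket = {
  sensorId: "sensor-1",                        // which series this bucket belongs to
  startTime: new Date("2024-01-01T00:00:00Z"), // metadata: first reading in the bucket
  timestamps: [
    new Date("2024-01-01T00:00:00Z"),
    new Date("2024-01-01T00:01:00Z"),
    new Date("2024-01-01T00:02:00Z"),
  ],
  values: [21.5, 21.7, 21.6], // values[i] pairs with timestamps[i]
};

// The two arrays stay index-aligned: one value per timestamp.
console.log(bucket.timestamps.length === bucket.values.length); // prints: true
```

Keeping the arrays index-aligned is what lets a later stage pair each timestamp back with its value.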
Build-Up - 7 Steps
1
Foundation: Understanding time-series data basics
🤔
Concept: Time-series data is a sequence of data points recorded over time, usually with timestamps.
Time-series data records things like temperature every minute or stock prices every second. Each data point has a timestamp and a value. Storing each point separately can create many small records.
Result
You know what time-series data looks like and why it can be large and fast-growing.
Understanding the nature of time-series data helps explain why special storage methods like bucketing are needed.
2
Foundation: MongoDB document and collection basics
🤔
Concept: MongoDB stores data in flexible documents inside collections, which are like tables.
Each document is a JSON-like object with fields. Collections hold many documents. Normally, each time-series point could be one document.
Result
You understand how MongoDB stores data and what a document looks like.
Knowing MongoDB's document model is essential before changing how data is grouped or stored.
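For contrast, the naive one-document-per-point layout looks like this. The field names are illustrative, and the arithmetic shows why document counts grow quickly.

```javascript
// Without bucketing: every reading is its own document, so a sensor emitting
// one point per minute creates 1,440 documents per day.
const point = {
  sensorId: "sensor-1",
  ts: new Date("2024-01-01T00:00:00Z"),
  value: 21.5,
};

const docsPerDay = 24 * 60; // one reading per minute
console.log(docsPerDay); // 1440
```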
3
Intermediate: Introducing the bucket pattern concept
🤔
Concept: Instead of one document per data point, group many points into one document called a bucket.
A bucket document contains arrays of timestamps and corresponding values. For example, a bucket might hold 100 temperature readings taken every minute.
Result
You see how grouping reduces the number of documents and can speed up queries.
Grouping data points reduces overhead and improves performance by minimizing document count.
4
Intermediate: Choosing bucket size and time range
🤔 Before reading on: do you think bigger buckets always improve performance, or can they cause problems? Commit to your answer.
Concept: Bucket size affects storage efficiency and query speed; too big or too small buckets have trade-offs.
If buckets are too large, documents become big and slow to read or update. If too small, you lose the benefits of grouping. Usually, buckets cover a fixed time range or a fixed number of points.
Result
You learn to balance bucket size for best performance.
Knowing how bucket size impacts performance helps design efficient time-series storage.
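One common way to enforce those trade-offs is to cap both the point count and the time span of a bucket. A minimal sketch, assuming a 300-point cap and a one-hour window (example values, not recommendations):

```javascript
// Sketch: should this reading go into the current bucket, or start a new one?
const MAX_POINTS = 300;              // assumed cap on points per bucket
const WINDOW_MS = 60 * 60 * 1000;    // assumed 1-hour time range per bucket

function fitsInBucket(bucket, ts) {
  const notFull = bucket.timestamps.length < MAX_POINTS;
  const inWindow = ts - bucket.startTime < WINDOW_MS;
  return notFull && inWindow; // hitting either limit forces a new bucket
}

const bucket = {
  startTime: new Date("2024-01-01T00:00:00Z"),
  timestamps: new Array(299),
  values: new Array(299),
};
console.log(fitsInBucket(bucket, new Date("2024-01-01T00:30:00Z"))); // true: 299 < 300 and 30 min < 1 h
console.log(fitsInBucket(bucket, new Date("2024-01-01T02:00:00Z"))); // false: outside the time window
```

Capping on both dimensions keeps buckets bounded even when a sensor goes quiet (window cap) or bursts (count cap).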
5
Intermediate: Querying data stored with buckets
🤔 Before reading on: do you think querying bucketed data is simpler or more complex than querying individual points? Commit to your answer.
Concept: Queries must unpack buckets to get individual points, often using aggregation pipelines.
To find data in a time range, you query buckets covering that range and then unwind arrays to access each point. MongoDB's aggregation framework helps with this.
Result
You understand how to retrieve and filter time-series data stored in buckets.
Knowing how to query bucketed data is key to using the pattern effectively.
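Here is a sketch of such a pipeline, assuming buckets carry startTime/endTime metadata and index-aligned timestamps/values arrays (all field names are assumptions):

```javascript
// Build (not run) an aggregation pipeline that extracts individual points
// in [from, to). $zip pairs the two arrays element-by-element.
const from = new Date("2024-01-01T00:00:00Z");
const to = new Date("2024-01-02T00:00:00Z");

const pipeline = [
  // 1. Cheap pruning: keep only buckets whose metadata overlaps the range.
  { $match: { startTime: { $lt: to }, endTime: { $gte: from } } },
  // 2. Pair each timestamp with its value: points = [[t1, v1], [t2, v2], ...]
  { $project: { points: { $zip: { inputs: ["$timestamps", "$values"] } } } },
  // 3. One output document per point.
  { $unwind: "$points" },
  // 4. Exact per-point range filter (points.0 is the timestamp).
  { $match: { "points.0": { $gte: from, $lt: to } } },
];

// In mongosh this would be run as: db.readings.aggregate(pipeline)
console.log(pipeline.length); // 4 stages
```

The first $match works on whole buckets and can use an index; only the surviving buckets pay the cost of unpacking.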
6
Advanced: Handling updates and inserts in buckets
🤔 Before reading on: do you think updating a single point in a bucket is easy or tricky? Commit to your answer.
Concept: Inserting new points and updating existing ones inside buckets requires careful handling to avoid conflicts and maintain performance.
New points are added to arrays in the latest bucket. If a bucket is full or time range exceeded, a new bucket is created. Updates to points inside arrays can be complex and may require atomic operations.
Result
You learn the challenges and solutions for managing bucketed data over time.
Understanding update mechanics prevents data corruption and performance issues.
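The usual insert path can be expressed as a single upsert: push into the newest bucket that still has room, and let the upsert create a fresh bucket atomically when none matches. A sketch that builds the operation (field names and the cap are assumptions):

```javascript
// Build the filter/update/options for one bucket insert. In mongosh:
// db.readings.updateOne(op.filter, op.update, op.options)
function bucketInsertOp(sensorId, ts, value, maxPoints) {
  return {
    // Match a bucket for this sensor that is not yet full...
    filter: { sensorId, count: { $lt: maxPoints } },
    update: {
      $push: { timestamps: ts, values: value }, // append the new point
      $min: { startTime: ts },                  // keep range metadata current
      $max: { endTime: ts },
      $inc: { count: 1 },                       // track fullness cheaply
    },
    // ...or create a brand-new bucket if every existing bucket is full.
    options: { upsert: true },
  };
}

const op = bucketInsertOp("sensor-1", new Date(), 21.5, 300);
console.log(op.filter.count.$lt); // 300
```

Because a single-document update is atomic in MongoDB, the push, the metadata maintenance, and the counter increment all land together.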
7
Expert: Optimizing the bucket pattern for production use
🤔 Before reading on: do you think the bucket pattern alone solves all time-series challenges? Commit to your answer.
Concept: In production, bucket pattern is combined with indexing, compression, and schema design to handle scale and query needs.
Experts tune bucket size, use indexes on bucket metadata, compress bucket contents, and design schemas to support common queries. They also monitor bucket growth and shard data for scale.
Result
You see how the bucket pattern fits into a full production strategy for time-series data.
Knowing the full ecosystem around bucketing is essential for building reliable, scalable systems.
Under the Hood
Internally, MongoDB stores each bucket as a single BSON document containing arrays of timestamps and values. When querying, the database reads these arrays and uses aggregation operators like $unwind to process individual points. Updates modify arrays atomically to maintain consistency. Buckets reduce document overhead and improve disk and memory usage by grouping related data.
Why designed this way?
The bucket pattern was designed to address the inefficiency of storing millions of tiny documents for time-series data. Grouping points reduces index size and improves write throughput. Alternatives like one document per point caused high storage and query costs. Bucketing balances granularity with performance.
┌───────────────┐
│  Client Query │
└──────┬────────┘
       │
       ▼
┌───────────────┐
│  MongoDB      │
│  Collection   │
│  ┌─────────┐  │
│  │ Buckets │  │
│  │ [Doc1]  │  │
│  │ [Doc2]  │  │
│  └─────────┘  │
└──────┬────────┘
       │
       ▼
┌────────────────┐
│ Aggregation    │
│ Pipeline:      │
│ $match, $unwind│
│ $filter        │
└────────────────┘
       │
       ▼
┌───────────────┐
│  Result Set   │
│  Individual   │
│  Data Points  │
└───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does storing more points in one bucket always make queries faster? Commit yes or no.
Common Belief: More points in a bucket always improve query speed because fewer documents are read.
Reality: Very large buckets can slow queries because reading and processing big documents takes more time and memory.
Why it matters: Choosing bucket size without balance can cause slow queries and high resource use, hurting performance.
Quick: Is the bucket pattern only useful for MongoDB? Commit yes or no.
Common Belief: The bucket pattern is a MongoDB-specific trick and not applicable elsewhere.
Reality: The bucket pattern is a general approach used in many time-series databases to group data efficiently.
Why it matters: Thinking it's MongoDB-only limits understanding of time-series storage strategies across systems.
Quick: Can you update a single data point inside a bucket as easily as a separate document? Commit yes or no.
Common Belief: Updating one point in a bucket is as simple as updating a single document.
Reality: Updating points inside arrays requires more complex operations and can be less efficient than single-document updates.
Why it matters: Misunderstanding update complexity can lead to data corruption or performance problems.
Quick: Does the bucket pattern eliminate the need for indexes on time-series data? Commit yes or no.
Common Belief: Because data is grouped, indexes on time fields are unnecessary.
Reality: Indexes on bucket metadata and time ranges are still needed for efficient queries.
Why it matters: Skipping indexes causes slow queries and high resource consumption.
Expert Zone
1
Buckets often include metadata like min/max timestamps to speed up range queries without unpacking all data.
2
Compression of bucket contents can greatly reduce storage but requires balancing CPU cost during reads and writes.
3
Sharding buckets by time or device ID helps scale writes and queries horizontally in large deployments.
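The first point above is why buckets usually carry min/max (here startTime/endTime) fields: a range query can discard a bucket from its metadata alone. A sketch of that pruning check, with assumed field names:

```javascript
// A bucket can be skipped entirely if its [startTime, endTime] range does not
// overlap the query's [from, to) range: no need to unpack the arrays at all.
function bucketOverlaps(bucket, from, to) {
  return bucket.startTime < to && bucket.endTime >= from;
}

const bucket = {
  startTime: new Date("2024-01-01T00:00:00Z"),
  endTime: new Date("2024-01-01T01:00:00Z"),
};
console.log(bucketOverlaps(bucket,
  new Date("2024-01-01T00:30:00Z"), new Date("2024-01-01T02:00:00Z"))); // true
console.log(bucketOverlaps(bucket,
  new Date("2024-01-02T00:00:00Z"), new Date("2024-01-03T00:00:00Z"))); // false
```

This is the same logic the pipeline's first $match stage applies server-side, where it can be backed by an index on the metadata fields.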
When NOT to use
The bucket pattern is not ideal when you need frequent updates or deletes of individual points, or when data points are very sparse. In such cases, storing points as individual documents or using specialized time-series databases with native support may be better.
Production Patterns
In production, the bucket pattern is combined with TTL indexes to expire old data, pre-aggregations for fast analytics, and monitoring tools to track bucket sizes and query performance. It is often used in IoT, monitoring, and financial systems.
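For the TTL part, one option is a TTL index on the bucket's endTime, so a whole bucket expires once its newest point passes the retention window. A sketch of the index definition; the field name and retention value are assumptions:

```javascript
// TTL index definition: MongoDB deletes a document once
// endTime + expireAfterSeconds is in the past (endTime must be a Date field).
const RETENTION_SECONDS = 30 * 24 * 3600; // example: 30-day retention

const ttlIndex = {
  keys: { endTime: 1 },
  options: { expireAfterSeconds: RETENTION_SECONDS },
};

// In mongosh: db.readings.createIndex(ttlIndex.keys, ttlIndex.options)
console.log(ttlIndex.options.expireAfterSeconds); // 2592000
```

Expiring on endTime rather than startTime guarantees no still-recent point is deleted along with an otherwise-old bucket.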
Connections
Data Compression
Builds-on
Understanding how bucketed data can be compressed helps optimize storage and speed for large time-series datasets.
Batch Processing
Same pattern
Both bucket pattern and batch processing group many small items to improve efficiency and reduce overhead.
Human Memory Chunking
Analogous cognitive process
Just like chunking helps humans remember information better by grouping, the bucket pattern groups data points to manage complexity and improve retrieval.
Common Pitfalls
#1 Creating buckets that are too large, causing slow queries and high memory use.
Wrong approach: Store 10,000 points per bucket without limits: { timestamps: [...10000 timestamps...], values: [...10000 values...] }
Correct approach: Limit buckets to a few hundred points or a fixed time range: { timestamps: [...300 timestamps...], values: [...300 values...] }
Root cause: Misunderstanding the trade-off between bucket size and query performance.
#2 Not indexing bucket metadata, leading to slow time-range queries.
Wrong approach: No indexes on bucket start or end time fields.
Correct approach: Create indexes on bucket metadata fields like startTime: db.collection.createIndex({ startTime: 1 })
Root cause: Assuming bucketing alone speeds queries without supporting indexes.
#3 Updating individual points inside buckets with blind positional update commands.
Wrong approach: db.collection.updateOne({ _id: bucketId }, { $set: { 'values.5': 42 } })
Correct approach: Guard positional updates with a filter that verifies the element, use filtered positional operators (arrayFilters), or replace the entire bucket document.
Root cause: Not realizing that array positions inside a bucket can shift as points are inserted, so a blind positional $set can overwrite the wrong reading.
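One way to make a positional update safer is to pin the element in the filter, so the write becomes a no-op if the array has shifted since you read it. A sketch that builds such an operation (field names are assumptions):

```javascript
// Build a guarded update for one point: only write values[i] if timestamps[i]
// still holds the timestamp we read earlier. In mongosh:
// db.readings.updateOne(op.filter, op.update)
function guardedPointUpdate(bucketId, i, expectedTs, newValue) {
  return {
    // The filter fails to match if position i no longer holds expectedTs.
    filter: { _id: bucketId, [`timestamps.${i}`]: expectedTs },
    update: { $set: { [`values.${i}`]: newValue } },
  };
}

const op = guardedPointUpdate("bucket-42", 5, new Date("2024-01-01T00:05:00Z"), 42);
console.log(Object.keys(op.update.$set)[0]); // "values.5"
```

Checking matchedCount on the result then tells you whether the guarded write actually applied or needs a retry after re-reading the bucket.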
Key Takeaways
The bucket pattern groups many time-series points into one document to improve storage and query efficiency.
Choosing the right bucket size balances performance and resource use; too big or too small buckets cause problems.
Querying bucketed data requires unpacking arrays, often using aggregation pipelines in MongoDB.
Updates inside buckets are more complex than single document updates and need careful handling.
In production, the bucket pattern is combined with indexing, compression, and sharding for scalable time-series management.