Overview - Time-series collections

What is it?

Time-series collections, available since MongoDB 5.0, are a special type of collection designed to efficiently store and query data points collected over time. Each data point typically includes a timestamp and associated measurements or events. These collections optimize storage and performance for time-based data, such as sensor readings or logs. They automatically organize data to make time-based queries faster and more efficient.

Why it matters

Without time-series collections, storing and querying large volumes of time-stamped data can become slow and costly. Regular collections tend to grow inefficient as the data accumulates, making it hard to analyze trends or monitor events over time. Time-series collections address this by structuring data for quick access and reduced storage, enabling real-time insights and better decision-making in fields like IoT, finance, and monitoring systems.

Where it fits

Before learning about time-series collections, you should understand basic MongoDB collections, documents, and indexes. After mastering time-series collections, you can explore advanced topics like data aggregation, sharding for scalability, and real-time analytics pipelines.

Mental Model

Core Idea

Time-series collections organize data by time to store and retrieve chronological data efficiently and compactly.

Think of it like...

Imagine a diary where each page is dated and contains notes for that day. Instead of mixing all notes randomly, the diary keeps entries in order by date, making it easy to find what happened on any day quickly.

┌───────────────────────────────┐
│    Time-Series Collection     │
├─────────────┬─────────────────┤
│ Timestamp   │ Data Fields     │
├─────────────┼─────────────────┤
│ 2024-06-01  │ Temperature: 22 │
│ 2024-06-01  │ Humidity: 45    │
│ 2024-06-02  │ Temperature: 23 │
│ 2024-06-02  │ Humidity: 50    │
└─────────────┴─────────────────┘

Data is stored in time order, optimized for fast queries by timestamp.

Build-Up - 7 Steps

1

Foundation: Understanding Basic MongoDB Collections

Concept: Learn what a MongoDB collection is and how documents are stored.

A MongoDB collection is like a folder that holds many documents. Each document is a set of key-value pairs, similar to a JSON object. Collections store data without a fixed schema, allowing flexibility. For example, a collection named 'sensors' might hold documents with temperature and humidity readings.

Result

You can store and retrieve documents in a collection using simple commands.

Knowing how collections and documents work is essential before exploring specialized collections like time-series.
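
As a rough illustration of storing and retrieving a document, the sketch below wraps two shell calls in a helper. It assumes a mongosh session in which `db` is the current database handle; the helper name `storeAndFetchReading` and the field values are made up for illustration.

```javascript
// Hypothetical helper; `db` is assumed to be a mongosh database handle.
function storeAndFetchReading(db) {
  // Insert one document; a regular 'sensors' collection is created
  // implicitly if it does not exist yet.
  db.sensors.insertOne({ sensorId: 1, temperature: 22, humidity: 45 });

  // Read back every document recorded for the same sensor.
  return db.sensors.find({ sensorId: 1 }).toArray();
}
```

In mongosh this would be called as storeAndFetchReading(db), returning an array of matching documents.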

2

Foundation: What is Time-Series Data?

3

Intermediate: Creating a Time-Series Collection in MongoDB

4

Intermediate: How MongoDB Optimizes Time-Series Storage

5

Intermediate: Querying Time-Series Collections Efficiently

6

Advanced: Managing Data Retention and Expiration

7

Expert: Performance Trade-offs and Bucket Sizing

Under the Hood

MongoDB internally organizes time-series data into buckets, each storing multiple measurements with the same metadata and close timestamps. This reduces document overhead by storing common metadata once per bucket and compressing the time and measurement fields. The storage engine uses these buckets to speed up writes and queries by scanning fewer documents. Indexes on the timeField and metaField help quickly locate relevant buckets. Automatic expiration removes old buckets based on configured policies.

Why designed this way?

Time-series data grows rapidly and often has repetitive metadata. Storing each data point as a separate document would cause high overhead and slow queries. Bucketing reduces storage size and improves performance by grouping related data. This design balances write speed, query efficiency, and storage cost. Alternatives like storing raw documents or pre-aggregated data were less flexible or efficient.

┌─────────────────────────────────┐
│     Time-Series Collection      │
├─────────────┬───────────────────┤
│ Document 1  │ {timestamp, data} │
│ Document 2  │ {timestamp, data} │
│ ...         │ ...               │
└─────────────┴───────────────────┘
          ↓ Bucketing
┌─────────────────────────────────────────┐
│             Bucket Document             │
├──────────────┬──────────────────────────┤
│ Metadata     │ sensorId: 1              │
│ Time Range   │ 2024-06-01 to 2024-06-02 │
│ Measurements │ [array of data points]   │
└──────────────┴──────────────────────────┘

Buckets store many points together for efficiency.
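
To make the bucketing idea concrete, here is a small in-memory sketch in plain JavaScript. It is a simplified model, not MongoDB's actual storage format: the function name `bucketize`, the fixed one-hour window, and the sample readings are all assumptions chosen for illustration.

```javascript
// Simplified model of bucketing; NOT MongoDB's real on-disk layout.
// Measurements that share a sensorId and fall in the same time window
// are grouped into one bucket, so the metadata is stored only once.
function bucketize(points, windowMs) {
  const buckets = new Map();
  for (const p of points) {
    // Round the timestamp down to the start of its window.
    const windowStart = Math.floor(p.timestamp.getTime() / windowMs) * windowMs;
    const key = `${p.sensorId}|${windowStart}`;
    if (!buckets.has(key)) {
      buckets.set(key, {
        meta: { sensorId: p.sensorId }, // stored once per bucket
        minTime: p.timestamp,
        maxTime: p.timestamp,
        measurements: [],
      });
    }
    const bucket = buckets.get(key);
    if (p.timestamp < bucket.minTime) bucket.minTime = p.timestamp;
    if (p.timestamp > bucket.maxTime) bucket.maxTime = p.timestamp;
    bucket.measurements.push({ timestamp: p.timestamp, temperature: p.temperature });
  }
  return [...buckets.values()];
}

const HOUR_MS = 60 * 60 * 1000;
const points = [
  { sensorId: 1, timestamp: new Date('2024-06-01T10:05:00Z'), temperature: 22 },
  { sensorId: 1, timestamp: new Date('2024-06-01T10:35:00Z'), temperature: 23 },
  { sensorId: 2, timestamp: new Date('2024-06-01T10:10:00Z'), temperature: 19 },
  { sensorId: 1, timestamp: new Date('2024-06-01T11:02:00Z'), temperature: 24 },
];
const buckets = bucketize(points, HOUR_MS);
console.log(buckets.length); // 3 buckets hold the 4 measurements
```

A query for sensor 1 between 10:00 and 11:00 only needs to look at one bucket in this model, which is the intuition behind scanning fewer documents.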

Myth Busters - 4 Common Misconceptions

Quick: Do you think time-series collections require a fixed schema for all documents? Commit to yes or no.

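Common Belief: Time-series collections need all documents to have the exact same fields and structure.

Reality: Only the timeField is required in every document. The remaining fields can vary from document to document, as in any other MongoDB collection.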

Quick: Do you think time-series collections automatically aggregate data for you? Commit to yes or no.

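Common Belief: Time-series collections automatically summarize or aggregate data over time intervals.

Reality: They store the raw measurements. Summaries over time intervals still have to be computed explicitly, for example with an aggregation pipeline.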

Quick: Do you think time-series collections always improve performance regardless of data size? Commit to yes or no.

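Common Belief: Using time-series collections always makes data storage and queries faster no matter the workload.

Reality: The benefit depends on the workload. With low data volumes, or with frequent updates and deletes, a regular collection can be the better fit.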

Quick: Do you think bucket size should always be as large as possible? Commit to yes or no.

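Common Belief: Larger buckets always mean better performance and storage efficiency.

Reality: Bucket size is a trade-off. Buckets that span far more time than a typical query force the server to read measurements the query does not need, while buckets that are too small lose most of the storage savings.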
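
On the bucket-size question, MongoDB exposes a `granularity` setting ('seconds', 'minutes', or 'hours') that is meant to roughly match how often one series, meaning one metaField value, reports. The sketch below picks a value from an ingestion interval. The helper names and the 60-second and 3600-second cut-offs are assumptions for illustration, not an official rule.

```javascript
// Hypothetical heuristic: map the interval between readings from ONE series
// to a granularity value. The cut-off thresholds are assumptions.
function pickGranularity(secondsBetweenReadings) {
  if (secondsBetweenReadings < 60) return 'seconds';
  if (secondsBetweenReadings < 3600) return 'minutes';
  return 'hours';
}

// Builds an options object in the shape db.createCollection() accepts.
function timeSeriesOptions(secondsBetweenReadings) {
  return {
    timeseries: {
      timeField: 'timestamp',
      metaField: 'sensorId',
      granularity: pickGranularity(secondsBetweenReadings),
    },
  };
}

// A sensor that reports every 5 minutes:
console.log(timeSeriesOptions(300).timeseries.granularity); // minutes
```

In mongosh the result could be passed straight through, as in db.createCollection('sensors', timeSeriesOptions(300)).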

Expert Zone

1

Time-series collections optimize for append-only workloads; frequent updates or deletes inside buckets can degrade performance.

2

Metadata fields are indexed differently and should be chosen carefully to balance query speed and storage overhead.

3

Compression algorithms used on buckets can vary based on data types, affecting storage size and CPU usage.

When NOT to use

Avoid time-series collections when data is not primarily time-based or when data volume is low. For transactional or relational data, use regular collections or relational databases. For complex multi-dimensional data, consider specialized time-series databases like InfluxDB or TimescaleDB.

Production Patterns

In production, time-series collections are used for IoT sensor data ingestion, monitoring logs, financial tick data, and telemetry. Common patterns include setting data retention policies, sharding by metadata fields for scale, and combining with aggregation pipelines for real-time dashboards.
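
As one possible shape for the dashboard pattern, the sketch below builds an aggregation pipeline that averages temperature per sensor per hour. The function name `hourlyAveragePipeline` and the field names (`timestamp`, `sensorId`, `temperature`) are assumptions carried over from the examples in this overview. `$dateTrunc` requires MongoDB 5.0 or later.

```javascript
// Builds an aggregation pipeline: average temperature per sensor per hour
// for readings at or after `since`. Field names are assumed for illustration.
function hourlyAveragePipeline(since) {
  return [
    // Filter on the time field first so less data reaches the later stages.
    { $match: { timestamp: { $gte: since } } },
    {
      $group: {
        _id: {
          sensorId: '$sensorId',
          hour: { $dateTrunc: { date: '$timestamp', unit: 'hour' } },
        },
        avgTemperature: { $avg: '$temperature' },
        samples: { $sum: 1 },
      },
    },
    // Oldest hour first, ready for plotting.
    { $sort: { '_id.hour': 1 } },
  ];
}

const pipeline = hourlyAveragePipeline(new Date('2024-06-01T00:00:00Z'));
```

In mongosh the pipeline would be run with db.sensors.aggregate(pipeline).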

Connections

Data Warehousing

Builds-on

Understanding time-series collections helps grasp how data warehouses store and optimize large volumes of historical data for analysis.

Event Sourcing (Software Architecture)

Similar pattern

Both store sequences of events over time, emphasizing immutability and chronological order, aiding in auditability and replay.

Financial Candlestick Charts

Application example

Time-series collections can store raw trade data that is later aggregated into candlestick charts, showing how raw data supports complex visualizations.
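
A candlestick aggregation over raw trades might look like the following sketch. The trade document shape ({ symbol, ts, price }) and the function name `candlestickPipeline` are hypothetical; the stages themselves are standard aggregation operators.

```javascript
// Hypothetical trade documents: { symbol, ts, price }.
// Returns a pipeline producing one open/high/low/close row per time unit.
function candlestickPipeline(symbol, unit) {
  return [
    { $match: { symbol } },
    // $first and $last in $group follow input order, so sort by time first.
    { $sort: { ts: 1 } },
    {
      $group: {
        _id: { $dateTrunc: { date: '$ts', unit } },
        open: { $first: '$price' },
        high: { $max: '$price' },
        low: { $min: '$price' },
        close: { $last: '$price' },
      },
    },
    // Candles in chronological order.
    { $sort: { _id: 1 } },
  ];
}

const minuteCandles = candlestickPipeline('ACME', 'minute');
```

The time-series collection keeps every trade, and the candles are derived at query time, so the same raw data can feed one-minute and one-hour charts alike.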

Common Pitfalls

#1 Storing time-series data in a regular collection without time-series options.

Wrong approach (the insert implicitly creates 'sensors' as a regular collection):
db.sensors.insertOne({ timestamp: new Date(), sensorId: 1, temperature: 22 })

Correct approach:
db.createCollection('sensors', { timeseries: { timeField: 'timestamp', metaField: 'sensorId' } })
db.sensors.insertOne({ timestamp: new Date(), sensorId: 1, temperature: 22 })

Root cause: A time-series collection has to be created explicitly before the first insert. Otherwise the data lands in a regular collection and misses the internal optimizations for time-based data.

#2 Querying time-series data without filtering by time.

Wrong approach:
db.sensors.find({ sensorId: 1 })

Correct approach:
db.sensors.find({ sensorId: 1, timestamp: { $gte: ISODate('2024-06-01') } })

Root cause: Ignoring time filters leads to scanning large data sets, causing slow queries.

#3 Passing expireAfterSeconds to createCollection for a regular collection.

Wrong approach:
db.createCollection('logs', { expireAfterSeconds: 3600 })

Correct approach:
db.createCollection('logs', { timeseries: { timeField: 'timestamp' }, expireAfterSeconds: 3600 })

Root cause: As a createCollection option, expireAfterSeconds applies to time-series collections (and clustered collections), not to ordinary ones. A regular collection expires documents through a TTL index instead.
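
The collection-level option is not the only retention mechanism. The sketch below shows two related calls: a TTL index for a regular collection, and `collMod` to change the retention window of an existing time-series collection. It assumes a mongosh session where `db` is the database handle; the helper names and the `timestamp` field are illustrative.

```javascript
// `db` is assumed to be a mongosh database handle; helper names are hypothetical.

// Regular collection: expiry comes from a TTL index on a date field.
function addTtlIndex(db, collectionName, seconds) {
  return db.getCollection(collectionName)
    .createIndex({ timestamp: 1 }, { expireAfterSeconds: seconds });
}

// Existing time-series collection: adjust retention with the collMod command.
function setTimeSeriesRetention(db, collectionName, seconds) {
  return db.runCommand({ collMod: collectionName, expireAfterSeconds: seconds });
}
```

For example, setTimeSeriesRetention(db, 'logs', 7200) would extend the one-hour window from the pitfall above to two hours without recreating the collection.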

Key Takeaways

Time-series collections in MongoDB are specialized collections optimized to store and query data points with timestamps efficiently.

They use internal bucketing to group related data, reducing storage overhead and improving query speed.

Creating a time-series collection requires specifying the timeField and optionally a metaField to organize data properly.

Automatic data expiration can be configured to manage storage by removing old data without manual cleanup.

Understanding bucket sizing and query patterns is essential to optimize performance for real-world time-series workloads.