Bucket pattern for time-series data in MongoDB - Time & Space Complexity
When working with time-series data in MongoDB, we often group many data points into buckets to store them efficiently.
We want to understand how the time to insert or query data grows as we add more data points.
Analyze the time complexity of inserting time-series data using the bucket pattern.
```javascript
// Each bucket holds multiple measurements for one sensor,
// grouped by a common start time.
const bucket = {
  _id: ObjectId(),
  sensorId: 'sensor1',
  start: new Date('2024-01-01T00:00:00Z'),
  measurements: [
    { timestamp: new Date('2024-01-01T00:00:01Z'), value: 10 },
    { timestamp: new Date('2024-01-01T00:00:02Z'), value: 12 },
    // ... more measurements
  ]
};

db.timeSeriesBuckets.insertOne(bucket);
```
This code inserts a bucket document containing many measurements grouped by time.
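In practice, buckets are often grown one measurement at a time with `$push` rather than written all at once. A minimal sketch of that append step, assuming a `count` field and a 1000-measurement cap per bucket (both are illustrative assumptions, not part of the example above):

```javascript
// Hypothetical helper: build the arguments for appending one measurement
// to the sensor's current non-full bucket. The `count` field and the
// 1000-measurement cap are assumptions for illustration.
function buildAppend(sensorId, measurement) {
  return {
    filter: { sensorId, count: { $lt: 1000 } },  // find a bucket with room
    update: {
      $push: { measurements: measurement },      // append to the embedded array
      $inc: { count: 1 },                        // track bucket fullness
      $setOnInsert: { start: measurement.timestamp }
    },
    options: { upsert: true }                    // start a new bucket when all are full
  };
}

// Usage (requires a live connection):
// const { filter, update, options } =
//   buildAppend('sensor1', { timestamp: new Date(), value: 10 });
// db.timeSeriesBuckets.updateOne(filter, update, options);
```

Note that MongoDB typically rewrites the whole document on each update, which is one reason larger buckets make each update more expensive, consistent with the O(n) analysis below.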
Look for the repeated work inside the bucket pattern:
- Primary operation: inserting or updating a bucket whose `measurements` array contains multiple data points.
- How many times: once per measurement — the number of measurements in a bucket determines how many data points must be serialized and written together.
As the number of measurements per bucket grows, the work to insert or update that bucket grows roughly in proportion.
| Input Size (measurements per bucket) | Approx. Operations |
|---|---|
| 10 | 10 operations to add measurements |
| 100 | 100 operations to add measurements |
| 1000 | 1000 operations to add measurements |
Pattern observation: The time grows linearly with the number of measurements in each bucket.
Time Complexity: O(n)
This means the time to insert or update a bucket grows directly with the number of measurements inside it.
[X] Wrong: "Adding more measurements to a bucket does not affect insertion time because it's one document."
[OK] Correct: Even though it's a single document, MongoDB must serialize and write every measurement in the array, so more measurements mean more work.
Understanding how grouping data affects performance helps you design efficient time-series storage and answer questions about scaling data operations.
"What if we split measurements into smaller buckets with fewer data points each? How would that change the time complexity for inserts?"
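One way to reason about this question: if each bucket is capped at b measurements, a single insert or update touches at most b data points, so the per-operation cost drops to O(b) — but storing N total measurements still takes O(N) work overall, spread across ⌈N/b⌉ buckets. A sketch of splitting a flat stream into capped buckets (the helper name and cap value are assumptions):

```javascript
// Hypothetical helper: split a flat list of measurements into buckets
// of at most `cap` entries each. Total work is still O(N) across all
// buckets, but each individual bucket write handles only O(cap) items.
function splitIntoBuckets(sensorId, measurements, cap) {
  const buckets = [];
  for (let i = 0; i < measurements.length; i += cap) {
    const slice = measurements.slice(i, i + cap);
    buckets.push({ sensorId, start: slice[0].timestamp, measurements: slice });
  }
  return buckets;
}

// e.g. 1000 measurements with cap 100 -> 10 buckets of 100 each:
// db.timeSeriesBuckets.insertMany(splitIntoBuckets('sensor1', data, 100));
```

Smaller buckets trade cheaper individual writes for more documents (and more index entries), so the cap is a tuning knob rather than a way to escape the overall O(N) cost.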