# Parallel Scan in DynamoDB: Time & Space Complexity
When we scan a large DynamoDB table, the time it takes depends on how many items we examine. A parallel scan splits this work into segments that run at the same time.
We want to understand how the total work changes when multiple segments run concurrently.
Analyze the time complexity of the following DynamoDB parallel scan code.
```javascript
// Scan one segment of the table; each worker gets its own segmentNumber.
const params = {
  TableName: "MyTable",
  Segment: segmentNumber,       // which slice this worker scans (0-based)
  TotalSegments: totalSegments  // how many slices the table is divided into
};
const data = await dynamodb.scan(params).promise();
// Repeat for each segment, with all segments running in parallel.
```
This code scans one segment of the table. Multiple segments are scanned at the same time to cover the whole table faster.
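A minimal sketch of how the segments above could be driven concurrently with `Promise.all`. Here `scanSegment` is a stand-in for `dynamodb.scan(params).promise()`, stubbed with an in-memory array so the driver logic is self-contained; in real code each call would hit DynamoDB.

```javascript
// Stand-in "table": 10 items in memory (assumption for illustration only).
const table = Array.from({ length: 10 }, (_, i) => ({ id: i }));

// Stub for dynamodb.scan(...).promise(): each segment sees
// roughly 1/totalSegments of the items.
async function scanSegment(segment, totalSegments) {
  return table.filter((_, i) => i % totalSegments === segment);
}

// Launch every segment at once and merge the results afterwards.
async function parallelScan(totalSegments) {
  const scans = [];
  for (let segment = 0; segment < totalSegments; segment++) {
    scans.push(scanSegment(segment, totalSegments));
  }
  const results = await Promise.all(scans); // all segments run concurrently
  return results.flat();
}
```

The key point is that the loop only *starts* the scans; `Promise.all` waits for all of them, so total wall-clock time is bounded by the slowest segment, not the sum of all segments.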
Look at what repeats when scanning in parallel.
- Primary operation: Scanning each segment of the table.
- How many times: Once per segment, all running at the same time.
As the table size grows, each segment has more items to scan, so each scan takes longer.
| Input Size (n) | Approx. Operations per Segment |
|---|---|
| 10 | 10 / totalSegments |
| 100 | 100 / totalSegments |
| 1000 | 1000 / totalSegments |
Pattern observation: The total work stays the same, but it is split evenly, so each segment does less work as the number of segments increases.
Time Complexity: O(n / p), where n is the number of items and p is the number of parallel segments.
This means each segment's scan time shrinks as we increase p, because the n items are divided across the parallel workers.
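The table above can be reproduced with a one-line helper (the name `opsPerSegment` is ours, chosen for illustration):

```javascript
// Worked example of the O(n / p) pattern: roughly how many items
// a single segment scans when n items are split across p segments.
function opsPerSegment(n, totalSegments) {
  return Math.ceil(n / totalSegments); // each segment handles an equal share
}
```

For example, with n = 1000 items and 4 segments, each segment scans about 250 items, matching the last row of the table.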
[X] Wrong: "Parallel scan always makes scanning instant no matter how big the table is."
[OK] Correct: Even with parallel scan, the total work depends on the table size. More segments help, but each segment still scans part of the table, so time reduces but does not disappear.
Understanding how a parallel scan divides work helps you reason about handling large datasets efficiently. It shows you can think about splitting tasks to save time, a useful skill in many real projects.
"What if we increase the number of segments beyond the number of CPU cores available? How would that affect the time complexity?"
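One way to reason about that question is a simplified model (an assumption for discussion, not a DynamoDB guarantee): once the number of segments exceeds the workers that can truly run at once (CPU cores on the client, for instance), the extra segments time-slice rather than adding speedup, so effective time flattens out at O(n / cores).

```javascript
// Simplified model: speedup is capped by the true parallelism available.
// Beyond that cap, adding segments no longer reduces wall-clock time.
function effectiveScanTime(n, segments, cores) {
  const trueParallelism = Math.min(segments, cores);
  return n / trueParallelism; // O(n / p) until p reaches cores, then O(n / cores)
}
```

Under this model, 16 segments on a 4-core client take about as long as 4 segments, which is why the answer to the discussion question is "the complexity stops improving past the hardware's parallelism."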