Customer Segmentation Pattern in Python Data Analysis - Time & Space Complexity
We want to understand how the time needed to group customers grows as the number of customers increases.
How does the work change when we have more customers to segment?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def segment_customers(df):
    segments = {}
    for _, row in df.iterrows():
        # Each (age_group, income_level) pair identifies one segment
        key = (row['age_group'], row['income_level'])
        segments.setdefault(key, []).append(row['customer_id'])
    return segments
```
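A minimal usage sketch, with the function repeated so it runs on its own; the sample data is hypothetical but matches the column names the snippet assumes (`age_group`, `income_level`, `customer_id`):

```python
import pandas as pd

def segment_customers(df):
    segments = {}
    for _, row in df.iterrows():
        key = (row['age_group'], row['income_level'])
        segments.setdefault(key, []).append(row['customer_id'])
    return segments

# Hypothetical sample data: four customers, three distinct segments
df = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'age_group': ['18-25', '26-35', '18-25', '26-35'],
    'income_level': ['low', 'high', 'low', 'low'],
})

segments = segment_customers(df)
# Customers 1 and 3 share the ('18-25', 'low') segment;
# the other two customers each get a segment of their own.
```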
This code groups customers by their age group and income level into segments.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Looping through each customer record once.
- How many times: Exactly once per customer, so as many times as there are customers.
As the number of customers grows, the time to process grows in a straight line.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 operations |
| 100 | 100 operations |
| 1000 | 1000 operations |
Pattern observation: Doubling customers roughly doubles the work.
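One quick way to check this pattern is to count loop iterations directly. This sketch mirrors the loop in `segment_customers` with a counter; the sizes (100 and 200) are arbitrary synthetic examples:

```python
import pandas as pd

def count_iterations(df):
    # Count how many times the loop body runs, mirroring segment_customers
    count = 0
    for _, row in df.iterrows():
        count += 1
    return count

def make_customers(n):
    # Synthetic data: every customer in the same segment, which doesn't
    # matter here because we only count iterations
    return pd.DataFrame({
        'customer_id': range(n),
        'age_group': ['18-25'] * n,
        'income_level': ['low'] * n,
    })

small = count_iterations(make_customers(100))  # 100 iterations
large = count_iterations(make_customers(200))  # 200 iterations: double the input, double the work
```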
Time Complexity: O(n)
This means the time grows directly with the number of customers.
Space Complexity: also O(n), because the segments dictionary stores each customer_id exactly once, no matter how many distinct segments there are.
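For comparison, the same grouping is usually written with pandas' built-in `groupby`. It is still a single pass over the data (O(n)) but avoids the per-row overhead of `iterrows`. This is a sketch assuming the same hypothetical columns as above:

```python
import pandas as pd

df = pd.DataFrame({
    'customer_id': [1, 2, 3, 4],
    'age_group': ['18-25', '26-35', '18-25', '26-35'],
    'income_level': ['low', 'high', 'low', 'low'],
})

# groupby builds the same (age_group, income_level) -> [customer_id, ...] mapping
segments = df.groupby(['age_group', 'income_level'])['customer_id'].apply(list).to_dict()
```

The result is a dictionary keyed by (age_group, income_level) tuples, matching the shape produced by `segment_customers`.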
[X] Wrong: "Grouping customers by multiple features means the time grows much faster, like squared."
[OK] Correct: The code loops through the data only once; grouping happens during that single pass, so time grows linearly, not quadratically.
Understanding how grouping scales helps you explain data processing steps clearly and confidently in real projects.
"What if we nested loops to compare each customer with every other customer? How would the time complexity change?"