Sales Data Analysis Pattern in Python: Time & Space Complexity
When analyzing sales data, we often write code that summarizes records or finds trends. Understanding how long these computations take helps us work efficiently.
We want to know how the time to analyze grows as the sales data gets bigger.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def total_sales_per_product(sales_df):
    result = {}
    for product in sales_df['product'].unique():
        # Boolean filtering scans every row of sales_df on each iteration
        total = sales_df[sales_df['product'] == product]['amount'].sum()
        result[product] = total
    return result
```
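For a quick check, here is a usage sketch with made-up data (the function is repeated so the snippet runs on its own; the `product` and `amount` column names come from the code above):

```python
import pandas as pd

# Same function as above, repeated so this example runs standalone
def total_sales_per_product(sales_df):
    result = {}
    for product in sales_df['product'].unique():
        total = sales_df[sales_df['product'] == product]['amount'].sum()
        result[product] = total
    return result

sales_df = pd.DataFrame({
    'product': ['apple', 'banana', 'apple', 'cherry', 'banana'],
    'amount':  [3.0, 1.5, 2.0, 4.0, 2.5],
})

# Expect apple: 3.0 + 2.0, banana: 1.5 + 2.5, cherry: 4.0
print(total_sales_per_product(sales_df))
```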
This code calculates the total sales amount for each product by scanning every sales entry.
- Primary operation: looping over each unique product and filtering the sales data down to that product's rows.
- How many times: once per unique product, and each iteration scans the entire sales data to sum the amounts.
As the number of sales entries (n) and unique products (p) grow, the time to calculate the totals grows faster than either factor alone.
| Sales entries (n) | Approx. operations |
|---|---|
| 10 | about 10 × p (p = unique products) |
| 100 | about 100 × p |
| 1000 | about 1000 × p |
Pattern observation: The total work grows roughly with the number of sales times the number of unique products.
Time Complexity: O(p * n)
This means the time grows with the number of products times the number of sales entries.
[X] Wrong: "The code runs in linear time because it just loops once."
[OK] Correct: The code loops over products, and for each product, it scans all sales entries, so the work multiplies.
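The multiplication is easier to see in a plain-Python equivalent of the same pattern (a sketch with hypothetical names, not the pandas code above):

```python
def total_sales_per_product_loops(sales):
    """Same pattern without pandas: sales is a list of (product, amount) pairs."""
    result = {}
    products = {p for p, _ in sales}   # one O(n) pass to collect unique products
    for product in products:           # outer loop runs p times
        total = 0
        for p, amount in sales:        # inner scan touches all n entries
            if p == product:
                total += amount
        result[product] = total
    return result                      # p scans of n entries each -> O(p * n)
```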
Understanding how nested loops affect time helps you explain your approach clearly and shows you can think about efficiency in real data tasks.
"What if we used a grouping method like pandas groupby instead? How would the time complexity change?"