
Sales data analysis pattern in Data Analysis Python - Time & Space Complexity

Time Complexity: Sales data analysis pattern
O(p * n)
Understanding Time Complexity

When analyzing sales data, we often apply patterns like this one to summarize totals or spot trends. Understanding how long these patterns take to run helps us work efficiently.

We want to know how the time to analyze grows as the sales data gets bigger.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


import pandas as pd

def total_sales_per_product(sales_df):
    result = {}
    # One pass over the unique products (p iterations)...
    for product in sales_df['product'].unique():
        # ...and each boolean filter + sum scans all n rows of the DataFrame.
        total = sales_df[sales_df['product'] == product]['amount'].sum()
        result[product] = total
    return result

This code calculates the total sales amount for each product by scanning all sales entries once per product.
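A quick way to try the snippet on a small DataFrame (the sample rows below are made up for illustration):

```python
import pandas as pd

def total_sales_per_product(sales_df):
    result = {}
    for product in sales_df['product'].unique():
        total = sales_df[sales_df['product'] == product]['amount'].sum()
        result[product] = total
    return result

# Hypothetical sample data: 5 sales rows, 2 unique products.
sales_df = pd.DataFrame({
    'product': ['apple', 'banana', 'apple', 'banana', 'apple'],
    'amount': [10, 5, 20, 15, 30],
})

totals = total_sales_per_product(sales_df)
# totals maps each product to its summed amount:
# apple -> 10 + 20 + 30 = 60, banana -> 5 + 15 = 20
```

Even in this tiny example you can see the shape of the work: the loop body runs twice (once per product), and each run filters all five rows.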

Identify Repeating Operations
  • Primary operation: Looping over each unique product and filtering sales data for that product.
  • How many times: Once for each unique product, and inside that, scanning the entire sales data to sum amounts.
How Execution Grows With Input

As the number of sales entries and the number of unique products grow, the time to calculate the totals grows faster than either input alone.

Input Size (n)     | Approx. Operations
10 sales entries   | about 10 x p row checks (p = number of unique products)
100 sales entries  | about 100 x p row checks
1000 sales entries | about 1000 x p row checks

Pattern observation: The total work grows roughly with the number of sales times the number of unique products.
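The pattern observation can be verified by counting operations directly. This is a hypothetical pure-Python sketch that mirrors the loop-and-filter structure and tallies how many row checks occur (pandas performs these scans in vectorized C code, but the count of row visits is the same in spirit):

```python
def count_row_scans(sales):
    """sales: list of (product, amount) tuples. Returns (totals, scan count)."""
    products = {p for p, _ in sales}   # p unique products
    scans = 0
    totals = {}
    for product in products:           # outer loop runs p times
        total = 0
        for p, amount in sales:        # inner scan visits all n rows each time
            scans += 1
            if p == product:
                total += amount
        totals[product] = total
    return totals, scans

# Illustrative data: n = 1000 rows, p = 2 unique products.
sales = [('apple', 10), ('banana', 5)] * 500
totals, scans = count_row_scans(sales)
# scans == p * n == 2 * 1000 == 2000
```

Doubling either the number of rows or the number of products doubles `scans`, which is exactly the O(p * n) behavior described above.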

Final Time Complexity

Time Complexity: O(p * n)

This means the time grows with the number of products times the number of sales entries.

Common Mistake

[X] Wrong: "The code runs in linear time because it just loops once."

[OK] Correct: The code loops over products, and for each product, it scans all sales entries, so the work multiplies.

Interview Connect

Understanding how nested loops affect time helps you explain your approach clearly and shows you can think about efficiency in real data tasks.

Self-Check

"What if we used a grouping method like pandas groupby instead? How would the time complexity change?"