Memory usage analysis in Data Analysis Python - Time & Space Complexity

Time Complexity: O(n * m)
Understanding Time Complexity

When we analyze time complexity, we want to understand how the runtime grows as the data size grows.

We ask: how does the amount of work increase as the number of rows and columns grows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


import pandas as pd

def calculate_mean(df):
    means = {}
    for col in df.columns:
        means[col] = df[col].mean()
    return means

# df is a DataFrame with n rows and m columns

This code calculates the average value for each column in a data table.

Identify Repeating Operations

Identify the loops, recursion, and array traversals that repeat.

  • Primary operation: Looping over each column and computing the mean of all rows in that column.
  • How many times: The loop runs once per column (m iterations), and each call to mean() reads every row of that column (n values), for roughly n * m value reads in total.
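
To make the two bullets above concrete, here is a minimal sketch that replaces the built-in mean() with an explicit inner loop and counts how many values are read. The name calculate_mean_counted and the small example table are made up for illustration, and the sketch assumes numeric columns with no missing values and at least one row.

```python
import pandas as pd

def calculate_mean_counted(df):
    """Hand-written equivalent of calculate_mean that also counts value reads."""
    means = {}
    visits = 0  # number of individual cell values read
    for col in df.columns:        # outer loop: runs m times
        total = 0.0
        for value in df[col]:     # inner loop: runs n times per column
            total += value
            visits += 1
        means[col] = total / len(df)
    return means, visits

# Illustrative table (hypothetical data): n = 4 rows, m = 3 columns
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40], "c": [0, 0, 0, 8]})
means, visits = calculate_mean_counted(df)
print(visits)  # 12, i.e. 4 rows * 3 columns
```

The counter ends at n * m because the inner loop body executes once for every cell in the table. The built-in mean() hides that inner loop, but it still has to read the same values.
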
How Execution Grows With Input

As the number of rows (n) or columns (m) grows, the work grows too.

Input Size (n rows, m columns)    Approx. Operations
10 rows, 5 columns                About 50 operations (10 * 5)
100 rows, 5 columns               About 500 operations (100 * 5)
1000 rows, 10 columns             About 10,000 operations (1000 * 10)

Pattern observation: The total work is the product of rows and columns, so doubling either one doubles the work, and doubling both quadruples it.

Final Time Complexity

Time Complexity: O(n * m)

This means the time to calculate all means grows proportionally with the number of rows times the number of columns.
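
One way to check this empirically is to time the function on tables of different shapes. The following is a rough sketch, not a benchmark: the helper time_calculate_mean, the random test data, and the chosen sizes are all assumptions for illustration. Measured times vary between machines, and fixed per-call overhead can hide the trend on small tables.

```python
import time

import numpy as np
import pandas as pd

def calculate_mean(df):
    means = {}
    for col in df.columns:
        means[col] = df[col].mean()
    return means

def time_calculate_mean(n, m, repeats=5):
    """Return the best-of-`repeats` wall-clock time in seconds for an n x m table."""
    df = pd.DataFrame(np.random.rand(n, m))  # random floats, illustrative only
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        calculate_mean(df)
        best = min(best, time.perf_counter() - start)
    return best

# Double the rows, then double the columns, relative to the first shape
for n, m in [(100_000, 10), (200_000, 10), (100_000, 20)]:
    print(f"n={n:>7}, m={m:>2}: {time_calculate_mean(n, m):.5f} s")
```

If the O(n * m) estimate holds, the second and third shapes should each take very roughly twice as long as the first, since each doubles one of the two factors.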

Common Mistake

[X] Wrong: "Calculating the mean for each column is just O(m) because we loop over columns only."

[OK] Correct: Each mean calculation looks at all rows, so the work inside the loop depends on n, making total work depend on both n and m.

Interview Connect

Understanding how data size affects processing time is key in data science. This skill helps you explain and improve data handling in real projects.

Self-Check

"What if we used a built-in function that calculates means for all columns at once? How would the time complexity change?"
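
As a starting point for exploring this question, the sketch below compares the column loop with pandas' DataFrame.mean(), which computes every column mean in one call. The example table is made up for illustration. Keep in mind while reasoning about the answer that a single call still has to read every value in the table.

```python
import pandas as pd

# Illustrative table (hypothetical data): n = 4 rows, m = 2 columns
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]})

# Column-by-column version, as in the scenario above
looped = {col: df[col].mean() for col in df.columns}

# Built-in version: one call, returns a Series indexed by column name
builtin = df.mean()

print(builtin.to_dict() == looped)  # True: both give the same means
```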