Inner Join in Python Data Analysis - Time & Space Complexity
When we combine two tables using an inner join, we want to know how the time to do this grows as the tables get bigger.
We ask: How does the work increase when the number of rows in each table grows?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df1 = pd.DataFrame({'key': [1, 2, 3], 'value1': ['A', 'B', 'C']})
df2 = pd.DataFrame({'key': [2, 3, 4], 'value2': ['X', 'Y', 'Z']})

# Inner join keeps only rows whose 'key' appears in both tables (keys 2 and 3)
result = pd.merge(df1, df2, on='key', how='inner')
print(result)
```
This code joins two tables on a common column called 'key' and keeps only matching rows.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Checking each row in the first table against rows in the second table to find matching keys.
- How many times: For each row in the first table, the algorithm looks for matching rows in the second table.
As the number of rows in both tables grows, the number of comparisons grows too.
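The row-by-row comparison described above can be sketched as a plain nested loop. This is a simplified model of what an unindexed join must do, not how pandas implements `merge` internally; the `naive_inner_join` helper is illustrative:

```python
def naive_inner_join(left, right, key):
    """Nested-loop inner join: compare every left row to every right row."""
    result = []
    for lrow in left:                   # n iterations
        for rrow in right:              # m iterations per left row
            if lrow[key] == rrow[key]:  # n * m comparisons in total
                result.append({**lrow, **rrow})
    return result

left = [{'key': 1, 'value1': 'A'}, {'key': 2, 'value1': 'B'}, {'key': 3, 'value1': 'C'}]
right = [{'key': 2, 'value2': 'X'}, {'key': 3, 'value2': 'Y'}, {'key': 4, 'value2': 'Z'}]
print(naive_inner_join(left, right, 'key'))
```

With 3 rows on each side, the inner `if` runs 3 × 3 = 9 times to produce just 2 matching rows.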
| Rows per table (n = m) | Approx. comparisons (n × m) |
|---|---|
| 10 | About 100 checks |
| 100 | About 10,000 checks |
| 1,000 | About 1,000,000 checks |
Pattern observation: The work grows much faster than the input size because each row in one table may be compared against every row in the other.
Time Complexity: O(n * m)
Here n is the number of rows in the first table and m the number of rows in the second. In this naive nested-loop model, the time to join grows with the product of the two table sizes: multiply the rows in the first table by the rows in the second.
[X] Wrong: "The inner join time grows only with the size of one table."
[OK] Correct: Because the join must compare rows from both tables, the time depends on both sizes, not just one.
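A quick way to see that both sizes matter is to count the comparisons the nested-loop model performs as each table grows. This is a toy count of loop iterations, not a pandas benchmark:

```python
def count_comparisons(n, m):
    """Count the key comparisons a nested-loop join makes for table sizes n and m."""
    comparisons = 0
    for _ in range(n):        # one pass over the first table
        for _ in range(m):    # full scan of the second table each time
            comparisons += 1
    return comparisons

print(count_comparisons(10, 10))  # 100
print(count_comparisons(20, 10))  # doubling the first table doubles the work
print(count_comparisons(10, 20))  # doubling the second table also doubles it
```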
Understanding how joins scale helps you explain database performance clearly and shows you know what happens behind the scenes.
"What if one table has an index on the join column? How would the time complexity change?"
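One way to explore that question: build a hash index (a plain Python dict) on one table's join column, so each lookup takes roughly constant time and the join drops from O(n * m) to about O(n + m). This is a hedged sketch of the idea; pandas' own hash join is implemented differently under the hood:

```python
from collections import defaultdict

def hash_inner_join(left, right, key):
    """Index the right table by key, then probe the index once per left row."""
    index = defaultdict(list)
    for rrow in right:                  # O(m): build the index
        index[rrow[key]].append(rrow)
    result = []
    for lrow in left:                   # O(n): one near-constant-time lookup per row
        for rrow in index.get(lrow[key], []):
            result.append({**lrow, **rrow})
    return result

left = [{'key': 1, 'value1': 'A'}, {'key': 2, 'value1': 'B'}, {'key': 3, 'value1': 'C'}]
right = [{'key': 2, 'value2': 'X'}, {'key': 3, 'value2': 'Y'}, {'key': 4, 'value2': 'Z'}]
print(hash_inner_join(left, right, 'key'))
```

The result is the same as the nested-loop version, but the second table is scanned only once instead of once per row of the first.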