str.replace() for substitution in Pandas - Time & Space Complexity

Time Complexity: str.replace() for substitution
O(n * m)
Understanding Time Complexity

We want to understand how the time it takes to replace text in a pandas column changes as the data grows.

How does the work grow when we replace strings in many rows?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

df = pd.DataFrame({
    'text': ['apple pie', 'banana split', 'apple tart', 'banana bread'] * 1000
})

# Replace 'apple' with 'orange' in the 'text' column
result = df['text'].str.replace('apple', 'orange', regex=False)

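This code replaces every occurrence of the substring 'apple' with 'orange' in each string of the 'text' column. Because regex=False, 'apple' is treated as literal text rather than as a regular expression.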

Identify Repeating Operations
  • Primary operation: Checking and replacing the substring in each string of the column.
  • How many times: Once for each row in the DataFrame (n times).
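To make the repeated operation concrete, the sketch below compares the vectorized call with a plain Python loop that calls the built-in str.replace once per row. The loop is only a conceptual model of the per-row work, not a description of how pandas implements .str.replace internally; the names vectorized and looped are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['apple pie', 'banana split', 'apple tart', 'banana bread'] * 1000
})

# Vectorized call from the snippet above.
vectorized = df['text'].str.replace('apple', 'orange', regex=False)

# Conceptual model: one built-in str.replace call per row (n calls in total).
looped = [s.replace('apple', 'orange') for s in df['text']]

# Both approaches produce the same strings in the same order.
print(vectorized.tolist() == looped)  # True
```

Either way, every one of the n strings is visited once, which is where the factor n in the final complexity comes from.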
How Execution Grows With Input

As the number of rows grows, and as long as the strings stay about the same length, the total work grows roughly in direct proportion.

Input Size (n)    Approx. Operations
10                About 10 string checks and replacements
100               About 100 string checks and replacements
1000              About 1000 string checks and replacements

Pattern observation: Doubling the rows roughly doubles the work.
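One way to check this pattern yourself is to time the same replacement on columns of increasing size. The sketch below is a rough measurement, not a rigorous benchmark: time_replace is a hypothetical helper defined here, the row counts are arbitrary, and the numbers you see will vary with your machine and pandas version.

```python
import time
import pandas as pd

def time_replace(n_rows):
    """Return the seconds taken to run the literal replace on n_rows strings."""
    base = ['apple pie', 'banana split', 'apple tart', 'banana bread']
    # Repeat the four base strings to build a column of n_rows entries
    # (n_rows is assumed to be a multiple of 4).
    s = pd.Series(base * (n_rows // 4))
    start = time.perf_counter()
    s.str.replace('apple', 'orange', regex=False)
    return time.perf_counter() - start

# Double the number of rows at each step and record the elapsed time.
timings = {n: time_replace(n) for n in (10_000, 20_000, 40_000, 80_000)}
for n, seconds in timings.items():
    print(f"{n:>6} rows: {seconds * 1000:.2f} ms")
```

If the growth is linear, each printed time should be roughly twice the previous one, although small inputs are often dominated by fixed overhead and timer noise.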

Final Time Complexity

Time Complexity: O(n * m)

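This means the time grows with both the number of rows (n) and the average length of each string (m). Every string must be scanned for the substring, and scanning a string takes time roughly proportional to its length, so the total work is roughly proportional to the total number of characters in the column.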
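The role of m can be illustrated by holding the row count fixed and changing only the string length. In the sketch below, short_col and long_col are made-up example columns, and the total character count is used as a stand-in for the amount of text the replacement has to scan.

```python
import pandas as pd

# Same number of rows (n = 1000), very different string lengths (m).
short_col = pd.Series(['apple pie'] * 1000)
long_col = pd.Series(['apple pie ' * 50] * 1000)

stats = {}
for name, col in (('short', short_col), ('long', long_col)):
    lengths = col.str.len()
    stats[name] = {
        'n': len(col),                      # number of rows
        'm': float(lengths.mean()),         # average string length
        'total_chars': int(lengths.sum()),  # roughly n * m
    }
    print(name, stats[name])

# The replacement must scan every character, so the long column
# involves far more work even though n is identical.
long_result = long_col.str.replace('apple', 'orange', regex=False)
print(long_result.str.count('orange').iloc[0])  # 50 replacements in one row
```

Here both columns have 1000 rows, but the long column holds 500,000 characters against 9,000 for the short one, so counting rows alone would hide most of the work.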

Common Mistake

[X] Wrong: "Replacing text in a column is always very fast and does not depend on data size."

[OK] Correct: The operation must look at each string, so more rows or longer strings mean more work and more time.

Interview Connect

Understanding how string operations scale helps you write efficient data cleaning code and explain your choices clearly in interviews.

Self-Check

"What if we replaced a regex pattern instead of a fixed string? How would the time complexity change?"
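A starting point for exploring this question is to run the same replacement with regex=False and regex=True, then try a pattern that only a regular expression can express. The sketch below only checks the results; it makes no claim about which version is faster, since regex cost depends heavily on the pattern. The variable names and the example pattern are chosen here for illustration.

```python
import pandas as pd

s = pd.Series(['apple pie', 'banana split', 'apple tart', 'banana bread'] * 1000)

# Literal substring replacement, as in the original snippet.
literal = s.str.replace('apple', 'orange', regex=False)

# The same pattern interpreted as a regular expression.
# 'apple' has no special regex characters, so the result is identical.
as_regex = s.str.replace('apple', 'orange', regex=True)
print(literal.tolist() == as_regex.tolist())  # True

# A pattern that needs a regex: 'apple' or 'banana' at the start of the string.
anchored = s.str.replace(r'^(apple|banana)', 'fruit', regex=True)
print(anchored.head(4).tolist())
# ['fruit pie', 'fruit split', 'fruit tart', 'fruit bread']
```

Each string is still visited once, so the factor n stays. What can change is the cost of matching within each string, which you can measure by timing both versions on your own data.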