str.strip() for whitespace in Pandas - Time & Space Complexity
We want to understand how long it takes to remove leading and trailing whitespace from text data in pandas.
Specifically, how the time grows when we use str.strip() on many text entries.
Analyze the time complexity of the following code snippet.
import pandas as pd
df = pd.DataFrame({
    'text': [' hello ', ' world ', ' pandas ', ' data ', ' science '] * 1000
})
df['clean_text'] = df['text'].str.strip()
This code creates a DataFrame with 5,000 text entries (five strings repeated 1,000 times) and removes leading and trailing whitespace from each string.
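To make the behavior concrete, here is a minimal sketch using sample values that are not part of the original example. It assumes the default call with no arguments, which trims spaces, tabs, and newlines at both ends, leaves interior whitespace alone, and lets missing values pass through as missing.

```python
import pandas as pd

# Illustrative sample values (assumed, not from the example above).
s = pd.Series(["  hello ", "\tworld\n", " data science ", None])

# Default str.strip() trims whitespace at both ends of each string.
cleaned = s.str.strip()

# Interior whitespace ("data science") is kept; the missing entry stays missing.
print(cleaned.tolist())
```

If only one side needs trimming, `str.lstrip()` and `str.rstrip()` work the same way on the left or right end.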
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Applying str.strip() to each string in the column.
- How many times: Once for each row in the DataFrame (n times).
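The "once per row" count can be checked with a hand-written loop. This is only a conceptual sketch: the loop and the `calls` counter are hypothetical illustrations, and pandas' internal implementation may differ depending on the string dtype in use.

```python
import pandas as pd

df = pd.DataFrame({"text": [" hello ", " world ", " pandas "] * 4})

# Hypothetical hand-written equivalent: one strip call per row.
calls = 0
manual = []
for value in df["text"]:
    manual.append(value.strip())
    calls += 1

# The vectorized accessor produces the same values.
vectorized = df["text"].str.strip()

print(calls, len(df))                   # both count the rows
print(manual == vectorized.tolist())    # same cleaned strings
```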
As the number of rows grows, the total work grows in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 strip operations |
| 100 | About 100 strip operations |
| 1000 | About 1000 strip operations |
Pattern observation: Doubling the rows doubles the work because each string is processed once.
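A rough way to observe this pattern is to time the call at a few sizes. The helper name `time_strip`, the row counts, and the repeat count below are assumptions chosen for illustration. Actual timings depend on the machine and pandas version, so treat the numbers as indicative only.

```python
import time
import pandas as pd

def time_strip(n_rows, repeats=5):
    """Return the best wall-clock time (seconds) to strip n_rows strings."""
    base = [" hello ", " world ", " pandas ", " data ", " science "]
    # Repeat the five sample strings until we have n_rows entries.
    s = pd.Series((base * (n_rows // len(base) + 1))[:n_rows])
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        s.str.strip()
        best = min(best, time.perf_counter() - start)
    return best

# Each size is double the previous one.
timings = {n: time_strip(n) for n in (10_000, 20_000, 40_000)}
for n, seconds in timings.items():
    print(f"{n:>6} rows: {seconds * 1e3:.3f} ms")
```

Taking the best of several runs reduces noise from other processes. At small sizes, fixed overhead can hide the linear trend, which is why the sketch starts at 10,000 rows.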
Time Complexity: O(n)
This means the time to strip whitespace grows linearly with the number of strings, assuming each string has a bounded length.
[X] Wrong: "Stripping whitespace is instant and does not depend on data size."
[OK] Correct: Each string must be checked and trimmed, so more strings mean more work.
Knowing how string operations scale helps you handle real data cleaning tasks efficiently.
"What if we used str.strip() on a column with very long strings? How would the time complexity change?"
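One way to explore this question is to hold the row count fixed and vary the string length. The helper `build_column` and the chosen lengths are hypothetical, and the timings are machine-dependent, so this is a starting point for experimentation rather than a definitive benchmark.

```python
import time
import pandas as pd

def build_column(n_rows, payload_len):
    """Hypothetical helper: n_rows strings of payload_len letters padded with spaces."""
    return pd.Series(["  " + "x" * payload_len + "  "] * n_rows)

# Same number of rows each time; only the string length changes.
for payload_len in (10, 1_000, 10_000):
    s = build_column(1_000, payload_len)
    start = time.perf_counter()
    stripped = s.str.strip()
    elapsed = time.perf_counter() - start
    total_chars = int(s.str.len().sum())
    print(f"length {payload_len:>6}: {total_chars:>9} chars, {elapsed * 1e3:.3f} ms")
```

Comparing the elapsed time against `total_chars` rather than the row count shows whether string length matters for your data.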