String Cleaning (strip, lower, replace) in Python Data Analysis: Time & Space Complexity
We want to understand how the time to clean strings grows as we handle more data.
How does the work change when we clean many strings or longer strings?
Analyze the time complexity of the following code snippet.
```python
def clean_strings(strings):
    cleaned = []
    for s in strings:
        s = s.strip()              # remove leading/trailing whitespace
        s = s.lower()              # lowercase every character
        s = s.replace(' ', '_')    # replace inner spaces with underscores
        cleaned.append(s)
    return cleaned
```
This code cleans a list of strings by removing whitespace at the ends, lowercasing all letters, and replacing the remaining inner spaces with underscores.
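As a quick sanity check, here is a small usage example (the sample inputs are illustrative):

```python
def clean_strings(strings):
    # strip whitespace, lowercase, replace inner spaces with underscores
    cleaned = []
    for s in strings:
        s = s.strip()
        s = s.lower()
        s = s.replace(' ', '_')
        cleaned.append(s)
    return cleaned

print(clean_strings(['  Hello World  ', ' DATA analysis ']))
# → ['hello_world', 'data_analysis']
```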
Identify the operations that repeat: loops, recursion, and traversals over the data.
- Primary operation: Looping over each string in the list.
- How many times: Once for each string in the input list.
- Inside the loop, string methods operate on each string's characters.
- The dominant work depends on the total number of characters processed.
As we add more strings or longer strings, the work grows roughly with the total characters.
| Input size | Approx. operations (characters processed) |
|---|---|
| 10 strings, avg 5 chars | ~50 |
| 100 strings, avg 5 chars | ~500 |
| 1000 strings, avg 5 chars | ~5000 |
Pattern observation: The work grows linearly with the total number of characters across all strings.
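One way to make this concrete is to count the total characters processed, since that is the quantity the runtime scales with (a small illustrative helper, not part of the original code):

```python
def total_chars(strings):
    # total character count across all strings: the n * m quantity
    return sum(len(s) for s in strings)

for n in (10, 100, 1000):
    data = ['chars'] * n  # average length m = 5, matching the table above
    print(n, total_chars(data))
# → 10 50
# → 100 500
# → 1000 5000
```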
Time Complexity: O(n * m), where n is the number of strings and m is the average string length. In other words, the time is linear in the total number of characters. Space Complexity is also O(n * m): Python strings are immutable, so each method call builds a new string, and the `cleaned` list stores a cleaned copy of every input string.
[X] Wrong: "String cleaning takes the same time no matter how long the strings are."
[OK] Correct: Each string method looks at every character, so longer strings take more time to process.
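A rough timing experiment supports this. Absolute numbers are machine-dependent, so treat this as a sketch rather than a benchmark:

```python
import time

def time_clean(strings):
    # crude wall-clock timing; results vary by machine and load
    start = time.perf_counter()
    for s in strings:
        s.strip().lower().replace(' ', '_')
    return time.perf_counter() - start

short = ['  Hello World  '] * 50_000
long_ = [' Hello World ' * 20] * 50_000  # roughly 20x the characters per string

t_short = time_clean(short)
t_long = time_clean(long_)
print(f'short: {t_short:.4f}s, long: {t_long:.4f}s')  # the long run should be noticeably slower
```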
Understanding how string operations scale helps you explain your code's efficiency clearly and confidently.
"What if we used a regular expression to replace spaces instead of the replace method? How would the time complexity change?"
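As a starting point for that question: a regular expression must also scan every character of every string, so the complexity class stays O(n * m); only the constant factors change (regex machinery typically adds overhead, though `\s+` can collapse runs of whitespace in one pass). A hedged sketch of the regex variant:

```python
import re

def clean_with_regex(strings):
    # \s+ replaces each run of whitespace with a single underscore;
    # still one scan over every character, so still O(n * m)
    return [re.sub(r'\s+', '_', s.strip().lower()) for s in strings]

print(clean_with_regex(['  Hello   World  ']))
# → ['hello_world']
```

Note the behavioral difference: `replace(' ', '_')` turns three spaces into three underscores, while `\s+` collapses them into one.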