Extracting day of week and hour in Pandas - Time & Space Complexity
We want to see how the time to extract day of week and hour from timestamps grows as data size increases.
How does the work change when we have more rows of date-time data?
Analyze the time complexity of the following code snippet.
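import pandas as pd
# 1000 hourly timestamps; lowercase 'h' replaces the deprecated 'H' alias
df = pd.DataFrame({
    'timestamp': pd.date_range('2023-01-01', periods=1000, freq='h')
})
# Vectorized field extraction through the .dt accessor
df['day_of_week'] = df['timestamp'].dt.dayofweek  # Monday=0, Sunday=6
df['hour'] = df['timestamp'].dt.hour              # 0 to 23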
This code creates a DataFrame with hourly timestamps and extracts the day of week and hour into new columns.
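To make the extracted values concrete, here is a minimal sketch on a smaller frame. The name `sample` and the 48-row size are choices made for this illustration only; the sketch assumes a pandas version that accepts the lowercase `'h'` frequency alias.

```python
import pandas as pd

# Small frame: 48 hourly timestamps starting on Sunday, 2023-01-01.
sample = pd.DataFrame({
    "timestamp": pd.date_range("2023-01-01", periods=48, freq="h")
})
sample["day_of_week"] = sample["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
sample["hour"] = sample["timestamp"].dt.hour              # 0 ... 23

first_row = sample.iloc[0]   # 2023-01-01 00:00, a Sunday
next_day = sample.iloc[24]   # 2023-01-02 00:00, a Monday

print(first_row["day_of_week"], first_row["hour"])  # 6 0
print(next_day["day_of_week"], next_day["hour"])    # 0 0
```

Note that `dayofweek` counts from Monday as 0, so the first row (a Sunday) yields 6.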
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: Extracting day of week and hour from each timestamp in the column.
- How many times: Once for each row in the DataFrame (n times).
pandas computes these fields in vectorized code rather than a Python-level loop, but it still visits every timestamp once to get the day and hour, so the work grows linearly with the number of rows.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 10 extractions |
| 100 | 100 extractions |
| 1000 | 1000 extractions |
Pattern observation: The work grows linearly. Ten times as many timestamps means about ten times as many extractions.
Time Complexity: O(n)
This means the time to extract day and hour grows in direct proportion to the number of timestamps.
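One way to check the linear growth empirically is to time the extraction at several sizes. This is a rough sketch, not a rigorous benchmark: the helper `time_extraction` and the chosen sizes are assumptions for illustration, and the measured times will vary by machine and pandas version.

```python
import time
import pandas as pd

def time_extraction(n_rows: int) -> float:
    """Return the seconds taken to extract day of week and hour for n_rows timestamps."""
    stamps = pd.Series(pd.date_range("2023-01-01", periods=n_rows, freq="h"))
    start = time.perf_counter()
    _ = stamps.dt.dayofweek
    _ = stamps.dt.hour
    return time.perf_counter() - start

# Sizes stay at or below one million hourly steps so the dates remain
# inside the range that nanosecond-resolution timestamps can represent.
sizes = [10_000, 100_000, 1_000_000]
timings = {n: time_extraction(n) for n in sizes}

for n, seconds in timings.items():
    print(f"{n:>9} rows: {seconds:.6f} s")
```

For small inputs, fixed per-call overhead tends to dominate, so the proportional trend usually becomes visible only at the larger sizes.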
[X] Wrong: "Extracting day and hour is constant time no matter how many rows there are."
[OK] Correct: Each row needs its own extraction, so more rows mean more work.
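The point that each row needs its own extraction can be seen by writing the same work as an explicit loop. The sketch below is only an illustration (the variable names are made up here): the vectorized `.dt.hour` call and the hand-written loop produce identical results, and both touch all n timestamps. The vectorized form is typically much faster in practice, but it has the same O(n) growth.

```python
import pandas as pd

timestamps = pd.Series(pd.date_range("2023-01-01", periods=1000, freq="h"))

# Vectorized: a single call, but every element is still visited once internally.
vectorized_hours = timestamps.dt.hour

# Explicit loop: the same n extractions written out by hand.
looped_hours = [ts.hour for ts in timestamps]

same_result = vectorized_hours.tolist() == looped_hours
print(same_result)  # True
```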
Understanding how data size affects time helps you explain your code choices clearly and confidently.
"What if we extracted day of week and hour from multiple timestamp columns instead of one? How would the time complexity change?"
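One way to explore this question is sketched below. The column names (`created_at`, `updated_at`, `closed_at`) are hypothetical and exist only for this example. With k timestamp columns, the extraction runs once per column, so the work is roughly proportional to n × k. For a fixed number of columns, that is still linear in the number of rows.

```python
import pandas as pd

n_rows = 1000

# Hypothetical timestamp columns, used only for this illustration.
events = pd.DataFrame({
    "created_at": pd.date_range("2023-01-01", periods=n_rows, freq="h"),
    "updated_at": pd.date_range("2023-02-01", periods=n_rows, freq="h"),
    "closed_at": pd.date_range("2023-03-01", periods=n_rows, freq="h"),
})
timestamp_columns = ["created_at", "updated_at", "closed_at"]

# Each column is scanned twice: once for day of week, once for hour.
for col in timestamp_columns:
    events[f"{col}_day_of_week"] = events[col].dt.dayofweek
    events[f"{col}_hour"] = events[col].dt.hour

# k columns, two fields each, n rows: about 2 * k * n extractions.
total_extractions = 2 * len(timestamp_columns) * n_rows
print(events.shape, total_extractions)  # (1000, 9) 6000
```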