Date feature extraction in Data Analysis Python - Time & Space Complexity
We want to understand how the time to extract parts of dates grows as we handle more data.
How does the work change when we have more dates to process?
Analyze the time complexity of the following code snippet.
import pandas as pd
dates = pd.Series(pd.date_range('2023-01-01', periods=1000))
# Extract year, month, day from each date
features = pd.DataFrame({
    'year': dates.dt.year,
    'month': dates.dt.month,
    'day': dates.dt.day
})
This code builds a pandas Series of 1,000 consecutive daily dates and extracts the year, month, and day of each into a new DataFrame.
Identify the loops, recursion, or array traversals that repeat.
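- Primary operation: Reading the year, month, and day of each date in the series. Each `.dt` accessor is vectorized, so the traversal happens inside pandas rather than in an explicit Python loop.
- How many times: Once per date for each accessor, which means three passes over the n dates.
Each pass touches every date once, so the work grows in direct proportion to the number of dates.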
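To make the hidden traversal visible, here is a minimal sketch that performs the same extraction with an explicit Python loop and checks that it matches the vectorized result. The helper name `extract_with_loop` is hypothetical and exists only for illustration. The loop is not how pandas implements `.dt`; it only shows the per-date work the accessors stand in for.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2023-01-01', periods=1000))

def extract_with_loop(series):
    """Hypothetical helper: one explicit pass, three attribute reads per date."""
    rows = []
    for ts in series:  # body runs len(series) times
        rows.append({'year': ts.year, 'month': ts.month, 'day': ts.day})
    return pd.DataFrame(rows)

loop_features = extract_with_loop(dates)

# The vectorized version from the snippet above
vectorized = pd.DataFrame({
    'year': dates.dt.year,
    'month': dates.dt.month,
    'day': dates.dt.day,
})

# Compare raw values so integer dtype differences between the two do not matter
same_values = bool((loop_features.to_numpy() == vectorized.to_numpy()).all())
print(same_values)
```

Both versions do work proportional to the number of dates. The vectorized form is usually much faster in practice because the loop runs in compiled code, but that changes the constant factor, not the growth rate.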
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 (3 parts x 10 dates) |
| 100 | About 300 (3 parts x 100 dates) |
| 1000 | About 3000 (3 parts x 1000 dates) |
Pattern observation: The total work grows in proportion to the number of dates. Ten times as many dates means about ten times as many operations.
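The counts in the table can be reproduced with a short sketch that uses the number of cells in the resulting DataFrame as a stand-in for "operations". The helper name `cells_extracted` is an assumption made for this example, and counting cells is a simplification of the real work pandas does.

```python
import pandas as pd

def cells_extracted(n):
    """Hypothetical helper: number of values produced for n dates (3 parts each)."""
    dates = pd.Series(pd.date_range('2023-01-01', periods=n))
    features = pd.DataFrame({
        'year': dates.dt.year,
        'month': dates.dt.month,
        'day': dates.dt.day,
    })
    return features.size  # rows x columns

for n in (10, 100, 1000):
    print(n, cells_extracted(n))
```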
Time Complexity: O(n)
This means the time to extract date parts grows linearly with the number of dates. The factor of 3 (one pass per extracted part) is a constant, so it is dropped in big-O notation.
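One way to check the linear trend empirically is to time the extraction at a few input sizes. The sketch below is a rough measurement, not a benchmark: the helper name `time_extraction` and the chosen sizes are assumptions, and the dates use a one-minute frequency because a million daily dates starting in 2023 would run past the largest timestamp pandas can store at nanosecond resolution (the year 2262).

```python
import time
import pandas as pd

def time_extraction(n):
    """Hypothetical helper: seconds spent extracting year, month, day from n dates."""
    dates = pd.Series(pd.date_range('2023-01-01', periods=n, freq='min'))
    start = time.perf_counter()
    pd.DataFrame({
        'year': dates.dt.year,
        'month': dates.dt.month,
        'day': dates.dt.day,
    })
    return time.perf_counter() - start

timings = {n: time_extraction(n) for n in (10_000, 100_000, 1_000_000)}
for n, seconds in timings.items():
    print(f"n={n:>9,}: {seconds:.4f} s")
```

Exact numbers depend on the machine and pandas version. For small inputs, fixed overhead tends to dominate, so the roughly tenfold jump per step is easier to see at the larger sizes.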
[X] Wrong: "Extracting multiple parts from dates takes the same time no matter how many dates there are."
[OK] Correct: Each date must be processed separately, so more dates mean more work and more time.
Understanding how data size affects processing time helps you explain your code choices clearly and shows you think about efficiency.
"What if we extracted more features like hour, minute, and second? How would the time complexity change?"
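A starting point for exploring this question is sketched below. It extracts six parts instead of three, using a one-minute frequency (an assumption of this example) so that the hour and minute columns are not all zero. The variable `approx_operations` follows the same counting convention as the table above.

```python
import pandas as pd

dates = pd.Series(pd.date_range('2023-01-01', periods=1000, freq='min'))

parts = ['year', 'month', 'day', 'hour', 'minute', 'second']

# One vectorized pass over the dates per requested part
features = pd.DataFrame({part: getattr(dates.dt, part) for part in parts})

# Same counting convention as the table: parts x dates
approx_operations = len(parts) * len(dates)
print(features.shape, approx_operations)
```

The number of passes doubles from three to six, but each pass is still proportional to the number of dates. Compare how the count changes when you vary `parts` against how it changes when you vary `periods`.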