Web Analytics Data Patterns in Data Analysis with Python: Time & Space Complexity
When analyzing web analytics data, we often process many records to find patterns.
We want to know how the time to analyze grows as the data size grows.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def count_page_views(df):
    counts = {}
    for page in df['page_url']:
        # Increment the count for this URL (default to 0 if unseen)
        counts[page] = counts.get(page, 0) + 1
    return counts
```
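As a quick sanity check, here is the same counting logic run on a small, hypothetical sample (a plain dict of lists stands in for the DataFrame, so the sketch has no external dependencies; with pandas installed, a real DataFrame works identically):

```python
def count_page_views(df):
    counts = {}
    for page in df['page_url']:
        counts[page] = counts.get(page, 0) + 1
    return counts

# Hypothetical sample data for illustration
sample = {'page_url': ['/home', '/about', '/home', '/home']}
print(count_page_views(sample))  # → {'/home': 3, '/about': 1}
```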
This code counts how many times each page URL appears in the data.
- Primary operation: Looping through each page URL in the data.
- How many times: Once for every record in the dataset.
As the number of page views grows, the time to count them grows at roughly the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 loops |
| 100 | About 100 loops |
| 1000 | About 1000 loops |
Pattern observation: The time grows directly with the number of records.
Time Complexity: O(n)
This means the time to count page views grows linearly with the amount of data: double the records, roughly double the time.
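The table above can be verified directly. This minimal sketch instruments the loop with an operation counter (hypothetical synthetic URLs stand in for real analytics data) and shows the operation count matching the input size:

```python
def count_page_views_instrumented(pages):
    """Count page views while tracking how many loop iterations run."""
    counts = {}
    ops = 0
    for page in pages:
        counts[page] = counts.get(page, 0) + 1
        ops += 1  # one operation per record
    return counts, ops

for n in (10, 100, 1000):
    # Synthetic data: n records cycling through 5 URLs
    pages = [f'/page{i % 5}' for i in range(n)]
    _, ops = count_page_views_instrumented(pages)
    print(n, ops)  # ops equals n: about n loops for n records
```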
[X] Wrong: "Counting page views takes the same time no matter how many records there are."
[OK] Correct: The code must look at each record once, so more records mean more time.
Understanding how data size affects processing time helps you explain your approach clearly in interviews.
"What if we used nested loops to compare each page URL with every other? How would the time complexity change?"