Survey data analysis pattern in Data Analysis Python - Time & Space Complexity
When analyzing survey data, we often process many responses to find insights.
We want to know how the time to analyze grows as the number of survey responses increases.
Analyze the time complexity of the following code snippet.
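```python
import pandas as pd

def analyze_survey(data):
    results = {}
    # Outer loop: one pass per question (column)
    for question in data.columns:
        # value_counts() has to look at every response in this column
        counts = data[question].value_counts()
        results[question] = counts
    return results
```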
This code counts how many times each answer appears for every question in the survey data.
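To see what the function returns, here is a minimal usage sketch on a tiny survey. The column names and answers are made up for illustration. Each value in the result dictionary is a pandas Series that maps an answer to the number of times it appears.

```python
import pandas as pd

def analyze_survey(data):
    # One value_counts() call per question (column)
    results = {}
    for question in data.columns:
        results[question] = data[question].value_counts()
    return results

# Hypothetical survey: 5 responses to 2 questions
survey = pd.DataFrame({
    "satisfaction": ["high", "low", "high", "medium", "high"],
    "recommend": ["yes", "yes", "no", "yes", "no"],
})

results = analyze_survey(survey)
print(results["satisfaction"].to_dict())
print(results["recommend"].to_dict())
```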
- Primary operation: `data[question].value_counts()`, which has to examine every response in that column.
- How many times: Once for each question, so m times for a survey with m questions.
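The pandas call hides the inner loop. The plain-Python sketch below makes both loops explicit and tallies every answer it examines. It assumes the survey is stored as a dict of lists, which is a simplification of a DataFrame, and `analyze_survey_counted` is a hypothetical helper rather than part of pandas. It mirrors the work `value_counts()` must do for each column: look at every response once.

```python
from collections import Counter

def analyze_survey_counted(columns):
    """Count answers per question and tally how many answers were examined.

    `columns` maps each question to its list of responses (an illustrative
    stand-in for a DataFrame).
    """
    results = {}
    answers_examined = 0
    for question, responses in columns.items():  # outer loop: m questions
        counts = Counter()
        for answer in responses:                  # inner loop: n responses
            counts[answer] += 1
            answers_examined += 1
        results[question] = counts
    return results, answers_examined

# Hypothetical survey with m = 3 questions and n responses each
for n in (10, 100, 1000):
    survey = {f"q{i}": ["yes", "no"] * (n // 2) for i in range(3)}
    _, examined = analyze_survey_counted(survey)
    print(n, examined)  # prints 10 30, then 100 300, then 1000 3000
```

The number of answers examined is always m × n, so multiplying the number of responses by ten multiplies the work by ten.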
As the number of survey responses grows, counting answers for each question takes more time.
| Responses (n) | Approx. operations for m questions |
|---|---|
| 10 | 10 answers counted per question, about 10 × m in total |
| 100 | 100 answers counted per question, about 100 × m in total |
| 1000 | 1000 answers counted per question, about 1000 × m in total |
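Pattern observation: For a fixed number of questions, the time grows roughly in direct proportion to the number of responses. Ten times as many responses means about ten times as much counting.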
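One way to check this pattern empirically is to time the function on random surveys of increasing size. This is a minimal sketch under stated assumptions: the answers are random integers from 1 to 5, as on a Likert scale, and `time_analysis` is a hypothetical helper written for this example.

```python
import time

import numpy as np
import pandas as pd

def analyze_survey(data):
    # One value_counts() call per question (column)
    return {question: data[question].value_counts() for question in data.columns}

def time_analysis(n_responses, n_questions=5, seed=0):
    """Return the seconds taken to analyze a random survey of the given size."""
    rng = np.random.default_rng(seed)
    # Hypothetical answers: integers 1-5 (the upper bound 6 is exclusive)
    data = pd.DataFrame(
        rng.integers(1, 6, size=(n_responses, n_questions)),
        columns=[f"q{i}" for i in range(n_questions)],
    )
    start = time.perf_counter()
    analyze_survey(data)
    return time.perf_counter() - start

timings = {n: time_analysis(n) for n in (10_000, 100_000, 1_000_000)}
for n, seconds in timings.items():
    print(f"{n:>9} responses: {seconds:.4f} s")
```

Absolute numbers depend on the machine and the pandas version. At small sizes, fixed per-call overhead dominates, so the roughly tenfold growth per row usually becomes visible only once n is large.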
Time Complexity: O(m * n)
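This means the time grows with both the number of questions (m) and the number of responses (n). It assumes each question has only a handful of possible answers. `value_counts()` also sorts the distinct answers by frequency, and that extra cost stays negligible when the answer options are few.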
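The same result can be written as a sum. Assume counting one response costs at most some constant $c$ (one hash lookup plus an increment) and ignore the small sorting step. Each of the m columns then costs about $c \cdot n$, and a worked example with m = 20 questions and n = 1000 responses follows:

```latex
T(m, n) = \sum_{j=1}^{m} c \cdot n = c \cdot m \cdot n \in O(m \cdot n)

m = 20,\; n = 1000 \;\Rightarrow\; m \cdot n = 20\,000 \text{ answers counted}
```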
[X] Wrong: "Counting answers for all questions takes the same time no matter how many responses there are."
[OK] Correct: More responses mean more data to count, so the time increases with the number of responses.
Understanding how data size affects analysis time helps you explain your approach clearly and shows you think about efficiency.
"What if we only analyzed a fixed number of questions regardless of total questions? How would the time complexity change?"
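To experiment with this variant, one option is to cap the loop at a fixed number of columns. `analyze_first_questions` and its parameter `k` are hypothetical names for illustration. The outer loop now runs at most `k` times no matter how many questions the survey has, so you can compare how its running time responds to changes in m and n against the original function.

```python
import pandas as pd

def analyze_first_questions(data, k=3):
    """Count answers for at most the first k questions (k is a fixed limit)."""
    results = {}
    # At most k iterations, however many columns the survey has
    for question in data.columns[:k]:
        results[question] = data[question].value_counts()
    return results

# Hypothetical survey: 10 questions, 4 responses each
survey = pd.DataFrame({f"q{i}": ["a", "b", "a", "c"] for i in range(10)})
subset = analyze_first_questions(survey, k=3)
print(list(subset))  # ['q0', 'q1', 'q2']
```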