Exploratory Data Analysis (EDA) Template in Python - Time & Space Complexity
We want to understand how the time needed to explore data grows as the data size increases.
How does the work change when we have more rows or columns to analyze?
Analyze the time complexity of the following EDA template code.
```python
import pandas as pd

def eda_template(df):
    print(df.head())
    print(df.describe())
    print(df.info())
    for col in df.columns:
        print(f"Unique values in {col}:", df[col].nunique())
```
This code prints basic summaries and counts unique values for each column in the data.
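To make the template concrete, here is a self-contained run on a tiny toy DataFrame (the column names and values below are made up for illustration):

```python
import pandas as pd

def eda_template(df):
    # Print the first rows, summary statistics, and schema info
    print(df.head())
    print(df.describe())
    print(df.info())
    # One iteration per column; each nunique() scans that column's rows
    for col in df.columns:
        print(f"Unique values in {col}:", df[col].nunique())

# Toy data: 5 rows, 2 columns (hypothetical values)
df = pd.DataFrame({"city": ["NY", "LA", "NY", "SF", "LA"],
                   "price": [10, 20, 10, 30, 20]})
eda_template(df)
```

With 2 columns the loop prints two unique counts; adding a column adds exactly one more count, which is the linear-in-columns pattern analyzed below.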
Identify the repeated operations: loops, recursion, and array traversals.
- Primary operation: Loop over all columns to count unique values.
- How many times: Once per column in the dataset.
The time grows mainly with the number of columns because we count unique values for each.
| Input Size (columns) | Approx. Operations |
|---|---|
| 10 | 10 unique counts |
| 100 | 100 unique counts |
| 1000 | 1000 unique counts |
Pattern observation: For a fixed number of rows, the work increases linearly with the number of columns.
Time Complexity: O(c * n)
This means the time grows with both the number of columns (c) and the number of rows (n) because counting unique values scans each column's data.
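A rough way to see the c * n scaling is to count element visits in a hand-rolled unique counter. This is only a sketch to make the operation count visible, not how pandas implements `nunique` internally:

```python
def count_unique_ops(table):
    """Count unique values per column, tallying every element visit."""
    ops = 0
    result = {}
    for col, values in table.items():   # c columns
        seen = set()
        for v in values:                # n rows scanned per column
            seen.add(v)
            ops += 1
        result[col] = len(seen)
    return result, ops

# 3 columns x 4 rows -> 12 element visits, i.e. c * n total work
table = {"a": [1, 1, 2, 3], "b": [5, 5, 5, 5], "c": [7, 8, 9, 9]}
uniques, ops = count_unique_ops(table)
print(uniques)  # {'a': 3, 'b': 1, 'c': 3}
print(ops)      # 12
```

Doubling either the number of rows or the number of columns doubles `ops`, which matches O(c * n).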
[X] Wrong: "The time only depends on the number of columns, not rows."
[OK] Correct: Counting unique values requires looking at every row in each column, so rows affect time too.
Understanding how data size affects EDA steps helps you explain your approach clearly and shows you think about efficiency in real projects.
"What if we added a nested loop to compare every pair of columns? How would the time complexity change?"