info() for Column Types in Python Data Analysis - Time & Space Complexity
We want to understand how the running time of info() on a DataFrame changes as the table grows.
Specifically, how does checking column types and non-null counts scale with more rows and columns?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # example value for n
df = pd.DataFrame({
    'A': range(n),
    'B': [str(i) for i in range(n)],
    'C': [float(i) for i in range(n)]
})
df.info()
```
This code creates a table with n rows and 3 columns, then calls info() to show column types and counts.
Identify the loops, recursion, and array traversals that repeat.
- Primary operation: Scanning each column to check data types and count non-null values.
- How many times: For each of the 3 columns, it looks through all n rows once.
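This per-column scan can be sketched directly: a loop over the columns where each column is counted once. The loop below is an illustrative model of the work, not pandas' internal implementation of info().

```python
import pandas as pd

n = 10  # example value for n
df = pd.DataFrame({
    'A': range(n),
    'B': [str(i) for i in range(n)],
    'C': [float(i) for i in range(n)]
})

# Model of info()'s main work: for each column, report its dtype
# and count the non-null values, which scans all n rows once.
for col in df.columns:
    non_null = df[col].count()  # one pass over the column's n rows
    print(col, df[col].dtype, non_null)
```

Three columns times one pass each gives 3 × n row checks, matching the analysis above.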
As the number of rows n grows, the time to check each column grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 checks (3 columns x 10 rows) |
| 100 | About 300 checks |
| 1000 | About 3000 checks |
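The operation counts in the table come from a simple model: columns × rows. A small helper (a hypothetical name, introduced only for this illustration) reproduces the table's numbers:

```python
def approx_checks(n_rows, n_cols=3):
    """Approximate number of cell checks info() performs:
    each of n_cols columns is scanned once over n_rows rows."""
    return n_cols * n_rows

for n in (10, 100, 1000):
    print(n, approx_checks(n))  # 30, 300, 3000
```

Doubling the rows doubles the count; the column count only sets the constant factor.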
Pattern observation: The work grows linearly with the number of rows.
Time Complexity: O(n) for a fixed number of columns (O(n × m) for m columns).
This means the time to run info() grows in a straight line as the number of rows increases.
[X] Wrong: "The time to run info() depends mostly on the number of columns, not rows."
[OK] Correct: While columns matter, info() checks every row in each column, so more rows mean more work.
Understanding how data size affects analysis speed helps you write efficient code and explain your choices clearly.
"What if the DataFrame had 100 columns instead of 3? How would the time complexity change?"