info() for column types and nulls in Pandas - Time & Space Complexity
We want to understand how long it takes for pandas to show column types and count missing values using info().
How does the time grow when the data has more rows or columns?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),                                      # integers, no nulls
    'B': [None if i % 10 == 0 else i for i in range(n)],  # every 10th value is None
    'C': ['text'] * n                                   # constant strings
})
df.info()
```
This code creates a DataFrame with n rows and 3 columns, then calls info() to show data types and count nulls.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: pandas scans each column to count non-null values and check data types.
- How many times: it processes each of the n rows once per column, so 3 columns means roughly 3 x n checks.
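The per-column scan can be observed directly with `DataFrame.count()`, which tallies non-null values the same way `info()` does. A small sketch using the snippet's data:

```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': [None if i % 10 == 0 else i for i in range(n)],
    'C': ['text'] * n
})

# count() traverses each column once to tally non-null entries,
# the same work info() performs for its "Non-Null Count" column.
non_null = df.count()
print(non_null['A'])  # 1000: no missing values in 'A'
print(non_null['B'])  # 900: every 10th entry in 'B' is None
```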
As the number of rows grows, pandas must check more data to count nulls and types.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 30 checks (3 columns x 10 rows) |
| 100 | About 300 checks |
| 1000 | About 3000 checks |
Pattern observation: The work grows roughly in direct proportion to the number of rows times columns.
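A rough way to see this growth empirically is to time `info()` at increasing row counts. This is a sketch, not a benchmark: absolute timings vary by machine, and `buf` simply redirects the printed report so it doesn't clutter the output.

```python
import io
import time
import pandas as pd

def time_info(n):
    # Build an n-row, 3-column DataFrame matching the snippet above.
    df = pd.DataFrame({
        'A': range(n),
        'B': [None if i % 10 == 0 else i for i in range(n)],
        'C': ['text'] * n
    })
    buf = io.StringIO()          # capture the report instead of printing it
    start = time.perf_counter()
    df.info(buf=buf)             # scans all 3 columns to count non-nulls
    return time.perf_counter() - start

for rows in (10_000, 100_000, 1_000_000):
    print(f"n={rows:>9}: {time_info(rows):.4f}s")
```

With ten times the rows, expect roughly ten times the scan time, though constant overheads dominate at small n.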
Time Complexity: O(n x m)
This means the runtime grows in proportion to the product of the row count n and the column count m: double either one and the work roughly doubles.
[X] Wrong: "Calling info() is always very fast and does not depend on data size."
[OK] Correct: Actually, info() looks at every row in each column to count nulls and types, so bigger data means more work and longer time.
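In recent pandas versions (1.2 and later), the null-counting scan can be skipped with the `show_counts` parameter, which makes `info()` cheaper on very large frames. A sketch under that assumption; older releases used a `null_counts` parameter instead:

```python
import io
import pandas as pd

df = pd.DataFrame({'B': [None, 1, 2] * 1000})

# show_counts=False tells info() to skip the per-column null scan,
# so the report omits the "Non-Null Count" column entirely.
buf = io.StringIO()
df.info(buf=buf, show_counts=False)
print(buf.getvalue())
```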
Knowing how info() scales helps you understand data inspection costs and prepares you to explain performance in real data projects.
What if we added many more columns instead of rows? How would the time complexity change?