Basic DataFrame info (shape, dtypes, describe) in Data Analysis Python - Time & Space Complexity
We want to know how long it takes to get basic information from a DataFrame.
Specifically, how does the time grow when we ask for shape, data types, or summary statistics?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': ['text'] * n
})

shape = df.shape          # tuple of (rows, columns)
types = df.dtypes         # dtype of each column
summary = df.describe()   # summary statistics for numeric columns
```
This code creates a DataFrame and gets its shape, data types, and summary statistics.
- Primary operation: Calculating summary statistics with `df.describe()`, which loops over the numeric columns and their rows.
- How many times: It processes each row of each numeric column once.
Getting the shape and data types only reads stored metadata, so it takes roughly constant time regardless of how much data the DataFrame holds.
Calculating summary statistics, by contrast, must scan every value, so its cost grows with the number of rows and numeric columns.
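We can check one part of this claim directly: by default, `describe()` summarizes only the numeric columns, so the text column `'C'` contributes nothing to the row-times-column work estimate. A minimal sketch using the same DataFrame as above:

```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': ['text'] * n
})

# describe() defaults to numeric columns only, so the object
# column 'C' is excluded from the n x m work.
summary = df.describe()
print(list(summary.columns))   # ['A', 'B']

# shape is stored metadata -- no data scan needed.
print(df.shape)                # (1000, 3)
```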
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 20 operations (10 rows x 2 numeric columns) |
| 100 | About 200 operations |
| 1000 | About 2000 operations |
Pattern observation: The work grows roughly in proportion to the number of rows times numeric columns.
Time Complexity: O(n x m), where n is the number of rows and m is the number of numeric columns.
This means the time grows in proportion to the number of rows times the number of numeric columns, while shape and dtype access stay constant.
[X] Wrong: "Getting the shape or data types takes a long time like processing all data."
[OK] Correct: Shape and data types are stored metadata, so accessing them is very fast and does not depend on data size.
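A rough timing sketch can make this concrete (absolute numbers vary by machine, but the gap between metadata access and a full scan should be obvious):

```python
import time
import pandas as pd

for n in (1_000, 1_000_000):
    df = pd.DataFrame({'A': range(n), 'B': range(n, 2*n)})

    t0 = time.perf_counter()
    _ = df.shape            # metadata lookup -- no data scan
    shape_t = time.perf_counter() - t0

    t0 = time.perf_counter()
    _ = df.describe()       # scans every numeric value
    desc_t = time.perf_counter() - t0

    print(f"n={n:>9,}: shape {shape_t:.2e}s, describe {desc_t:.2e}s")
```

As n grows, the `describe()` time climbs while the `shape` time stays essentially flat.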
Understanding how basic DataFrame info scales helps you work efficiently with data and explain your code clearly.
"What if we used df.describe(include='all') to include all columns? How would the time complexity change?"
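One way to start exploring that question: with `include='all'`, the object column `'C'` is summarized too (count, unique, top, freq), so every column now contributes work and m becomes the total column count rather than just the numeric ones. A quick sketch, reusing the DataFrame from the example:

```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': range(n),
    'B': range(n, 2*n),
    'C': ['text'] * n
})

full = df.describe(include='all')
print(list(full.columns))        # ['A', 'B', 'C'] -- 'C' is now included

# Counting unique values in 'C' must scan all n entries,
# so the text column is no longer free.
print(full.loc['unique', 'C'])
```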