shape for dimensions in Pandas - Time & Space Complexity
We want to understand how checking the size of data changes as the data grows.
Specifically, how long does it take to get the shape of a pandas DataFrame?
Analyze the time complexity of the following code snippet.
import pandas as pd
n = 10
data = pd.DataFrame({
    'A': range(n),
    'B': range(n)
})
rows, cols = data.shape
print(f"Rows: {rows}, Columns: {cols}")
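This code creates a DataFrame with n rows and 2 columns, then gets its shape. Building the DataFrame itself takes time proportional to n; the analysis here covers only the data.shape access.
- Primary operation: Reading the DataFrame's shape, which pandas derives from the lengths of its row index and column index rather than by scanning the data.
- How many times: Exactly once; there are no loops or repeated steps.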
Getting the shape does not depend on the number of rows or columns.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 1 |
| 100 | 1 |
| 1000 | 1 |
Pattern observation: The operation count stays the same no matter how big the data is.
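The pattern in the table can be checked empirically. The following is a minimal sketch, not a rigorous benchmark: the helper name time_shape_access and the chosen sizes are illustrative assumptions, and the measured times will vary by machine and pandas version. What matters is that the time per access stays roughly flat as n grows.

```python
import timeit

import pandas as pd


def time_shape_access(n, repeats=10_000):
    """Return (shape, average seconds per .shape access) for an n-row, 2-column DataFrame.

    Hypothetical helper for illustration only.
    """
    data = pd.DataFrame({'A': range(n), 'B': range(n)})
    # Only the .shape access is timed; building the DataFrame is excluded.
    total = timeit.timeit(lambda: data.shape, number=repeats)
    return data.shape, total / repeats


# Sizes are arbitrary; larger values simply make the contrast clearer.
results = {n: time_shape_access(n) for n in (10, 1_000, 100_000)}

for n, (shape, seconds) in results.items():
    print(f"n={n:>7}: shape={shape}, ~{seconds * 1e9:.0f} ns per access")
```

If the access were O(n), the last row would be thousands of times slower than the first; with constant-time access the numbers should be of the same order.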
Time Complexity: O(1)
This means getting the shape is very fast and does not take longer as the data grows.
[X] Wrong: "Getting the shape takes longer if the DataFrame has more rows or columns."
[OK] Correct: The shape comes from the lengths of the row index and column index, which pandas already tracks, so reading it takes constant time regardless of size.
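One way to see why no scan of the data is needed: the shape matches the lengths of the row index and the column index. The short sketch below compares the two; the 5-row DataFrame is an arbitrary example, and the comparison illustrates the relationship rather than quoting pandas internals.

```python
import pandas as pd

# Small example DataFrame; the size is arbitrary.
data = pd.DataFrame({'A': range(5), 'B': range(5)})

# Build the same pair of numbers from the two index lengths.
derived_shape = (len(data.index), len(data.columns))

print(data.shape)     # (rows, columns) as reported by pandas
print(derived_shape)  # the same pair, derived from the index lengths
```

Both values agree, and neither requires looking at the values stored in the columns.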
Knowing that some operations are instant helps you focus on the parts of code that really slow down with big data.
"What if we tried to count the number of unique values in a column instead? How would the time complexity change?"