dtypes for column data types in Pandas - Time & Space Complexity
We want to understand how long it takes to check the data types of columns in a pandas DataFrame.
Specifically, how does the time grow when the number of columns changes?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.5, 6.1],
    'C': ['x', 'y', 'z']
})

column_types = df.dtypes
print(column_types)
```
This code creates a DataFrame and then gets the data types of each column.
Identify the repeated work: loops, recursion, or array traversals.
- Primary operation: Checking the data type of each column in the DataFrame.
- How many times: Once for each column in the DataFrame.
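The "once per column" behavior can be seen directly by iterating the result of `df.dtypes`: it yields exactly one entry per column, built from stored metadata rather than from the rows. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.5, 6.1],
    'C': ['x', 'y', 'z']
})

# df.dtypes returns a Series with one entry per column.
# Each entry is a metadata lookup, independent of row count.
for name, dtype in df.dtypes.items():
    print(name, dtype)
```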
As the number of columns increases, the time to check all data types grows linearly.
| Number of Columns (n) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
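You can observe this trend empirically with a rough timing loop. This is an illustrative sketch: absolute numbers depend on your machine and pandas version, and only the growth pattern matters.

```python
import time
import numpy as np
import pandas as pd

# Time the dtypes lookup as the column count grows.
# Row count is held constant (3) to isolate the column effect.
for n_cols in (10, 100, 1000):
    df = pd.DataFrame(np.zeros((3, n_cols)))
    start = time.perf_counter()
    _ = df.dtypes
    print(n_cols, time.perf_counter() - start)
```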
Pattern observation: The time grows directly with the number of columns.
Time Complexity: O(n)
This means the time to get column data types grows linearly with the number of columns.
[X] Wrong: "Checking data types depends on the number of rows in the DataFrame."
[OK] Correct: The data type is stored per column, so checking it does not depend on how many rows there are.
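A quick sketch makes the row-independence concrete: two DataFrames with the same columns but very different row counts produce identical `dtypes` results, because the lookup reads per-column metadata rather than the rows.

```python
import numpy as np
import pandas as pd

# Same column, very different row counts: the dtype lookup
# reads stored per-column metadata, not the data itself.
small = pd.DataFrame({'A': np.arange(10)})
large = pd.DataFrame({'A': np.arange(1_000_000)})

assert small.dtypes.equals(large.dtypes)
print(large.dtypes)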
Knowing how operations scale with data size helps you write efficient code and explain your choices clearly.
"What if we checked data types for every cell instead of just columns? How would the time complexity change?"