Checking Data Types in Python Data Analysis - Time & Space Complexity
We want to understand how long it takes to check data types in a dataset as it grows.
How does the time needed change when we have more data?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

def check_types(df):
    types = []
    for col in df.columns:          # one iteration per column
        types.append(df[col].dtype)
    return types
```
This code checks the data type of each column in a DataFrame and collects them in a list.
- Primary operation: Looping over each column in the DataFrame.
- How many times: Once for each column, so as many times as there are columns.
As the number of columns grows, the time to check all types grows proportionally.
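For example, on a small DataFrame with three columns (the column names here are arbitrary), the loop runs exactly three times, once per column:

```python
import pandas as pd

def check_types(df):
    types = []
    for col in df.columns:          # one iteration per column
        types.append(df[col].dtype)
    return types

# Three columns -> three checks, regardless of how many rows there are.
df = pd.DataFrame({"name": ["a", "b"], "age": [1, 2], "score": [0.5, 0.9]})
result = check_types(df)
print(len(result))  # 3
```

In practice, pandas also exposes the same information directly as `df.dtypes`, which returns one dtype per column.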
| Input Size (n columns) | Approx. Operations |
|---|---|
| 10 | 10 checks |
| 100 | 100 checks |
| 1000 | 1000 checks |
Pattern observation: The time grows directly with the number of columns.
Time Complexity: O(n)
This means the time to check data types grows linearly with the number of columns.
Space Complexity: O(n) — the returned list stores one dtype entry per column.
[X] Wrong: "Checking data types depends on the number of rows in the data."
[OK] Correct: The code only looks at column types, which are stored as metadata, so rows do not affect the time.
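A quick sketch illustrates this point (the column names and sizes here are chosen arbitrarily): two DataFrames with the same columns but very different row counts produce identical results, because the dtype is read from column metadata rather than by scanning rows.

```python
import pandas as pd

def check_types(df):
    return [df[col].dtype for col in df.columns]

small = pd.DataFrame({"a": range(10), "b": [0.1] * 10})
large = pd.DataFrame({"a": range(100_000), "b": [0.1] * 100_000})

# Same columns -> same result and same number of operations,
# no matter how many rows each DataFrame holds.
print(check_types(small) == check_types(large))  # True
```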
Understanding how operations scale with data size helps you explain your code clearly and shows you think about efficiency.
"What if we checked the data type of every single cell instead of just columns? How would the time complexity change?"