Creating DataFrame from list of dictionaries in Pandas - Performance & Efficiency
We want to understand how the time needed to create a DataFrame changes as the input list grows: how does the work increase when we add more dictionaries to the list?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

list_of_dicts = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Charlie', 'age': 35}
]

df = pd.DataFrame(list_of_dicts)
```
This code creates a DataFrame from a list where each item is a dictionary representing a row.
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: pandas reads each dictionary in the list to build rows.
- How many times: Once for each dictionary in the list (n times).
As the list grows, pandas processes each dictionary one by one to form the DataFrame.
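To make "one read per dictionary" concrete, here is a pure-Python sketch of that pattern. This is a conceptual illustration, not pandas' actual implementation: it visits each dictionary exactly once and collects the values into columns.

```python
# Conceptual sketch (NOT pandas' internal code): one pass over the list,
# reading each dictionary exactly once to build column lists.
list_of_dicts = [
    {'name': 'Alice', 'age': 25},
    {'name': 'Bob', 'age': 30},
    {'name': 'Charlie', 'age': 35}
]

columns = {}
reads = 0
for row in list_of_dicts:            # n iterations, one per dictionary
    reads += 1
    for key, value in row.items():   # constant work per row (fixed set of keys)
        columns.setdefault(key, []).append(value)

print(reads)             # 3 reads for 3 dictionaries
print(columns['name'])   # ['Alice', 'Bob', 'Charlie']
```

The outer loop runs n times and does a fixed amount of work per row, which is exactly the linear pattern analyzed below.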
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 dictionary reads |
| 100 | About 100 dictionary reads |
| 1000 | About 1000 dictionary reads |
Pattern observation: The work grows directly with the number of dictionaries; doubling the list roughly doubles the work.
Time Complexity: O(n)
This means the time to create the DataFrame grows linearly with the number of dictionaries in the list.
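You can check this linear trend empirically with a rough timing sketch. The exact times depend on your machine and pandas version; the point is the trend, not the numbers.

```python
import time
import pandas as pd

# Rough benchmark: time DataFrame construction as n grows.
# Expect roughly a 10x time increase for each 10x increase in n.
for n in (1_000, 10_000, 100_000):
    rows = [{'name': f'user{i}', 'age': i % 90} for i in range(n)]
    start = time.perf_counter()
    df = pd.DataFrame(rows)
    elapsed = time.perf_counter() - start
    print(f"n={n:>7}: {elapsed:.4f}s, rows={len(df)}")
```

Small runs are noisy, so repeat the measurement a few times before drawing conclusions.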
[X] Wrong: "Creating a DataFrame from a list of dictionaries is instant no matter the size."
[OK] Correct: Each dictionary must be read and processed, so more dictionaries mean more work and more time.
Understanding how data size affects DataFrame creation helps you reason about performance in real projects and interviews.
"What if we changed the input from a list of dictionaries to a dictionary of lists? How would the time complexity change?"