Creating DataFrame from dictionary in Pandas - Performance & Efficiency
When we create a DataFrame from a dictionary, we want to know how construction time scales as the data grows.
We ask: how does the time to build the DataFrame change as the dictionary's lists get longer?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Keys become column names; the lists become the column values.
data = {"A": [1, 2, 3], "B": [4, 5, 6]}
df = pd.DataFrame(data)
```
This code creates a DataFrame from a dictionary where keys are column names and values are lists of data.
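To make that mapping concrete, here is the same snippet with the resulting shape and columns inspected:

```python
import pandas as pd

data = {"A": [1, 2, 3], "B": [4, 5, 6]}
df = pd.DataFrame(data)

# Each key becomes a column; each list becomes that column's values.
print(df.shape)           # (3, 2): 3 rows, 2 columns
print(list(df.columns))   # ['A', 'B']
```

The row count comes from the length of the lists, which is exactly the quantity the complexity analysis below tracks.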
Identify the repeated work in the construction: loops, recursion, or array traversals.
- Primary operation: pandas traverses each list in the dictionary to build the corresponding column.
- How many times: it processes each element of every list exactly once.
As the number of rows grows, pandas must read more elements to build the DataFrame.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | Reads about 10 elements per column |
| 100 | Reads about 100 elements per column |
| 1000 | Reads about 1000 elements per column |
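One way to see the linear pattern from the table is to time construction directly. This is an illustrative sketch (absolute timings vary by machine, and the `build_time` helper is defined just for this example):

```python
import time
import pandas as pd

def build_time(n):
    """Build a two-column dictionary with n rows and time the DataFrame construction."""
    data = {"A": list(range(n)), "B": list(range(n))}
    start = time.perf_counter()
    df = pd.DataFrame(data)
    elapsed = time.perf_counter() - start
    return df, elapsed

df_small, t_small = build_time(10_000)
df_large, t_large = build_time(100_000)

# With 10x the rows, construction time should grow roughly (not exactly) 10x.
print(f"10k rows:  {t_small:.6f}s")
print(f"100k rows: {t_large:.6f}s")
```

Timings fluctuate with caching and interpreter overhead, so expect the ratio to be only approximately linear, especially at small sizes.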
Pattern observation: The work grows roughly in direct proportion to the number of rows.
Time Complexity: O(n)
This means the time to create the DataFrame grows linearly with the length of the dictionary's lists, i.e., with the number of rows.
[X] Wrong: "Creating a DataFrame from a dictionary is instant no matter how big the data is."
[OK] Correct: pandas must read every element to build the DataFrame, so bigger data takes more time.
Understanding how data size affects DataFrame creation helps you write efficient data loading code and explain your choices clearly.
"What if the dictionary values were numpy arrays instead of lists? How would the time complexity change?"