Creating DataFrames (dict, list, CSV) in Python Data Analysis - Performance & Efficiency
When we create DataFrames from different data sources, the time it takes depends on the input size and format.
We want to know how the time grows as the data gets bigger.
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

n = 10  # example value for n

# From a dictionary of column -> values
data_dict = {"A": list(range(n)), "B": list(range(n))}
df_from_dict = pd.DataFrame(data_dict)

# From a list of rows
list_of_lists = [[i, i * 2] for i in range(n)]
df_from_list = pd.DataFrame(list_of_lists, columns=["A", "B"])

# Assume a CSV file with n rows
# df_from_csv = pd.read_csv('data.csv')  # reading CSV into a DataFrame
```
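The `read_csv` line is commented out because no `data.csv` ships with the snippet. Here is a minimal, self-contained sketch of the same step; it writes a small file first so the read actually runs (the file name `scratch.csv` is just an illustration):

```python
import pandas as pd

# Write a small CSV so read_csv has something to load
# (scratch.csv is a throwaway name used only for this sketch).
pd.DataFrame({"A": range(5), "B": range(5)}).to_csv("scratch.csv", index=False)

# read_csv parses every row once, so this step is also linear in n.
df_from_csv = pd.read_csv("scratch.csv")
print(df_from_csv.shape)  # (5, 2)
```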
This code creates DataFrames from a dictionary, a list of lists, and mentions reading from a CSV file.
- Primary operation: iterating over the n rows to build the DataFrame structure.
- How many times: each element in the input data is processed once.
As the number of rows n increases, the time to create the DataFrame grows roughly in direct proportion.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 operations |
| 100 | About 100 operations |
| 1000 | About 1000 operations |
Pattern observation: Doubling the input roughly doubles the work done.
Time Complexity: O(n)
This means the time to create a DataFrame grows linearly with the number of rows.
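You can check the linear trend with a rough timing experiment. Below is a minimal sketch using `time.perf_counter`; the exact numbers will vary by machine, but each tenfold jump in n should produce roughly a tenfold jump in time:

```python
import time
import pandas as pd

for n in (10_000, 100_000, 1_000_000):
    data_dict = {"A": list(range(n)), "B": list(range(n))}
    start = time.perf_counter()
    pd.DataFrame(data_dict)  # the operation we are measuring
    elapsed = time.perf_counter() - start
    print(f"n={n:>9,}: {elapsed:.4f}s")
```

Building the input lists happens outside the timed region, so the measurement covers only the DataFrame construction itself.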
[X] Wrong: "Creating a DataFrame from a dictionary or list is instant and does not depend on data size."
[OK] Correct: Even though it feels fast for small data, the constructor still processes each row once, so the time grows with the data size.
Understanding how data loading time grows helps you explain performance in real projects and shows that you think about efficiency.
"What if we read a CSV file with many columns instead of rows? How would the time complexity change?"