Creating DataFrames (dict, list, CSV) in Data Analysis Python - Performance & Efficiency

Time Complexity: Creating DataFrames (dict, list, CSV)
O(n)
Understanding Time Complexity

When we create DataFrames from different data sources, the time it takes depends on the input size and format.

We want to know how the time grows as the data gets bigger.

Scenario Under Consideration

Analyze the time complexity of the following code snippet.

import pandas as pd

n = 10  # example value for n
data_dict = {"A": list(range(n)), "B": list(range(n))}
df_from_dict = pd.DataFrame(data_dict)

list_of_lists = [[i, i*2] for i in range(n)]
df_from_list = pd.DataFrame(list_of_lists, columns=["A", "B"])

# Assume CSV file with n rows
# df_from_csv = pd.read_csv('data.csv')  # reading CSV into DataFrame

This code creates DataFrames from a dictionary and from a list of lists. The commented-out line shows how a DataFrame would be read from a CSV file with n rows.
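
The CSV line in the snippet is commented out because no file is provided. As a minimal sketch, assuming an in-memory text buffer stands in for 'data.csv', the same pd.read_csv call can be tried without touching the disk. The helper make_csv_text is hypothetical and exists only for this illustration.

```python
import io

import pandas as pd


def make_csv_text(n):
    """Build CSV text with a header line and n data rows (hypothetical helper)."""
    lines = ["A,B"]
    lines.extend(f"{i},{i * 2}" for i in range(n))
    return "\n".join(lines)


n = 10  # example value for n
# io.StringIO stands in for a file on disk such as 'data.csv'
csv_buffer = io.StringIO(make_csv_text(n))
df_from_csv = pd.read_csv(csv_buffer)
print(df_from_csv.shape)  # (10, 2)
```

Reading from a buffer still parses every line of text, so the work grows with n just as it would for a real file.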

Identify Repeating Operations
  • Primary operation: Iterating over the n rows of input to build the DataFrame structure.
  • How many times: Each element in the input data is processed once.
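
The idea that each element is processed once can be sketched in plain Python. This is a conceptual model only, assuming a simple row-to-column transpose; pandas does the equivalent work in optimized internal code, not in a loop like this. The function count_element_visits is a hypothetical name used for illustration.

```python
def count_element_visits(rows):
    """Transpose a list of rows into columns, counting each value touched.

    Conceptual model only: it mirrors the amount of work, not the
    actual pandas implementation.
    """
    visits = 0
    columns = [[] for _ in rows[0]] if rows else []
    for row in rows:
        for col_index, value in enumerate(row):
            columns[col_index].append(value)
            visits += 1
    return columns, visits


rows = [[i, i * 2] for i in range(10)]
columns, visits = count_element_visits(rows)
print(visits)  # 10 rows x 2 columns = 20 values visited
```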
How Execution Grows With Input

As the number of rows n increases, the time to create the DataFrame grows roughly in direct proportion.

Input Size (n)    Approx. Operations
10                About 10 operations
100               About 100 operations
1000              About 1000 operations

Pattern observation: Doubling the input roughly doubles the work done.
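
One way to check this pattern is to time the construction at a few sizes. The sketch below is a rough measurement, not a benchmark: time_dict_construction is a hypothetical helper, the sizes are arbitrary, and the absolute numbers will vary by machine and pandas version.

```python
import time

import pandas as pd


def time_dict_construction(n, repeats=5):
    """Return the best wall-clock time in seconds to build a DataFrame of n rows."""
    data = {"A": list(range(n)), "B": list(range(n))}
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pd.DataFrame(data)
        best = min(best, time.perf_counter() - start)
    return best


# Sizes chosen arbitrarily for illustration
timings = {n: time_dict_construction(n) for n in (10_000, 100_000, 1_000_000)}
for n, seconds in timings.items():
    print(f"n={n:>9,}: {seconds:.6f} s")
```

If construction is linear, each tenfold increase in n should take roughly ten times as long, although fixed overhead can blur the ratio at small sizes.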

Final Time Complexity

Time Complexity: O(n)

This means the time to create a DataFrame grows linearly with the number of rows, assuming the number of columns stays fixed.
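
The O(n) figure treats the number of columns as constant. As a small sketch, assuming a hypothetical helper build_frame, the total number of values a DataFrame holds is rows times columns, which df.size reports directly:

```python
import pandas as pd


def build_frame(n_rows, n_cols):
    """Build a DataFrame with n_rows rows and n_cols integer columns."""
    data = {f"col_{j}": list(range(n_rows)) for j in range(n_cols)}
    return pd.DataFrame(data)


narrow = build_frame(1000, 2)
wide = build_frame(1000, 20)
print(narrow.size, wide.size)  # total cells: 2000 and 20000
```

With the same row count, the wider frame holds ten times as many values, so the amount of data to process depends on both dimensions.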

Common Mistake

[X] Wrong: "Creating a DataFrame from a dictionary or list is instant and does not depend on data size."

[OK] Correct: Even though it feels fast for small data, every value in the input is processed once, so the time grows with the data size.

Interview Connect

Understanding how data loading time grows helps you explain performance in real projects and shows you think about efficiency.

Self-Check

"What if we read a CSV file with many columns instead of many rows? How would the time complexity change?"