Specifying column names and index in Pandas - Time & Space Complexity
When we create a DataFrame with explicit column names and an index, we want to know how the construction time changes as the data grows.
We ask: How does the work grow when we add more rows or columns?
Analyze the time complexity of the following code snippet.
import pandas as pd
data = [[1, 2], [3, 4], [5, 6]]
columns = ['A', 'B']
index = ['row1', 'row2', 'row3']
df = pd.DataFrame(data, columns=columns, index=index)
This code creates a DataFrame from a list of lists, assigning column names and row labels explicitly.
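To confirm what the snippet produces, a quick check of the resulting shape and labels can help. This is a minimal sketch that reuses the same data; the printed values follow directly from the three rows and two columns supplied.

```python
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]
columns = ['A', 'B']
index = ['row1', 'row2', 'row3']
df = pd.DataFrame(data, columns=columns, index=index)

print(df.shape)              # (3, 2): 3 rows, 2 columns
print(list(df.columns))      # ['A', 'B']
print(list(df.index))        # ['row1', 'row2', 'row3']
print(df.loc['row2', 'B'])   # 4: the labels address individual cells
```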
Identify the loops, recursion, or array traversals that repeat.
- Primary operation: pandas reads each element in the data list to place it in the DataFrame.
- How many times: Once for each data element, so the total is the number of rows times the number of columns.
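The repeated work can be modeled with two nested loops. The sketch below is only a mental model: `count_elements` is a hypothetical helper written for illustration, and pandas does the equivalent traversal in compiled code rather than in Python loops like these.

```python
def count_elements(rows):
    """Count every cell in a list of lists, visiting each cell exactly once."""
    visited = 0
    for row in rows:       # runs once per row (n times)
        for _ in row:      # runs once per value in the row (m times)
            visited += 1
    return visited

data = [[1, 2], [3, 4], [5, 6]]
print(count_elements(data))  # 6, i.e. 3 rows x 2 columns
```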
As the number of rows or columns grows, pandas processes more elements to assign values and labels.
| Input Size (n rows x m columns) | Approx. Operations |
|---|---|
| 10 x 2 | 20 |
| 100 x 5 | 500 |
| 1000 x 10 | 10,000 |
Pattern observation: The work grows roughly in proportion to the total number of elements (rows x columns).
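One way to check this pattern empirically is to time the constructor at a few sizes. The following is a rough sketch, not a rigorous benchmark: `time_construction` is a hypothetical helper, the sizes are arbitrary, and actual timings depend on the machine and the pandas version, so no expected numbers are shown.

```python
import time
import pandas as pd

def time_construction(n_rows, n_cols, repeats=3):
    """Return the best wall-clock time in seconds to build an n_rows x n_cols DataFrame."""
    data = [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]
    columns = [f"col{c}" for c in range(n_cols)]
    index = [f"row{r}" for r in range(n_rows)]
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        pd.DataFrame(data, columns=columns, index=index)
        best = min(best, time.perf_counter() - start)
    return best

# Each step multiplies the element count by 10.
for n_rows, n_cols in [(1_000, 5), (10_000, 5), (100_000, 5)]:
    elapsed = time_construction(n_rows, n_cols)
    print(f"{n_rows:>7} x {n_cols}: {n_rows * n_cols:>7} elements, {elapsed:.4f} s")
```

If the growth is linear in the number of elements, each tenfold increase in size should take very roughly ten times longer, although fixed overhead tends to dominate at the smallest sizes.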
Time Complexity: O(n x m)
This means the time to create the DataFrame grows in direct proportion to the number of rows times the number of columns.
[X] Wrong: "Specifying column names or index is instant and does not depend on data size."
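[OK] Correct: Even though naming looks simple, pandas must build an index holding one label per row and one per column, so the labeling work grows with the number of labels (n + m). The data itself still has to be read element by element, which is what gives O(n x m) overall.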
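The amount of label data can be inspected directly on a constructed DataFrame. In this small sketch (the sizes and the `row`/`col` naming scheme are chosen only for illustration), the labels number n + m while the data cells number n x m.

```python
import pandas as pd

n_rows, n_cols = 4, 3
data = [[r * n_cols + c for c in range(n_cols)] for r in range(n_rows)]
columns = [f"col{c}" for c in range(n_cols)]
index = [f"row{r}" for r in range(n_rows)]
df = pd.DataFrame(data, columns=columns, index=index)

print(len(df.index))    # 4 row labels (n)
print(len(df.columns))  # 3 column labels (m)
print(df.size)          # 12 data cells (n x m)
```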
Understanding how data size affects operations like naming columns and indexes helps you reason about performance in real data tasks.
What if we only specify column names but let pandas assign the default index? How would the time complexity change?
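As a starting point for this question, the sketch below omits the `index` argument and inspects what pandas assigns. The default is a `RangeIndex`, which represents the labels as a range instead of storing one label object per row. The data still has to be read element by element, so this is a hint to reason from rather than a full answer.

```python
import pandas as pd

data = [[1, 2], [3, 4], [5, 6]]
df_default = pd.DataFrame(data, columns=['A', 'B'])  # no index argument

print(type(df_default.index).__name__)  # RangeIndex
print(list(df_default.index))           # [0, 1, 2]
print(df_default.shape)                 # (3, 2): same data, default labels
```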