Numeric types (int64, float64) in Pandas - Time & Space Complexity
We want to understand how fast operations on numeric data types like int64 and float64 run in pandas.
How does the time to process numbers grow when we have more data?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

# Create a DataFrame with two numeric columns
n = 1000
df = pd.DataFrame({
    'A': pd.Series(range(n), dtype='int64'),
    'B': pd.Series([float(x) for x in range(n)], dtype='float64')
})

# Calculate the sum of each column
result = df.sum()
```
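To make the snippet concrete, here is a quick check of what `df.sum()` returns. The expected values follow from the arithmetic-series formula: summing 0 through n-1 gives n(n-1)/2, which is 499500 for n = 1000.

```python
import pandas as pd

n = 1000
df = pd.DataFrame({
    'A': pd.Series(range(n), dtype='int64'),
    'B': pd.Series([float(x) for x in range(n)], dtype='float64')
})

# df.sum() returns a Series indexed by column name
result = df.sum()
# Each column sums 0 + 1 + ... + 999 = n*(n-1)/2 = 499500
print(result['A'])  # 499500
print(result['B'])  # 499500.0
```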
This code creates a DataFrame with two numeric columns and sums each column.
Identify the loops, recursion, or array traversals that do repeated work.
- Primary operation: Summing values in each column by visiting every number once.
- How many times: Each number in the column is visited exactly once during the sum.
As the number of rows grows, the time to sum grows at roughly the same rate.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | About 10 additions per column |
| 100 | About 100 additions per column |
| 1000 | About 1000 additions per column |
Pattern observation: The work grows directly with the number of rows.
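The table above can be reproduced with a small sketch that simulates summing a column in plain Python while counting additions (`count_additions` is a hypothetical helper, not part of pandas; it just makes the per-row work visible):

```python
def count_additions(n):
    """Simulate summing a column of n values, counting the additions."""
    total = 0
    ops = 0
    for x in range(n):
        total += x   # one addition per row
        ops += 1
    return ops

# The operation count grows in lockstep with the input size
for n in (10, 100, 1000):
    print(n, count_additions(n))  # 10 -> 10, 100 -> 100, 1000 -> 1000
```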
Time Complexity: O(n)
This means the time to sum numeric columns grows linearly with the number of rows.
[X] Wrong: "Summing numeric columns is instant no matter how big the data is."
[OK] Correct: Even though pandas is fast, it still needs to visit each number once, so more data means more time.
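A rough empirical check: timing `Series.sum()` at increasing sizes should show the elapsed time growing with n. Absolute timings vary by machine, so treat the ratios, not the numbers, as the signal.

```python
import time
import pandas as pd

# Time Series.sum() at increasing sizes; expect roughly linear growth
for n in (10_000, 100_000, 1_000_000):
    s = pd.Series(range(n), dtype='int64')
    start = time.perf_counter()
    total = s.sum()
    elapsed = time.perf_counter() - start
    print(f"n={n:>9}: sum={total}, time={elapsed:.6f}s")
```

Even though the vectorized loop runs in optimized C rather than Python bytecode, it still touches every element once, which is exactly what O(n) means.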
Knowing how numeric operations scale helps you explain performance and choose the right methods when working with data.
"What if we summed only a subset of rows using a filter? How would the time complexity change?"