head() and tail() in Python Data Analysis - Time & Space Complexity
We want to understand how the time to run head() and tail() changes as the data size grows.
Specifically, how does the number of rows in a dataset affect these operations?
Analyze the time complexity of the following code snippet.
```python
import pandas as pd

data = pd.DataFrame({'A': range(1000000)})
first_rows = data.head(5)
last_rows = data.tail(5)
```
This code gets the first 5 rows and the last 5 rows from a large dataset.
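One way to see what these calls do is to compare them against positional slicing with `iloc`; a small sketch (the equivalence shown here is standard pandas behavior):

```python
import pandas as pd

data = pd.DataFrame({'A': range(1000000)})

# head(5) returns the first 5 rows; tail(5) returns the last 5 rows.
first_rows = data.head(5)
last_rows = data.tail(5)

# Both are equivalent to positional slices of fixed size.
assert first_rows.equals(data.iloc[:5])
assert last_rows.equals(data.iloc[-5:])
```

Because each result is a fixed-size slice, the work done does not depend on how many rows the full DataFrame holds.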
Identify any loops, recursion, or array traversals that repeat.
- Primary operation: Selecting a fixed number of rows (5) from the start or end.
- How many times: Exactly 5 rows are accessed each time, no matter the dataset size.
Getting 5 rows from the start or end takes the same time whether the dataset has 10 or 1,000,000 rows.
| Input Size (n) | Approx. Operations |
|---|---|
| 10 | 5 |
| 100 | 5 |
| 1000 | 5 |
Pattern observation: The number of operations stays the same because we only look at a fixed number of rows.
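A rough timing check can illustrate this pattern. This is a sketch only: absolute numbers depend on hardware, and micro-benchmarks of fast operations are noisy, but neither timing should grow with the row count.

```python
import timeit
import pandas as pd

# Two frames that differ in size by five orders of magnitude.
small = pd.DataFrame({'A': range(10)})
large = pd.DataFrame({'A': range(1_000_000)})

# Time 1000 calls of head(5) on each.
t_small = timeit.timeit(lambda: small.head(5), number=1000)
t_large = timeit.timeit(lambda: large.head(5), number=1000)

# Both timings are of the same order of magnitude: O(1) behavior.
print(f"small: {t_small:.4f}s  large: {t_large:.4f}s")
```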
Time Complexity: O(1)
This means the time to get the first or last few rows does not grow as the dataset gets bigger.
[X] Wrong: "Getting the first or last rows takes longer if the dataset is huge."
[OK] Correct: Because head() and tail() only access a small fixed number of rows, their time does not depend on the total size.
Understanding how simple data access methods scale helps you explain efficient data handling in real projects.
What if we changed head(5) to head(n) where n grows with the dataset size? How would the time complexity change?
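As a hint, consider a hypothetical case where `n` scales with the data, say `n = len(data) // 2`. Then `head(n)` must materialize `n` rows, so the work grows linearly with the dataset: O(n) rather than O(1).

```python
import pandas as pd

data = pd.DataFrame({'A': range(1_000_000)})

# If n grows with the dataset (here, half of all rows),
# head(n) must copy n rows, so the cost scales linearly: O(n).
n = len(data) // 2
half = data.head(n)
assert len(half) == n
```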