np.genfromtxt() for handling missing data in NumPy - Time & Space Complexity
When loading data with missing values using np.genfromtxt(), it is important to know how the time to read a file grows as the file size increases. In other words, we want to understand how the processing time scales as the input data gets bigger.
Analyze the time complexity of the following code snippet.
import numpy as np
data = np.genfromtxt('data.csv', delimiter=',', filling_values=-1)
print(data)
This code reads a CSV file with missing values and fills them with -1 while loading the data into a numpy array.
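A minimal, self-contained sketch of the same idea (using an in-memory string in place of the hypothetical data.csv, so it runs anywhere) shows the fill behavior directly:

```python
import io
import numpy as np

# Two rows, three columns; the empty field in row 2 is a missing value.
csv_text = "1,2,3\n4,,6\n"

# genfromtxt parses every field once and substitutes -1 for missing entries.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', filling_values=-1)
print(data)
# -> [[ 1.  2.  3.]
#     [ 4. -1.  6.]]
```

Every field, present or missing, is visited exactly once during parsing, which is the source of the per-value cost analyzed below.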
Identify the repeated work: the loops, recursion, or array traversals.
- Primary operation: Reading each line and each value in the file to parse and fill missing data.
- How many times: Once for every value in the input file (rows x columns).
As the number of rows and columns increases, the total time to read and process the file grows proportionally, since parsing each individual value takes roughly constant time.
| Input Size (rows x columns) | Approx. Operations |
|---|---|
| 10 x 5 = 50 | About 50 operations |
| 100 x 5 = 500 | About 500 operations |
| 1000 x 5 = 5000 | About 5000 operations |
Pattern observation: The operations grow roughly in direct proportion to the total number of values in the file.
Time Complexity: O(n)
This means the time to load and fill missing data grows linearly with n, the total number of values (rows x columns).
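A rough way to see this linear growth empirically is to time np.genfromtxt() on progressively larger in-memory files. This is only a sketch: absolute timings depend on hardware, and the helper time_load and its CSV layout (one missing field per row) are illustrative choices, not part of the original snippet.

```python
import io
import time
import numpy as np

def time_load(n_rows, n_cols=5):
    # Build a CSV where the trailing comma leaves the last field empty
    # (a missing value), then time how long the parse-and-fill takes.
    row = ",".join(["1"] * (n_cols - 1)) + ","
    text = "\n".join([row] * n_rows)
    start = time.perf_counter()
    data = np.genfromtxt(io.StringIO(text), delimiter=',', filling_values=-1)
    return time.perf_counter() - start, data.shape

for n in (10, 100, 1000):
    elapsed, shape = time_load(n)
    print(f"{n} rows -> shape {shape}, {elapsed:.6f}s")
```

On a typical machine the reported times grow roughly tenfold as the row count grows tenfold, matching the O(n) pattern in the table above.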
[X] Wrong: "Handling missing data with np.genfromtxt() takes constant time regardless of file size."
[OK] Correct: The function must check every value to find and fill missing data, so time grows with the total number of values.
Understanding how data-loading time grows helps you reason about performance in real projects, where files can be large and messy.
"What if we changed filling_values to None and handled missing data later? How would the time complexity change?"
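One hedged sketch of that alternative: when no filling_values is given, np.genfromtxt() substitutes NaN for missing float fields, and a later cleanup pass such as np.nan_to_num() replaces them. Both passes touch every value, so the overall complexity stays O(n); you simply pay two linear passes instead of one.

```python
import io
import numpy as np

csv_text = "1,2,3\n4,,6\n"

# Pass 1: O(n) parse; missing float fields default to NaN.
raw = np.genfromtxt(io.StringIO(csv_text), delimiter=',')

# Pass 2: O(n) cleanup; replace NaN with -1 after loading.
cleaned = np.nan_to_num(raw, nan=-1)
print(cleaned)
# -> [[ 1.  2.  3.]
#     [ 4. -1.  6.]]
```

Deferring the fill can still be worthwhile in practice, since keeping NaN around lets you count or mask missing values before deciding how to replace them.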