CSR format (Compressed Sparse Row) in SciPy - Time & Space Complexity
We want to understand how fast operations on CSR sparse matrices run as the data grows.
How does the time to access or process data change when the matrix size increases?
Analyze the time complexity of the following code snippet.
```python
import numpy as np
from scipy.sparse import csr_matrix

# Create a sparse matrix in CSR format
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr = csr_matrix((data, (row, col)), shape=(3, 3))

# Access all nonzero elements, row by row
for i in range(csr.shape[0]):
    start = csr.indptr[i]      # where row i begins in data/indices
    end = csr.indptr[i + 1]    # where row i ends
    for j in range(start, end):
        val = csr.data[j]
        # process val
```
This code creates a CSR sparse matrix and loops through all its nonzero elements row by row.
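To see why the `indptr` slicing works, it helps to inspect the three internal arrays CSR actually stores. This sketch rebuilds the same matrix as above and prints them; the expected values follow directly from the construction:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Same matrix as in the snippet above
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr = csr_matrix((data, (row, col)), shape=(3, 3))

# indptr[i]:indptr[i+1] is the slice of data/indices belonging to row i
print(csr.indptr)   # [0 2 3 6] -> row 0 holds 2 nonzeros, row 1 holds 1, row 2 holds 3
print(csr.indices)  # [0 2 2 0 1 2] -> column index of each stored value
print(csr.data)     # [1 2 3 4 5 6] -> the stored nonzero values themselves
```

Because `indptr` has one entry per row plus a final sentinel, the outer loop can jump straight to each row's slice without searching.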
Identify the loops, recursion, and array traversals that repeat:
- Primary operation: Looping over all nonzero elements in the matrix.
- How many times: Exactly once per nonzero element, grouped by rows.
As the number of nonzero elements grows, the time to process them grows proportionally.
| Input Size (nonzero elements) | Approx. Operations |
|---|---|
| 10 | 10 |
| 100 | 100 |
| 1000 | 1000 |
Pattern observation: The time grows linearly with the number of nonzero elements.
Time Complexity: O(nnz)
Strictly speaking, the outer row loop contributes O(rows) on its own, so the full traversal is O(rows + nnz); when the nonzeros dominate (the usual case for a matrix worth storing sparsely), the running time grows directly with the number of nonzero entries.
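The linear pattern in the table can be checked empirically by counting how many times the inner loop body runs. A small sketch, using `scipy.sparse.random` to generate test matrices (the sizes and density here are arbitrary choices for illustration):

```python
from scipy.sparse import random as sparse_random

# Count inner-loop iterations for matrices with a growing number of nonzeros
for n in (100, 1000):
    m = sparse_random(n, n, density=0.01, format='csr', random_state=0)
    ops = 0
    for i in range(m.shape[0]):
        for j in range(m.indptr[i], m.indptr[i + 1]):
            ops += 1
    # exactly one inner-loop iteration per stored nonzero
    print(n, m.nnz, ops)
```

For every matrix size, the iteration count equals `m.nnz` exactly, confirming the one-operation-per-nonzero pattern.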
[X] Wrong: "Accessing elements in CSR format takes time proportional to the total matrix size (rows x columns)."
[OK] Correct: CSR stores only nonzero elements, so operations depend on how many nonzero values exist, not the full matrix size.
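The difference matters enormously for very sparse matrices. In this sketch (the positions and values are arbitrary), a 10,000 x 10,000 matrix holds only 3 stored values, so the row-wise traversal visits 3 elements rather than the 100 million entries a dense scan would touch:

```python
import numpy as np
from scipy.sparse import csr_matrix

# A huge, nearly empty matrix: 3 nonzeros out of 100,000,000 cells
row = np.array([0, 5000, 9999])
col = np.array([9999, 5000, 0])
data = np.array([1.0, 2.0, 3.0])
big = csr_matrix((data, (row, col)), shape=(10000, 10000))

visited = 0
for i in range(big.shape[0]):
    visited += big.indptr[i + 1] - big.indptr[i]  # nonzeros in row i

print(visited)                       # 3 stored elements visited
print(big.shape[0] * big.shape[1])   # 100,000,000 dense entries
```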
Understanding CSR time complexity helps you explain how sparse data is handled efficiently in real projects.
"What if we changed the code to access elements by column instead of by row? How would the time complexity change?"
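As a starting point for that question: CSR has no column-wise `indptr`, so extracting one column means scanning every row's indices, O(nnz) in the worst case per column. SciPy's CSC format (Compressed Sparse Column) is the column-oriented twin, and a one-time O(nnz) conversion restores cheap column slicing. A sketch using the same 3x3 matrix as above:

```python
import numpy as np
from scipy.sparse import csr_matrix

row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])
csr = csr_matrix((data, (row, col)), shape=(3, 3))

csc = csr.tocsc()  # one-time O(nnz) conversion to column-major storage

# In CSC, indptr[j]:indptr[j+1] is the slice belonging to COLUMN j
for j in range(csc.shape[1]):
    start, end = csc.indptr[j], csc.indptr[j + 1]
    print(j, csc.data[start:end])  # nonzero values of column j
```

With CSC, the same loop structure iterates column by column in O(cols + nnz), mirroring the row-wise analysis above.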