0
0
SciPydata~5 mins

CSR format (Compressed Sparse Row) in SciPy - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: CSR format (Compressed Sparse Row)
O(nnz)
Understanding Time Complexity

We want to understand how fast operations on CSR sparse matrices run as the data grows.

How does the time to access or process data change when the matrix size increases?

Scenario Under Consideration

Analyze the time complexity of the following code snippet.


import numpy as np
from scipy.sparse import csr_matrix

# Create a sparse matrix in CSR format
row = np.array([0, 0, 1, 2, 2, 2])
col = np.array([0, 2, 2, 0, 1, 2])
data = np.array([1, 2, 3, 4, 5, 6])

csr = csr_matrix((data, (row, col)), shape=(3, 3))

# Access all nonzero elements
for i in range(csr.shape[0]):
    start = csr.indptr[i]
    end = csr.indptr[i+1]
    for j in range(start, end):
        val = csr.data[j]
        # process val

This code creates a CSR sparse matrix and loops through all its nonzero elements row by row.

Identify Repeating Operations

Identify the loops, recursion, array traversals that repeat.

  • Primary operation: Looping over all nonzero elements in the matrix.
  • How many times: Exactly once per nonzero element, grouped by rows.
How Execution Grows With Input

As the number of nonzero elements grows, the time to process them grows proportionally.

Input Size (nonzero elements)Approx. Operations
1010
100100
10001000

Pattern observation: The time grows linearly with the number of nonzero elements.

Final Time Complexity

Time Complexity: O(nnz)

This means the time grows directly with the number of nonzero entries in the matrix.

Common Mistake

[X] Wrong: "Accessing elements in CSR format takes time proportional to the total matrix size (rows x columns)."

[OK] Correct: CSR stores only nonzero elements, so operations depend on how many nonzero values exist, not the full matrix size.

Interview Connect

Understanding CSR time complexity helps you explain how sparse data is handled efficiently in real projects.

Self-Check

"What if we changed the code to access elements by column instead of by row? How would the time complexity change?"