CSC format (Compressed Sparse Column) in SciPy - Time & Space Complexity
We want to understand how fast operations run when using the CSC format in scipy.
Specifically, how does the time to access or process data grow as the matrix size grows?
Analyze the time complexity of accessing all nonzero elements column by column in a CSC matrix.
from scipy.sparse import csc_matrix
# Create a sparse matrix in CSC format
matrix = csc_matrix([[0, 0, 3], [4, 0, 0], [0, 5, 0]])
# Access nonzero elements column-wise
for col in range(matrix.shape[1]):
start = matrix.indptr[col]
end = matrix.indptr[col + 1]
for data_index in range(start, end):
row = matrix.indices[data_index]
value = matrix.data[data_index]
# process value
This code loops through each column and accesses all nonzero values in that column.
Look at what repeats in the code:
- Primary operation: Looping over each column, then looping over nonzero elements in that column.
- How many times: Outer loop runs once per column (n times), inner loop runs once per nonzero element in that column.
As the matrix grows, the number of columns and nonzero elements increase.
| Input Size (n columns) | Approx. Operations (nonzero elements) |
|---|---|
| 10 | Depends on nonzero count, e.g., 20 |
| 100 | About 200 nonzero elements |
| 1000 | About 2000 nonzero elements |
Pattern observation: The total operations grow roughly with the number of nonzero elements, not just columns.
Time Complexity: O(n + k)
This means the time grows with the number of columns plus the number of nonzero elements.
[X] Wrong: "Accessing elements in CSC format takes time proportional to total matrix size (rows x columns)."
[OK] Correct: CSC stores only nonzero elements, so time depends on columns and nonzero count, not all elements.
Understanding how sparse matrix formats affect operation speed helps you explain efficient data handling in real projects.
"What if we changed from CSC to CSR format? How would the time complexity for accessing elements change?"