0
0
SciPydata~5 mins

CSC format (Compressed Sparse Column) in SciPy - Time & Space Complexity

Choose your learning style9 modes available
Time Complexity: CSC format (Compressed Sparse Column)
O(n + k)
Understanding Time Complexity

We want to understand how fast operations run when using the CSC format in scipy.

Specifically, how does the time to access or process data grow as the matrix size grows?

Scenario Under Consideration

Analyze the time complexity of accessing all nonzero elements column by column in a CSC matrix.


from scipy.sparse import csc_matrix

# Create a sparse matrix in CSC format
matrix = csc_matrix([[0, 0, 3], [4, 0, 0], [0, 5, 0]])

# Access nonzero elements column-wise
for col in range(matrix.shape[1]):
    start = matrix.indptr[col]
    end = matrix.indptr[col + 1]
    for data_index in range(start, end):
        row = matrix.indices[data_index]
        value = matrix.data[data_index]
        # process value

This code loops through each column and accesses all nonzero values in that column.

Identify Repeating Operations

Look at what repeats in the code:

  • Primary operation: Looping over each column, then looping over nonzero elements in that column.
  • How many times: Outer loop runs once per column (n times), inner loop runs once per nonzero element in that column.
How Execution Grows With Input

As the matrix grows, the number of columns and nonzero elements increase.

Input Size (n columns)Approx. Operations (nonzero elements)
10Depends on nonzero count, e.g., 20
100About 200 nonzero elements
1000About 2000 nonzero elements

Pattern observation: The total operations grow roughly with the number of nonzero elements, not just columns.

Final Time Complexity

Time Complexity: O(n + k)

This means the time grows with the number of columns plus the number of nonzero elements.

Common Mistake

[X] Wrong: "Accessing elements in CSC format takes time proportional to total matrix size (rows x columns)."

[OK] Correct: CSC stores only nonzero elements, so time depends on columns and nonzero count, not all elements.

Interview Connect

Understanding how sparse matrix formats affect operation speed helps you explain efficient data handling in real projects.

Self-Check

"What if we changed from CSC to CSR format? How would the time complexity for accessing elements change?"