
CSC format (Compressed Sparse Column) in SciPy - Step-by-Step Execution

Concept Flow - CSC format (Compressed Sparse Column)
Start with dense matrix
Identify non-zero elements
Store values in 'data' array
Store row indices of these values
Store column pointers in 'indptr'
Create CSC matrix object
Use it for efficient column slicing and column-wise operations
CSC format stores a sparse matrix by columns: values, row indices, and column pointers for fast column access.
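The flow above can be sketched in plain Python. The helper `dense_to_csc` is a hypothetical name used only for illustration, not part of SciPy. It scans columns left to right and rows top to bottom, and it appends each column's end pointer right after scanning that column, which produces the same final arrays as SciPy's conversion. The last lines cross-check against `csc_matrix`, assuming NumPy and SciPy are installed.

```python
import numpy as np
from scipy.sparse import csc_matrix

def dense_to_csc(A):
    """Hypothetical helper: build (data, indices, indptr) by scanning column by column."""
    A = np.asarray(A)
    n_rows, n_cols = A.shape
    data, indices, indptr = [], [], [0]   # indptr starts with the start of column 0
    for j in range(n_cols):               # visit columns left to right
        for i in range(n_rows):           # within a column, scan top to bottom
            if A[i, j] != 0:
                data.append(A[i, j])      # the non-zero value
                indices.append(i)         # its row index
        indptr.append(len(data))          # end of column j = start of column j+1
    return np.array(data), np.array(indices), np.array(indptr)

A = np.array([[0, 0, 1], [2, 0, 0], [0, 3, 0]])
data, indices, indptr = dense_to_csc(A)
print(data, indices, indptr)   # [2 3 1] [1 2 0] [0 1 2 3]

# Cross-check against SciPy's own dense-to-CSC conversion
ref = csc_matrix(A)
assert np.array_equal(data, ref.data)
assert np.array_equal(indices, ref.indices)
assert np.array_equal(indptr, ref.indptr)
```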
Execution Sample
SciPy
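import numpy as np
from scipy.sparse import csc_matrix

# 3x3 dense matrix with exactly one non-zero per column
A = np.array([[0,0,1],[2,0,0],[0,3,0]])
csc = csc_matrix(A)   # convert dense -> CSC
# Print the three internal arrays: values, row indices, column pointers
print(csc.data, csc.indices, csc.indptr)
# Output: [2 3 1] [1 2 0] [0 1 2 3]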
Convert a small dense matrix to CSC format and print its internal arrays.
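The conversion also works in the other direction: `csc_matrix` accepts a `(data, indices, indptr)` tuple together with a `shape`. A minimal sketch, reusing the three arrays printed by the sample and assuming the 3x3 shape of `A`:

```python
import numpy as np
from scipy.sparse import csc_matrix

data = np.array([2, 3, 1])       # non-zero values, column by column
indices = np.array([1, 2, 0])    # row index of each value
indptr = np.array([0, 1, 2, 3])  # column j occupies data[indptr[j]:indptr[j+1]]

csc = csc_matrix((data, indices, indptr), shape=(3, 3))
print(csc.toarray())
# [[0 0 1]
#  [2 0 0]
#  [0 3 0]]
print(csc.nnz)  # 3
```

Rebuilding the dense matrix this way is a quick check that the three arrays are consistent with each other.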
Execution Table
Step | Action                               | Values (data) | Row Indices (indices) | Column Pointers (indptr)
1    | Start with matrix A                  | -             | -                     | -
2    | Find non-zeros in col 0              | [2]           | [1]                   | [0]
3    | Find non-zeros in col 1              | [2,3]         | [1,2]                 | [0,1]
4    | Find non-zeros in col 2              | [2,3,1]       | [1,2,0]               | [0,1,2]
5    | Add end pointer for last col         | [2,3,1]       | [1,2,0]               | [0,1,2,3]
6    | Create CSC matrix with these arrays  | [2,3,1]       | [1,2,0]               | [0,1,2,3]
💡 All columns processed, CSC arrays fully built.
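Because 'indptr' brackets each column, the non-zeros of column j sit in data[indptr[j]:indptr[j+1]], and their rows sit in the same slice of 'indices'. The helper `column_entries` below is a hypothetical name for illustration; it reads a column with two pointer lookups and one slice, which is why column access is cheap in CSC.

```python
import numpy as np
from scipy.sparse import csc_matrix

A = np.array([[0, 0, 1], [2, 0, 0], [0, 3, 0]])
csc = csc_matrix(A)

def column_entries(csc, j):
    """Hypothetical helper: return (row_indices, values) of the non-zeros in column j."""
    start, end = csc.indptr[j], csc.indptr[j + 1]   # column j's range in data/indices
    return csc.indices[start:end], csc.data[start:end]

for j in range(csc.shape[1]):
    rows, vals = column_entries(csc, j)
    print(f"column {j}: rows={rows.tolist()} values={vals.tolist()}")
# column 0: rows=[1] values=[2]
# column 1: rows=[2] values=[3]
# column 2: rows=[0] values=[1]
```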
Variable Tracker
Variable | Start | After Step 2 | After Step 3 | After Step 4 | After Step 5 | Final
data     | []    | [2]          | [2,3]        | [2,3,1]      | [2,3,1]      | [2,3,1]
indices  | []    | [1]          | [1,2]        | [1,2,0]      | [1,2,0]      | [1,2,0]
indptr   | [0]   | [0]          | [0,1]        | [0,1,2]      | [0,1,2,3]    | [0,1,2,3]
Key Moments - 3 Insights
Why does 'indptr' have one more element than the number of columns?
'indptr' records where each column starts in 'data' and ends with the total non-zero count, so its length is the number of columns + 1 (see Execution Table, step 5).
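These invariants can be checked directly on the example matrix. A short sketch; `np.diff(indptr)` gives the number of stored values in each column:

```python
import numpy as np
from scipy.sparse import csc_matrix

A = np.array([[0, 0, 1], [2, 0, 0], [0, 3, 0]])
csc = csc_matrix(A)

n_cols = csc.shape[1]
print(len(csc.indptr) == n_cols + 1)   # True: one start per column plus a final end
print(csc.indptr[-1] == csc.nnz)       # True: the last entry is the total non-zero count
print(np.diff(csc.indptr))             # [1 1 1]: non-zeros per column
```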
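Why is the 'indices' array [1,2,0] not in ascending order?
'indices' lists row positions column by column, not across the whole matrix: column 0 holds row 1, column 1 holds row 2, and column 2 holds row 0. Within a single column, converting from a dense matrix records rows top to bottom, so each column's slice of 'indices' is in ascending order (see Execution Table, steps 2-4).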
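Whether row indices are sorted within a column depends on how the matrix was built. A small sketch, assuming a recent SciPy: a dense conversion lists each column's rows top to bottom, while building from raw arrays keeps whatever order you supply; `has_sorted_indices` reports the state and `sort_indices()` fixes it in place.

```python
import numpy as np
from scipy.sparse import csc_matrix

# Dense-to-CSC conversion lists each column's rows top to bottom.
A = np.array([[5, 0], [0, 0], [7, 0]])
print(csc_matrix(A).indices)   # [0 2]

# Building from raw arrays keeps the supplied order: column 0 lists row 2 before row 0.
m = csc_matrix((np.array([7, 5]), np.array([2, 0]), np.array([0, 2, 2])), shape=(3, 2))
print(m.has_sorted_indices)    # False
m.sort_indices()               # reorders indices (and data) within each column, in place
print(m.indices, m.data)       # [0 2] [5 7]
print(m.has_sorted_indices)    # True
```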
What does 'data' array represent exactly?
'data' stores every non-zero value, column by column; the entry at the same position in 'indices' gives that value's row (see Execution Table, steps 2-4).
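To see which matrix cell each 'data' entry belongs to, expand 'indptr' into one column index per stored value and pair it with 'indices'. A sketch on the example matrix; SciPy's `tocoo()` exposes the same (row, col, value) information through its `row`, `col`, and `data` arrays.

```python
import numpy as np
from scipy.sparse import csc_matrix

A = np.array([[0, 0, 1], [2, 0, 0], [0, 3, 0]])
csc = csc_matrix(A)

# Column j is repeated (indptr[j+1] - indptr[j]) times, once per stored value.
cols = np.repeat(np.arange(csc.shape[1]), np.diff(csc.indptr))

for value, row, col in zip(csc.data, csc.indices, cols):
    print(f"A[{row}, {col}] = {value}")
    assert A[row, col] == value   # each stored value sits at (row, col) in A
# A[1, 0] = 2
# A[2, 1] = 3
# A[0, 2] = 1
```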
Visual Quiz - 3 Questions
Test your understanding
In the Execution Table at step 4, what does 'indices' contain?
A) [1,2,0]
B) [0,1,2]
C) [2,3,1]
D) [0,0,0]
💡 Hint
Check the 'Row Indices (indices)' column at step 4 in the Execution Table.
At which step does 'indptr' get its final length of 4?
A) Step 3
B) Step 5
C) Step 4
D) Step 6
💡 Hint
Look at the 'Column Pointers (indptr)' column in the Execution Table to see when its length reaches 4.
If the matrix had an extra zero column at the end, how would 'indptr' change?
A) It would have one less element
B) It would have the same length but different values
C) It would have one more element, repeating the last value
D) It would be empty
💡 Hint
Recall that 'indptr' has length columns + 1 and that its last value equals the total non-zero count.
Concept Snapshot
CSC format stores sparse matrices by columns.
It uses three arrays: data (non-zero values), indices (row numbers), and indptr (column start pointers).
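'indptr' has length equal to the number of columns + 1.
Efficient for column slicing, arithmetic between sparse matrices, and matrix-vector products; row slicing and changes to the sparsity pattern are comparatively slow.
Available in SciPy as scipy.sparse.csc_matrix.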
Full Transcript
CSC format (Compressed Sparse Column) stores a sparse matrix by columns. It keeps three arrays: 'data' for the non-zero values, 'indices' for their row positions, and 'indptr', which points to where each column starts in 'data'. 'indptr' has one more element than the number of columns: one start position per column plus the end of the last column. This layout makes column access cheap, since each column is one contiguous slice of 'data' and 'indices'. The example code converts a small dense matrix to CSC and prints these arrays; step by step, the non-zero values and their row indices are collected column by column while the column pointers are built. Following those steps shows how CSC stores sparse data internally.