0
0
Data Structures Theoryknowledge~10 mins

Suffix arrays in Data Structures Theory - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Suffix arrays
Input String S
Generate all suffixes of S
Sort suffixes lexicographically
Store starting indices of sorted suffixes
Suffix Array constructed
Use suffix array for fast substring queries
Start with the input string, list all suffixes, sort them alphabetically, then record their starting positions to form the suffix array.
Execution Sample
Data Structures Theory
S = "banana"
suffixes = [(S[i:], i) for i in range(len(S))]
suffixes.sort()
suffix_array = [pos for (suf, pos) in suffixes]
print(suffix_array)
This code creates all suffixes of "banana", sorts them, and prints the array of starting indices in sorted order.
Analysis Table
StepOperationSuffix GeneratedSuffix IndexSuffixes Sorted So FarSuffix Array State
1Generate suffixbanana0[('banana',0)][]
2Generate suffixanana1[('banana',0), ('anana',1)][]
3Generate suffixnana2[('banana',0), ('anana',1), ('nana',2)][]
4Generate suffixana3[('banana',0), ('anana',1), ('nana',2), ('ana',3)][]
5Generate suffixna4[('banana',0), ('anana',1), ('nana',2), ('ana',3), ('na',4)][]
6Generate suffixa5[('banana',0), ('anana',1), ('nana',2), ('ana',3), ('na',4), ('a',5)][]
7Sort suffixes lex order--[('a',5), ('ana',3), ('anana',1), ('banana',0), ('na',4), ('nana',2)][]
8Extract indices---[5, 3, 1, 0, 4, 2]
9Print suffix array---[5, 3, 1, 0, 4, 2]
💡 All suffixes generated and sorted; suffix array completed with starting indices.
State Tracker
VariableStartAfter Step 1After Step 2After Step 3After Step 4After Step 5After Step 6After Step 7After Step 8Final
S"banana""banana""banana""banana""banana""banana""banana""banana""banana""banana"
suffixes[][('banana',0)][('banana',0), ('anana',1)][('banana',0), ('anana',1), ('nana',2)][('banana',0), ('anana',1), ('nana',2), ('ana',3)][('banana',0), ('anana',1), ('nana',2), ('ana',3), ('na',4)][('banana',0), ('anana',1), ('nana',2), ('ana',3), ('na',4), ('a',5)][('a',5), ('ana',3), ('anana',1), ('banana',0), ('na',4), ('nana',2)][('a',5), ('ana',3), ('anana',1), ('banana',0), ('na',4), ('nana',2)][('a',5), ('ana',3), ('anana',1), ('banana',0), ('na',4), ('nana',2)]
suffix_array[][][][][][][][][5, 3, 1, 0, 4, 2][5, 3, 1, 0, 4, 2]
Key Insights - 3 Insights
Why do we store the starting indices of suffixes instead of the suffix strings themselves?
Storing indices saves space and allows quick reference to the original string. The execution_table row 8 shows extracting indices after sorting suffixes.
Why must suffixes be sorted lexicographically?
Sorting suffixes alphabetically arranges all substrings in order, enabling fast search. See execution_table row 7 where suffixes are sorted.
How does the suffix array help in substring search?
Because suffixes are sorted, binary search on suffix_array indices quickly finds substrings. This is implied after the suffix array is built in execution_table row 9.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 7. Which suffix is first in the sorted list?
A"banana" starting at index 0
B"a" starting at index 5
C"ana" starting at index 3
D"nana" starting at index 2
💡 Hint
Check the 'Suffixes Sorted So Far' column at step 7 in execution_table.
At which step does the suffix array get its final values?
AStep 8
BStep 7
CStep 6
DStep 9
💡 Hint
Look at the 'Suffix Array State' column in execution_table to see when indices are extracted.
If the input string was "apple", how many suffixes would be generated?
A4
B6
C5
D7
💡 Hint
Refer to variable_tracker 'suffixes' length after all suffixes generated for "banana" which has length 6.
Concept Snapshot
Suffix arrays list all suffixes of a string.
They are sorted alphabetically.
We store the starting indices of these sorted suffixes.
This structure helps fast substring search.
Building suffix arrays involves generating suffixes, sorting, and indexing.
Full Transcript
Suffix arrays are built by taking all suffixes of a string, sorting them alphabetically, and recording their starting positions. For example, for the string 'banana', we generate suffixes like 'banana', 'anana', 'nana', 'ana', 'na', and 'a'. These suffixes are sorted lexicographically to get 'a', 'ana', 'anana', 'banana', 'na', 'nana'. We then create the suffix array by storing the starting indices of these sorted suffixes, resulting in [5, 3, 1, 0, 4, 2]. This array allows efficient substring searches by binary searching the suffixes. The process involves generating suffixes, sorting them, and extracting indices, as shown step-by-step in the execution table and variable tracker. Key points include why indices are stored instead of strings, the importance of sorting, and how the suffix array supports fast queries.