0
0
Data Structures Theoryknowledge~6 mins

Suffix arrays in Data Structures Theory - Full Explanation

Choose your learning style9 modes available
Introduction
Imagine you want to quickly find if a word or pattern appears inside a long text. Searching every time from the start can be slow. Suffix arrays help solve this by organizing all possible endings of the text in a way that makes searching fast and easy.
Explanation
What is a suffix
A suffix is a part of a string that starts at any position and goes to the end. For example, in the word 'banana', 'nana' and 'a' are suffixes. Every string has as many suffixes as its length.
A suffix is a substring that begins at some position and continues to the end of the string.
Structure of a suffix array
A suffix array is a list of all suffixes of a string, sorted in alphabetical order. Instead of storing the suffixes themselves, it stores the starting positions of these suffixes in the original string. This makes it compact and efficient.
A suffix array stores the starting indexes of all suffixes sorted lexicographically.
How suffix arrays help searching
Because suffixes are sorted, you can use binary search to quickly find if a pattern exists in the text. This is much faster than checking every position one by one. It works like looking up a word in a dictionary.
Sorted suffixes allow fast pattern search using binary search.
Building suffix arrays
Creating a suffix array involves listing all suffixes and sorting them. Naively, this can be slow, but there are efficient algorithms that build suffix arrays quickly even for large texts. These algorithms use clever tricks to avoid repeated comparisons.
Efficient algorithms exist to build suffix arrays quickly for large strings.
Applications of suffix arrays
Suffix arrays are used in text search engines, DNA sequence analysis, and data compression. They help find repeated patterns, match sequences, and speed up many string-related tasks.
Suffix arrays are powerful tools for fast text and pattern matching in many fields.
Real World Analogy

Imagine a library where every book is opened at every possible page, and each page is listed in alphabetical order by its first sentence. To find a phrase, you just look it up in this sorted list instead of flipping through every book.

Suffix → A page in a book starting from a certain point to the end
Suffix array → A sorted catalog listing all pages by their first sentence
Binary search on suffix array → Looking up a phrase quickly in the catalog instead of reading every page
Building suffix arrays → Organizing and sorting all pages in the catalog efficiently
Applications → Using the catalog to find information quickly in libraries, DNA labs, or data centers
Diagram
Diagram
Original string: banana

Suffixes and their start positions:
0: banana
1: anana
2: nana
3: ana
4: na
5: a

Sorted suffixes (suffix array):
5: a
3: ana
1: anana
0: banana
4: na
2: nana
This diagram shows the original string, all its suffixes with their start positions, and the suffix array listing these positions sorted by suffix.
Key Facts
SuffixA substring starting at any position and extending to the end of the string.
Suffix arrayAn array of starting indexes of all suffixes sorted in lexicographical order.
Binary searchA fast search method that works on sorted data by repeatedly dividing the search interval in half.
Suffix array constructionThe process of creating a suffix array, which can be done efficiently with specialized algorithms.
Applications of suffix arraysUsed in text search, DNA analysis, and data compression for fast pattern matching.
Code Example
Data Structures Theory
def build_suffix_array(s: str) -> list[int]:
    return sorted(range(len(s)), key=lambda i: s[i:])

text = "banana"
suffix_array = build_suffix_array(text)
print(suffix_array)
OutputSuccess
Common Confusions
Suffix arrays store the suffix strings themselves.
Suffix arrays store the suffix strings themselves. Suffix arrays store only the starting positions of suffixes, not the suffix strings, to save space.
Suffix arrays are the same as suffix trees.
Suffix arrays are the same as suffix trees. Suffix arrays are simpler, space-efficient arrays of indexes, while suffix trees are complex tree structures; both help with string search but differ in design.
Building suffix arrays is always slow because it involves sorting many suffixes.
Building suffix arrays is always slow because it involves sorting many suffixes. Special algorithms build suffix arrays efficiently without sorting all suffixes directly, making it fast even for large texts.
Summary
Suffix arrays organize all endings of a string in sorted order to enable fast searching.
They store starting positions of suffixes, not the suffixes themselves, saving space.
Efficient algorithms and binary search make suffix arrays practical for large texts.