What is B+ Tree Index: Explanation, Example, and Use Cases
B+ tree index is a type of data structure used in databases to organize and speed up data retrieval. It stores keys in a sorted order with all actual data pointers in the leaf nodes, allowing efficient range queries and fast lookups.How It Works
A B+ tree index works like a well-organized library catalog. Imagine a big book collection where you want to find a book quickly. Instead of searching every book, you use a sorted index that points you to the exact shelf and position.
In a B+ tree, internal nodes hold keys that guide the search path, but only the leaf nodes contain the actual data pointers. This means all data is stored at the bottom level, linked in order, making it easy to scan ranges of data efficiently.
When you search for a value, you start at the root and follow the keys down the tree until you reach the leaf node that contains the data or pointer you need. This structure keeps the tree balanced, so the search time stays fast even as data grows.
Example
This example shows a simple B+ tree structure with keys and pointers. The leaf nodes contain the actual data references, and internal nodes guide the search.
class BPlusTreeNode: def __init__(self, leaf=False): self.leaf = leaf self.keys = [] self.children = [] class BPlusTree: def __init__(self, order=3): self.root = BPlusTreeNode(leaf=True) self.order = order def insert(self, key): root = self.root if len(root.keys) == (self.order - 1): new_root = BPlusTreeNode() new_root.children.append(root) self._split_child(new_root, 0) self.root = new_root self._insert_non_full(self.root, key) def _split_child(self, parent, index): order = self.order node = parent.children[index] new_node = BPlusTreeNode(leaf=node.leaf) mid = order // 2 parent.keys.insert(index, node.keys[mid]) parent.children.insert(index + 1, new_node) new_node.keys = node.keys[mid + 1:] node.keys = node.keys[:mid] if not node.leaf: new_node.children = node.children[mid + 1:] node.children = node.children[:mid + 1] def _insert_non_full(self, node, key): if node.leaf: node.keys.append(key) node.keys.sort() else: i = len(node.keys) - 1 while i >= 0 and key < node.keys[i]: i -= 1 i += 1 child = node.children[i] if len(child.keys) == (self.order - 1): self._split_child(node, i) if key > node.keys[i]: i += 1 self._insert_non_full(node.children[i], key) def search(self, node, key): i = 0 while i < len(node.keys) and key > node.keys[i]: i += 1 if i < len(node.keys) and key == node.keys[i]: if node.leaf: return True else: return self.search(node.children[i + 1], key) elif node.leaf: return False else: return self.search(node.children[i], key) # Create B+ tree and insert keys bpt = BPlusTree(order=3) for k in [10, 20, 5, 6, 12, 30, 7, 17]: bpt.insert(k) # Search for keys print(bpt.search(bpt.root, 6)) # True print(bpt.search(bpt.root, 15)) # False
When to Use
Use a B+ tree index when you need fast data retrieval in databases, especially for range queries like finding all records between two values. It is ideal for large datasets because it keeps data balanced and minimizes disk reads.
Common real-world uses include indexing in relational databases, file systems, and any system that requires sorted data access with quick insertions, deletions, and lookups.
Key Points
- All actual data pointers are stored in leaf nodes, linked in order.
- Internal nodes only store keys to guide searches.
- Balanced tree structure ensures fast search, insert, and delete operations.
- Efficient for range queries and large datasets.
- Widely used in database indexing and file systems.