B Tree vs B+ Tree for Indexing: Key Differences and Usage
B Tree stores keys and data in all its nodes, allowing faster access but more complex traversal. A B+ Tree stores data only in leaf nodes and uses internal nodes for indexing, which improves range queries and sequential access efficiency.Quick Comparison
Here is a quick side-by-side comparison of B Tree and B+ Tree based on key factors relevant to indexing.
| Factor | B Tree | B+ Tree |
|---|---|---|
| Data Storage | Keys and data stored in all nodes | Data stored only in leaf nodes; internal nodes store keys only |
| Leaf Nodes | Leaf nodes are not linked | Leaf nodes are linked for sequential access |
| Search Efficiency | Direct access to data in any node | Data found only at leaf nodes, but faster range queries |
| Range Queries | Less efficient due to no leaf linkage | More efficient due to linked leaf nodes |
| Tree Height | Generally shorter due to data in all nodes | May be taller but balanced by efficient leaf traversal |
| Use Case | Good for general indexing | Preferred for databases with frequent range queries |
Key Differences
B Tree stores both keys and actual data in every node, including internal and leaf nodes. This means you can find data at any level, which can speed up single key lookups but makes the tree structure more complex.
In contrast, B+ Tree stores only keys in internal nodes and keeps all actual data in the leaf nodes. The leaf nodes are linked together in a linked list, which makes sequential and range queries much faster and simpler to implement.
Because B+ Tree separates keys and data, it often has a higher fan-out (more children per node), which can reduce tree height and improve disk read efficiency. This makes B+ Tree the preferred choice in many database indexing systems where range queries and ordered traversal are common.
Code Comparison
Below is a simple Python example showing how a B Tree node might store keys and data together and perform a search.
class BTreeNode: def __init__(self, keys=None, data=None, children=None, leaf=True): self.keys = keys or [] self.data = data or [] # Data stored with keys self.children = children or [] self.leaf = leaf def search(self, key): i = 0 while i < len(self.keys) and key > self.keys[i]: i += 1 if i < len(self.keys) and key == self.keys[i]: return self.data[i] # Data found at this node if self.leaf: return None return self.children[i].search(key) # Example usage root = BTreeNode(keys=[10, 20, 30], data=['A', 'B', 'C'], leaf=True) print(root.search(20))
B+ Tree Equivalent
This Python example shows a simplified B+ Tree leaf node storing data and internal nodes storing only keys. Leaf nodes are linked for sequential access.
class BPlusTreeNode: def __init__(self, keys=None, children=None, leaf=True, next_leaf=None, data=None): self.keys = keys or [] self.children = children or [] self.leaf = leaf self.next_leaf = next_leaf # Link to next leaf node self.data = data or [] # Data only in leaf nodes def search(self, key): if self.leaf: for i, k in enumerate(self.keys): if k == key: return self.data[i] return None else: i = 0 while i < len(self.keys) and key >= self.keys[i]: i += 1 return self.children[i].search(key) # Example usage leaf = BPlusTreeNode(keys=[10, 20, 30], data=['A', 'B', 'C'], leaf=True) print(leaf.search(20))
When to Use Which
Choose B Tree when: your application requires faster single key lookups and you want data accessible at any node level, especially when range queries are less frequent.
Choose B+ Tree when: your workload involves many range queries or ordered traversals, as linked leaf nodes make these operations more efficient and simpler to implement.
In modern database systems, B+ Tree is generally preferred for indexing due to its better support for range queries and disk-based storage optimization.
Key Takeaways
B+ Tree stores data only in leaf nodes, improving range query efficiency.B Tree stores data in all nodes, allowing faster single key access.B+ Tree leaf nodes are linked, enabling quick sequential access.B+ Tree for databases with frequent range queries.B Tree when direct data access at any node is needed.