0
0
Data Structures Theoryknowledge~6 mins

B-trees for databases in Data Structures Theory - Full Explanation

Choose your learning style9 modes available
Introduction
Imagine trying to find a book in a huge library without a clear system. Searching would take forever. Databases face a similar problem when they need to quickly find, add, or remove data. B-trees solve this by organizing data so that searching and updating happen fast, even with large amounts of information.
Explanation
Structure of B-trees
A B-tree is a tree where each node can hold multiple keys and have several children. Unlike simple trees, nodes are designed to keep data balanced by having a minimum and maximum number of keys. This structure keeps the tree shallow, which means fewer steps to find data.
B-trees keep data balanced by allowing nodes to hold many keys, reducing the tree's height.
Searching in B-trees
To find a value, you start at the root and compare the target with keys in the node. Based on the comparison, you move to the correct child node. This process repeats until the value is found or confirmed missing. Because nodes hold many keys, each step covers a wide range, speeding up search.
Searching moves down the tree by comparing keys in nodes, making lookups efficient.
Insertion and splitting
When adding a new key, it is placed in the correct node in sorted order. If the node is full, it splits into two nodes, and the middle key moves up to the parent. This splitting keeps the tree balanced and prevents it from growing too tall.
Insertion may cause node splits that keep the tree balanced and shallow.
Deletion and merging
Removing a key can cause a node to have too few keys. To fix this, nodes borrow keys from siblings or merge with them. This process maintains the minimum number of keys per node and keeps the tree balanced.
Deletion uses borrowing or merging to maintain balance and minimum keys per node.
Why B-trees suit databases
Databases store data on disks where reading is slow. B-trees minimize disk reads by keeping nodes large and shallow, so fewer reads are needed. Their balanced nature ensures consistent performance for searching, inserting, and deleting data.
B-trees reduce disk reads and keep operations fast, making them ideal for databases.
Real World Analogy

Think of a large filing cabinet with drawers. Each drawer holds many folders sorted alphabetically. To find a folder, you open the right drawer and quickly scan the folders inside. If a drawer gets too full, it splits into two drawers, and the cabinet adjusts to keep things organized.

Structure of B-trees → Drawers holding many folders to keep things organized and easy to search
Searching in B-trees → Opening the right drawer and scanning folders to find a document
Insertion and splitting → Adding a folder to a drawer and splitting the drawer if it gets too full
Deletion and merging → Removing a folder and combining drawers if one becomes too empty
Why B-trees suit databases → Keeping the filing cabinet shallow and organized to find folders quickly without opening many drawers
Diagram
Diagram
          ┌─────────────┐
          │   Root Node  │
          │  [10, 20]   │
          └─────┬───────┘
                │
    ┌───────────┴───────────┐
    │                       │
┌─────────┐           ┌─────────┐
│ Node 1  │           │ Node 2  │
│ [5, 7]  │           │ [15,18] │
└─────────┘           └─────────┘
A simple B-tree with a root node holding two keys and two child nodes holding smaller key ranges.
Key Facts
B-tree nodeA unit in a B-tree that holds multiple keys and child pointers.
Node splittingThe process of dividing a full node into two and moving the middle key up.
Balanced treeA tree where all leaf nodes are at the same depth, ensuring efficient operations.
Disk read optimizationB-trees reduce the number of disk accesses by keeping nodes large and shallow.
Minimum degree (t)A parameter defining the minimum and maximum number of keys a B-tree node can hold.
Code Example
Data Structures Theory
class BTreeNode:
    def __init__(self, t, leaf=False):
        self.t = t  # Minimum degree
        self.keys = []
        self.children = []
        self.leaf = leaf

    def insert_non_full(self, key):
        i = len(self.keys) - 1
        if self.leaf:
            self.keys.append(None)
            while i >= 0 and key < self.keys[i]:
                self.keys[i + 1] = self.keys[i]
                i -= 1
            self.keys[i + 1] = key
        else:
            while i >= 0 and key < self.keys[i]:
                i -= 1
            i += 1
            if len(self.children[i].keys) == 2 * self.t - 1:
                self.split_child(i)
                if key > self.keys[i]:
                    i += 1
            self.children[i].insert_non_full(key)

    def split_child(self, i):
        t = self.t
        y = self.children[i]
        z = BTreeNode(t, y.leaf)
        z.keys = y.keys[t:]
        y.keys = y.keys[:t - 1]
        if not y.leaf:
            z.children = y.children[t:]
            y.children = y.children[:t]
        self.children.insert(i + 1, z)
        self.keys.insert(i, y.keys.pop())

class BTree:
    def __init__(self, t):
        self.root = BTreeNode(t, True)
        self.t = t

    def insert(self, key):
        root = self.root
        if len(root.keys) == 2 * self.t - 1:
            s = BTreeNode(self.t)
            s.children.insert(0, root)
            s.leaf = False
            s.split_child(0)
            i = 0
            if s.keys[0] < key:
                i += 1
            s.children[i].insert_non_full(key)
            self.root = s
        else:
            root.insert_non_full(key)

    def traverse(self, node=None):
        if node is None:
            node = self.root
        for i in range(len(node.keys)):
            if not node.leaf:
                self.traverse(node.children[i])
            print(node.keys[i], end=' ')
        if not node.leaf:
            self.traverse(node.children[len(node.keys)])

# Example usage
btree = BTree(2)
for key in [10, 20, 5, 6, 12, 30, 7, 17]:
    btree.insert(key)
btree.traverse()
OutputSuccess
Common Confusions
B-trees are the same as binary trees
B-trees are the same as binary trees B-trees differ because nodes can hold many keys and have multiple children, unlike binary trees which have at most two children per node.
Node splitting happens only at the root
Node splitting happens only at the root Node splitting can happen at any node that becomes full during insertion, not just the root.
B-trees are only for searching
B-trees are only for searching B-trees efficiently support searching, insertion, and deletion while keeping data balanced.
Summary
B-trees organize data in nodes with multiple keys to keep the tree balanced and shallow.
They support fast searching, insertion, and deletion by splitting and merging nodes as needed.
Their design reduces disk reads, making them ideal for database indexing and storage.