Introduction
When databases grow very large, it becomes hard to manage and access all the data quickly. Splitting data into smaller parts helps solve this problem by making data easier to handle and faster to reach.
Imagine a huge library with millions of books. Partitioning is like organizing books into sections on different shelves within the same room, such as fiction or history. Sharding is like having several library branches in different locations, each holding part of the collection.
┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ Database │ │ Database │ │ Database │ │ Server 1 │ │ Server 2 │ │ Server 3 │ │ ┌───────────┐ │ │ ┌───────────┐ │ │ ┌───────────┐ │ │ │ Partition │ │ │ │ Partition │ │ │ │ Partition │ │ │ │ A │ │ │ │ B │ │ │ │ C │ │ │ └───────────┘ │ │ └───────────┘ │ │ └───────────┘ │ └───────────────┘ └───────────────┘ └───────────────┘ Partitioning: Multiple partitions inside one database server ┌───────────────┐ ┌───────────────┐ ┌───────────────┐ │ Shard 1 │ │ Shard 2 │ │ Shard 3 │ │ (Database 1) │ │ (Database 2) │ │ (Database 3) │ └───────────────┘ └───────────────┘ └───────────────┘ Sharding: Data split across multiple database servers
import sqlite3 # Create in-memory database conn = sqlite3.connect(':memory:') cur = conn.cursor() # Create a table cur.execute('CREATE TABLE users (id INTEGER, name TEXT, region TEXT)') # Insert sample data users = [ (1, 'Alice', 'North'), (2, 'Bob', 'South'), (3, 'Carol', 'North'), (4, 'Dave', 'East'), (5, 'Eve', 'South') ] cur.executemany('INSERT INTO users VALUES (?, ?, ?)', users) # Partitioning example: Query users by region (simulate partitions) cur.execute("SELECT * FROM users WHERE region = 'North'") north_users = cur.fetchall() # Sharding example: Simulate two shards as separate connections conn_shard1 = sqlite3.connect(':memory:') conn_shard2 = sqlite3.connect(':memory:') cur1 = conn_shard1.cursor() cur2 = conn_shard2.cursor() # Create tables on shards cur1.execute('CREATE TABLE users (id INTEGER, name TEXT)') cur2.execute('CREATE TABLE users (id INTEGER, name TEXT)') # Insert data into shards based on user id (shard key) cur1.executemany('INSERT INTO users VALUES (?, ?)', [(1, 'Alice'), (3, 'Carol'), (5, 'Eve')]) cur2.executemany('INSERT INTO users VALUES (?, ?)', [(2, 'Bob'), (4, 'Dave')]) # Query shard 1 cur1.execute('SELECT * FROM users') shard1_users = cur1.fetchall() # Query shard 2 cur2.execute('SELECT * FROM users') shard2_users = cur2.fetchall() print('Partition (North region):', north_users) print('Shard 1 users:', shard1_users) print('Shard 2 users:', shard2_users)