0
0
DBMS Theoryknowledge~6 mins

Sharding and partitioning in DBMS Theory - Full Explanation

Choose your learning style9 modes available
Introduction
When databases grow very large, it becomes hard to manage and access all the data quickly. Splitting data into smaller parts helps solve this problem by making data easier to handle and faster to reach.
Explanation
Partitioning
Partitioning divides a single database table into smaller, manageable pieces called partitions. Each partition holds a subset of the data based on a rule, like ranges of values or specific categories. This helps improve query speed and maintenance by focusing on smaller data chunks.
Partitioning breaks one table into smaller parts within the same database to improve performance and management.
Sharding
Sharding splits a database horizontally by distributing data across multiple separate database servers or machines. Each shard holds a portion of the data, often based on a key like user ID. This spreads the load and allows the system to handle more data and traffic by working in parallel.
Sharding spreads data across multiple servers to scale out and handle large amounts of data and traffic.
Difference between Sharding and Partitioning
Partitioning usually happens within one database system and keeps all data on the same server, while sharding involves multiple servers or machines. Partitioning improves performance locally, whereas sharding improves scalability by adding more machines.
Partitioning is splitting data inside one database; sharding is splitting data across many databases on different servers.
Benefits of Sharding and Partitioning
Both methods help databases handle large data efficiently. Partitioning speeds up queries and simplifies maintenance. Sharding allows the system to grow by adding more servers, improving availability and fault tolerance.
Both techniques make large databases faster and easier to manage, but sharding also helps scale across servers.
Real World Analogy

Imagine a huge library with millions of books. Partitioning is like organizing books into sections on different shelves within the same room, such as fiction or history. Sharding is like having several library branches in different locations, each holding part of the collection.

Partitioning → Books organized into sections on shelves within one library room
Sharding → Multiple library branches in different locations, each with part of the books
Difference between Sharding and Partitioning → One room with shelves versus many separate library buildings
Benefits of Sharding and Partitioning → Easier to find books quickly and handle more visitors by spreading the collection
Diagram
Diagram
┌───────────────┐       ┌───────────────┐       ┌───────────────┐
│   Database    │       │   Database    │       │   Database    │
│   Server 1    │       │   Server 2    │       │   Server 3    │
│ ┌───────────┐ │       │ ┌───────────┐ │       │ ┌───────────┐ │
│ │ Partition │ │       │ │ Partition │ │       │ │ Partition │ │
│ │   A       │ │       │ │   B       │ │       │ │   C       │ │
│ └───────────┘ │       │ └───────────┘ │       │ └───────────┘ │
└───────────────┘       └───────────────┘       └───────────────┘

Partitioning: Multiple partitions inside one database server


┌───────────────┐   ┌───────────────┐   ┌───────────────┐
│   Shard 1     │   │   Shard 2     │   │   Shard 3     │
│ (Database 1) │   │ (Database 2) │   │ (Database 3) │
└───────────────┘   └───────────────┘   └───────────────┘

Sharding: Data split across multiple database servers
The diagram shows partitioning as multiple parts inside one database server and sharding as data spread across multiple database servers.
Key Facts
PartitioningDividing a database table into smaller parts within the same database server.
ShardingSplitting a database horizontally across multiple servers, each holding a subset of data.
Horizontal PartitioningAnother name for sharding, where rows are distributed across different databases.
Vertical PartitioningSplitting a table by columns into different tables, a form of partitioning.
ScalabilityThe ability of a system to handle increased load by adding resources like servers.
Code Example
DBMS Theory
import sqlite3

# Create in-memory database
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# Create a table
cur.execute('CREATE TABLE users (id INTEGER, name TEXT, region TEXT)')

# Insert sample data
users = [
    (1, 'Alice', 'North'),
    (2, 'Bob', 'South'),
    (3, 'Carol', 'North'),
    (4, 'Dave', 'East'),
    (5, 'Eve', 'South')
]
cur.executemany('INSERT INTO users VALUES (?, ?, ?)', users)

# Partitioning example: Query users by region (simulate partitions)
cur.execute("SELECT * FROM users WHERE region = 'North'")
north_users = cur.fetchall()

# Sharding example: Simulate two shards as separate connections
conn_shard1 = sqlite3.connect(':memory:')
conn_shard2 = sqlite3.connect(':memory:')
cur1 = conn_shard1.cursor()
cur2 = conn_shard2.cursor()

# Create tables on shards
cur1.execute('CREATE TABLE users (id INTEGER, name TEXT)')
cur2.execute('CREATE TABLE users (id INTEGER, name TEXT)')

# Insert data into shards based on user id (shard key)
cur1.executemany('INSERT INTO users VALUES (?, ?)', [(1, 'Alice'), (3, 'Carol'), (5, 'Eve')])
cur2.executemany('INSERT INTO users VALUES (?, ?)', [(2, 'Bob'), (4, 'Dave')])

# Query shard 1
cur1.execute('SELECT * FROM users')
shard1_users = cur1.fetchall()

# Query shard 2
cur2.execute('SELECT * FROM users')
shard2_users = cur2.fetchall()

print('Partition (North region):', north_users)
print('Shard 1 users:', shard1_users)
print('Shard 2 users:', shard2_users)
OutputSuccess
Common Confusions
Thinking partitioning and sharding are the same because both split data.
Thinking partitioning and sharding are the same because both split data. Partitioning splits data within one database server, while sharding splits data across multiple servers to scale out.
Believing sharding always improves speed for small databases.
Believing sharding always improves speed for small databases. Sharding helps large-scale systems; for small databases, it adds complexity without speed benefits.
Summary
Partitioning splits data inside one database into smaller parts to improve speed and management.
Sharding spreads data across multiple database servers to handle large scale and traffic.
Both techniques help databases work better but serve different purposes: partitioning for performance, sharding for scalability.