0
0
DBMS Theoryknowledge~10 mins

Sharding and partitioning in DBMS Theory - Step-by-Step Execution

Choose your learning style9 modes available
Concept Flow - Sharding and partitioning
Start: Large Database
Decide to split data
Partitioning: Split data within one server
Data divided into parts
Each part stored in same server
Sharding: Split data across servers
Data divided into shards
Each shard stored on different server
The flow shows how a large database is split either by partitioning within one server or by sharding across multiple servers.
Execution Sample
DBMS Theory
Database with 100 million records
Partition by 'Region' within one server
Shard by 'User ID' across 3 servers
This example splits data first by region inside one server (partitioning), then splits user data across three servers (sharding).
Analysis Table
StepActionData Split MethodResulting Data LocationNotes
1Start with full databaseNoneSingle large databaseAll data in one place
2Partition by RegionPartitioningData divided into regions inside one serverEach region is a partition
3Access data for Region APartitioningRead from Region A partitionFaster access to smaller data
4Shard by User ID across 3 serversShardingData split into 3 shards on different serversEach server holds a shard
5Access user with ID 123ShardingQuery directed to server holding shard for ID 123Reduces load on any one server
6Add new serverShardingRebalance shards across 4 serversData redistributed for balance
7EndNoneData split efficientlyImproved performance and scalability
💡 Data splitting stops when data is distributed either by partitioning or sharding for efficient access.
State Tracker
VariableStartAfter PartitioningAfter ShardingAfter Rebalancing
Data LocationSingle serverMultiple partitions in one serverMultiple servers with shardsShards redistributed across servers
Query TargetSingle databasePartition based on regionShard based on user IDShard based on new distribution
Key Insights - 3 Insights
What is the main difference between partitioning and sharding?
Partitioning splits data within the same server into parts, while sharding splits data across multiple servers. See execution_table steps 2 and 4.
Why does sharding improve performance compared to partitioning?
Sharding spreads data and load across servers, reducing the work each server does. Partitioning only divides data inside one server, so load is not distributed. See execution_table steps 4 and 5.
What happens when a new server is added in sharding?
Data shards are rebalanced and redistributed to include the new server, improving load balance. See execution_table step 6.
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table at step 3. Where is the data accessed from when querying Region A?
AFrom a partition inside the same server
BFrom a shard on a different server
CFrom the entire database without splitting
DFrom a backup server
💡 Hint
Check the 'Resulting Data Location' column at step 3 in execution_table.
At which step does the data start being split across multiple servers?
AStep 2
BStep 3
CStep 4
DStep 6
💡 Hint
Look for when 'Data split into 3 shards on different servers' happens in execution_table.
If the database was only partitioned and not sharded, what would be true about the data location?
AData would be split across multiple servers
BData would remain in one server but divided into parts
CData would be duplicated on all servers
DData would be stored in the cloud only
💡 Hint
Refer to variable_tracker row 'Data Location' after Partitioning.
Concept Snapshot
Sharding and partitioning split large databases for better performance.
Partitioning divides data inside one server into parts.
Sharding splits data across multiple servers (shards).
Sharding improves scalability by distributing load.
Adding servers requires rebalancing shards.
Use partitioning for simpler splits, sharding for large scale.
Full Transcript
Sharding and partitioning are methods to split large databases to improve performance and scalability. Partitioning divides data into parts within the same server, such as by region, making access faster by focusing on smaller data sets. Sharding splits data across multiple servers, each holding a shard, which spreads the load and allows the system to handle more users or data. When a new server is added, shards are rebalanced to keep the load even. This visual trace shows the steps from a single large database to partitioning by region, then sharding by user ID across servers, and finally rebalancing shards when adding servers.