
Why sharding is needed in MongoDB - Visual Breakdown

Concept Flow - Why sharding is needed
Start: Data grows large
Single server struggles
Performance drops
Need to split data
Implement sharding
Data split across servers
Improved performance & scalability
As data grows, one server can't handle it well, so we split data across servers using sharding to keep performance good.
Execution Sample
MongoDB
db.collection.insertMany([{_id:1,data:'A'}, {_id:2,data:'B'}, /* ... */])
// Data grows large
// Single server slows down
// Shard data across servers
// Queries go to correct shard
This sample shows data growing, a single server slowing down, and then the data being sharded across servers to improve query speed.
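The "shard data across servers" step can be sketched with the `sh` helpers that mongosh exposes when it is connected to a mongos router. This is a minimal sketch, not a full cluster setup: the function name `shardByHashedId` and the database name `mydb` are hypothetical, and a hashed `_id` is only one possible shard key. It also assumes a sharded cluster is already running.

```javascript
// Sketch only: in mongosh connected to a mongos router, `sh` is a built-in global.
// The function name and the database/collection names are hypothetical.
function shardByHashedId(sh, dbName, collName) {
  const namespace = `${dbName}.${collName}`;
  // Mark the database as eligible for sharding
  // (newer MongoDB versions may not require this step).
  sh.enableSharding(dbName);
  // Distribute documents by a hash of _id so inserts spread across shards.
  sh.shardCollection(namespace, { _id: "hashed" });
  return namespace;
}

// In mongosh: shardByHashedId(sh, "mydb", "collection");
```

Passing `sh` in as a parameter keeps the sketch easy to try against a stub before running it on a real cluster.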
Execution Table
| Step | Data Size | Server Load | Action | Result |
|------|-----------|-------------|--------|--------|
| 1 | Small | Low | Insert data | Fast insert and query |
| 2 | Medium | Moderate | Insert more data | Slightly slower queries |
| 3 | Large | High | Insert more data | Queries slow, server overloaded |
| 4 | Large | Overloaded | Decide to shard | Plan data split |
| 5 | Large | Distributed | Split data across shards | Load balanced, queries faster |
| 6 | Very Large | Distributed | Continue inserting | System scales well, performance stable |
💡 Sharding distributes data to avoid overload and keep performance stable as data grows
Variable Tracker
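| Variable | Start | After Step 2 | After Step 4 | After Step 6 |
|----------|-------|--------------|--------------|--------------|
| Data Size | Small | Medium | Large | Very Large |
| Server Load | Low | Moderate | Overloaded | Distributed |
| Query Speed | Fast | Slightly slower | Slow | Fast |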
Key Moments - 3 Insights
Why does the server load increase as data size grows?
More data means more work for the server to store and search, as shown in steps 1 to 3 of the execution_table.
Why can't we just keep adding data to one server?
One server has limits in memory, CPU, and disk speed, so performance drops as seen in step 3 where queries slow down.
How does sharding help improve performance?
Sharding splits data across servers, so each server stores and searches only part of the data. This balances the load and keeps queries fast, as shown in steps 5 and 6.
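The effect on per-server load can be illustrated with a small simulation in plain JavaScript. This is a toy model under stated assumptions: the functions `shardFor` and `distribute` are hypothetical, and placing documents by `_id` modulo the shard count only stands in for MongoDB's real ranged or hashed shard keys.

```javascript
// Toy model: place each document on a shard by _id modulo the shard count.
// Real MongoDB uses ranged or hashed shard keys; this only illustrates how
// the share of data held by each server shrinks as shards are added.
function shardFor(id, shardCount) {
  return id % shardCount;
}

// Count how many documents (with _id 1..docCount) land on each shard.
function distribute(docCount, shardCount) {
  const perShard = new Array(shardCount).fill(0);
  for (let id = 1; id <= docCount; id++) {
    perShard[shardFor(id, shardCount)] += 1;
  }
  return perShard;
}

const single = distribute(9000, 1);   // [9000]: one server holds everything
const sharded = distribute(9000, 3);  // [3000, 3000, 3000]: load is spread

// A query that includes the key can be routed to exactly one shard.
const target = shardFor(42, 3);       // 0

console.log(single, sharded, target);
```

With three shards, each server holds a third of the documents, and a lookup by key touches only one of them. Queries that do not include the key would still have to ask every shard.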
Visual Quiz - 3 Questions
Test your understanding
Look at the execution_table. What is the server load at step 3?
A. High
B. Moderate
C. Low
D. Distributed
💡 Hint
Check the 'Server Load' column in the row for step 3 in the execution_table.
At which step does the system decide to shard the data?
A. Step 2
B. Step 4
C. Step 3
D. Step 6
💡 Hint
Look for the 'Action' column mentioning 'Decide to shard' in execution_table
If data size stayed small, what would happen to server load and query speed?
A. Server load moderate, query speed slow
B. Server load high, query speed slow
C. Server load low, query speed fast
D. Server load distributed, query speed fast
💡 Hint
Refer to step 1 in execution_table and variable_tracker for small data size
Concept Snapshot
Why Sharding is Needed:
- Data grows beyond single server capacity
- Server load increases, slowing queries
- Sharding splits data across servers
- Balances load and improves speed
- Enables system to scale with data size
Full Transcript
When data in a database grows large, a single server can struggle to handle all the data and queries. This causes the server load to increase and query speed to slow down. To fix this, we use sharding, which splits the data across multiple servers. This balances the load and keeps queries fast even as data grows. The execution table shows how data size and server load change step by step, and how sharding improves performance.