Hashed Sharding in MongoDB: What It Is and How It Works

In MongoDB, hashed sharding distributes a collection's documents across shards based on a hash of the shard key value rather than on the value itself. Because similar key values hash to very different numbers, data and writes spread evenly across shards, which avoids hotspots without manual range management.

How It Works

Hashed sharding takes the value of the chosen shard key field and applies a hash function to it. The hash function converts the value into a seemingly random number. MongoDB divides the range of possible hash values into chunks, assigns each chunk to a shard (server), and stores each document on the shard that owns the chunk containing its hashed key.

Think of it like sorting mail by zip code, but instead of sorting by ranges of zip codes, you scramble the zip code numbers with a special formula. This scrambling spreads the mail evenly across many mailboxes, so no single mailbox gets overloaded.

This method helps MongoDB distribute data evenly across shards, preventing any one shard from becoming a bottleneck. It is especially useful when shard key values increase steadily, such as timestamps or ObjectId values, because under ranged sharding all new writes would land on the same shard.
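
The effect described above can be sketched in plain JavaScript (Node.js). This is only an illustration under stated assumptions: `toyHash`, `SHARD_COUNT`, and the placement functions are hypothetical names made up for this sketch, and the hash used here is not MongoDB's internal hash function. It compares where the 1,000 most recent values of an ever-increasing key would land under ranged and hashed placement.

```javascript
const crypto = require("crypto");

// Hypothetical cluster size, chosen only for this illustration.
const SHARD_COUNT = 4;

// Toy hash (an assumption, not MongoDB's internal function):
// the first 4 bytes of an MD5 digest read as an unsigned 32-bit integer.
function toyHash(value) {
  const digest = crypto.createHash("md5").update(String(value)).digest();
  return digest.readUInt32BE(0);
}

// Hashed placement: split the 32-bit hash space into equal ranges,
// one range per shard, and pick the range the hash falls into.
function hashedShardFor(value) {
  const rangeSize = 2 ** 32 / SHARD_COUNT;
  return Math.floor(toyHash(value) / rangeSize);
}

// Ranged placement: split the raw key space [0, maxKey) into equal ranges.
function rangedShardFor(value, maxKey) {
  return Math.min(SHARD_COUNT - 1, Math.floor((value / maxKey) * SHARD_COUNT));
}

// Count how many keys land on each shard for a given placement function.
function countPerShard(keys, shardFor) {
  const counts = new Array(SHARD_COUNT).fill(0);
  for (const key of keys) counts[shardFor(key)] += 1;
  return counts;
}

// The 1,000 most recent values of a steadily increasing key (99000..99999).
const recentKeys = Array.from({ length: 1000 }, (_, i) => 99000 + i);

const rangedCounts = countPerShard(recentKeys, (k) => rangedShardFor(k, 100000));
const hashedCounts = countPerShard(recentKeys, hashedShardFor);

console.log("ranged:", rangedCounts);
console.log("hashed:", hashedCounts);
```

With ranged placement every recent key falls on the last shard, so the counts are `[0, 0, 0, 1000]`. With hashed placement each shard should receive roughly a quarter of the keys, which is the even spread the mail analogy describes.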

Example

This example enables hashed sharding on a MongoDB collection using the sh.shardCollection helper with a hashed shard key. Run it in mongosh while connected to a mongos router of a sharded cluster.

mongodb
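// Select the database
use myDatabase
// Enable sharding for the database (not required starting in MongoDB 6.0)
sh.enableSharding("myDatabase")
// Shard the collection on a hashed userId key
sh.shardCollection("myDatabase.myCollection", { userId: "hashed" })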
Output
Both helpers return a result document. On success it contains ok: 1, and the sh.shardCollection result also names the sharded collection. The exact fields vary with the server and shell version.

When to Use

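Use hashed sharding when your shard key has many unique values (high cardinality). It works well for keys such as user IDs, and it is particularly helpful for values that grow steadily, such as timestamps or ObjectId values, which would otherwise concentrate new writes on one shard.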

For example, if you have a user database and want to distribute users evenly across shards, hashing the user ID spreads users so that no shard ends up with a disproportionate share. This avoids performance problems caused by uneven data distribution.

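However, hashed sharding is a poor fit if you need efficient range queries on the shard key. Hashing scrambles the order, so adjacent key values usually live on different shards, and a range query must be sent to every shard instead of being routed to one.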
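
The cost of range queries can be sketched the same way in plain JavaScript (Node.js). As before, this is a toy model under stated assumptions: `toyShardFor`, `shardsForEquality`, `shardsForRange`, and `SHARD_COUNT` are hypothetical names for this sketch, and the hash is not MongoDB's internal function. It shows that a single key maps to exactly one shard, while the keys in a small consecutive range scatter across shards.

```javascript
const crypto = require("crypto");

// Hypothetical cluster size, chosen only for this illustration.
const SHARD_COUNT = 4;

// Toy placement (an assumption, not MongoDB's internal hash):
// hash the key with MD5, read 32 bits, and map that to a shard index.
function toyShardFor(value) {
  const digest = crypto.createHash("md5").update(String(value)).digest();
  const rangeSize = 2 ** 32 / SHARD_COUNT;
  return Math.floor(digest.readUInt32BE(0) / rangeSize);
}

// Equality match: one key hashes to exactly one shard.
function shardsForEquality(userId) {
  return new Set([toyShardFor(userId)]);
}

// Range match: each key in [from, to] may hash to any shard,
// so collect the shard of every key in the range.
function shardsForRange(from, to) {
  const shards = new Set();
  for (let id = from; id <= to; id++) shards.add(toyShardFor(id));
  return shards;
}

console.log("userId = 42 touches shards:", [...shardsForEquality(42)]);
console.log("userId 1000..1199 touches shards:", [...shardsForRange(1000, 1199)].sort());
```

In this sketch the equality lookup touches one shard, while the 200 consecutive IDs should touch all of them. A real router cannot enumerate the keys in a range this way, so for a range condition on a hashed shard key it simply sends the query to every shard.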

Key Points

  • Hashed sharding uses a hash function on the shard key to distribute data evenly.
  • It prevents hotspots by balancing data across shards automatically.
  • Best for shard keys with many unique values, including sequential ones that would create hotspots under ranged sharding.
  • Not suitable for range queries on the shard key.
  • Easy to set up with sh.shardCollection and the hashed option.

Key Takeaways

Hashed sharding evenly distributes data by hashing the shard key values.
It helps avoid uneven load and hotspots on shards.
Ideal for high-cardinality shard keys such as user IDs, including values that increase sequentially.
Not recommended if you need efficient range queries on the shard key.
Setup means sharding the collection with the shard key specified as hashed; enabling sharding on the database first is required only before MongoDB 6.0.