
Custom _id values in MongoDB - Deep Dive

Overview - Custom _id values
What is it?
In MongoDB, every document has a unique identifier called _id. By default, MongoDB creates this _id automatically using an ObjectId, which is a special 12-byte value. However, you can choose to set your own custom _id values instead of using the default. This means you decide what uniquely identifies each document.
Why it matters
Custom _id values let you control how documents are identified and accessed. Without them, you rely on MongoDB's automatic IDs, which may not match the keys your application already uses. For example, if a username or email is the natural unique key, you can store it directly in _id. Otherwise you need an extra field plus a separate unique index on it, which adds storage and write overhead and one more thing to keep consistent.
Where it fits
Before learning custom _id values, you should understand basic MongoDB documents and the default _id field. After this, you can explore indexing strategies, schema design, and data modeling to optimize your database performance and structure.
Mental Model
Core Idea
The _id field is the unique name tag for each document, and you can choose to write your own name instead of letting MongoDB assign one.
Think of it like...
Imagine a library where every book has a unique barcode. Normally, the library prints the barcode for you. But if you want, you can stick your own special barcode on a book to identify it exactly how you like.
┌───────────────┐
│   Document    │
│ ┌───────────┐ │
│ │   _id     │ │  <-- Unique identifier, default or custom
│ │  value    │ │
│ └───────────┘ │
│   other data  │
└───────────────┘
Build-Up - 6 Steps
1
Foundation: Understanding the default _id field
🤔
Concept: MongoDB automatically adds a unique _id field to every document if you don't provide one.
When you insert a document without specifying _id, MongoDB creates an ObjectId for it. This ObjectId is a 12-byte value that ensures uniqueness across machines and time. It acts like a primary key in relational databases.
Result
Every document has a unique _id, which you can use to find or update that document quickly.
Knowing that MongoDB auto-generates _id helps you understand why documents are uniquely identified even if you don't set anything.
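As a small illustration of the 12-byte ObjectId mentioned above, the sketch below reads the creation time stored in its first 4 bytes. The helper name objectIdTimestamp is hypothetical and works on the 24-character hex form only; in mongosh, calling getTimestamp() on an ObjectId should give the same information.

```javascript
// Sketch: read the creation time embedded in an ObjectId's hex string.
// objectIdTimestamp is a hypothetical helper, not a driver or shell API.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string");
  }
  // The first 4 bytes (8 hex characters) are seconds since the Unix epoch.
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

const created = objectIdTimestamp("507f1f77bcf86cd799439011");
console.log(created.toISOString()); // 2012-10-17T21:13:27.000Z
```

Because the timestamp comes first, default ObjectIds sort roughly by creation time, which is worth remembering when comparing them with custom keys.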
2
Foundation: What is a custom _id value?
🤔
Concept: You can assign your own value to the _id field instead of letting MongoDB create one.
Instead of using the default ObjectId, you can set _id to a string, number, or any unique value before inserting a document. For example, you might use a username or email as _id to avoid duplicates and simplify lookups.
Result
Documents have _id values you control, which can match your application's unique keys.
Understanding that _id is just a field you can set lets you design your data model to fit your needs better.
3
Intermediate: Benefits of using custom _id values
🤔 Before reading on: Do you think using custom _id values can improve query speed or data integrity? Commit to your answer.
Concept: Custom _id values can improve query performance and enforce uniqueness on meaningful fields.
Since _id is indexed by default, using a meaningful custom _id lets you quickly find documents without creating extra indexes. It also prevents duplicate entries for that key because _id must be unique. For example, using email as _id ensures no two users share the same email.
Result
Faster queries on the unique key and automatic uniqueness enforcement.
Knowing that _id is always indexed explains why custom _id can optimize your database and simplify your schema.
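The email-as-_id idea can be sketched as follows. The helper makeUserDoc and its trim-and-lowercase rule are assumptions added for illustration: with the default collation, string comparison is case-sensitive, so "Alice@Example.com" and "alice@example.com" would otherwise count as two different _id values.

```javascript
// Sketch: build a user document whose _id is the normalized email.
// makeUserDoc and the normalization rule are assumptions for illustration.
function makeUserDoc(email, name) {
  const id = email.trim().toLowerCase();
  if (id === "") throw new Error("email must not be empty");
  return { _id: id, name };
}

const doc = makeUserDoc("  Alice@Example.com ", "Alice");
console.log(doc); // { _id: 'alice@example.com', name: 'Alice' }
// In mongosh this document could then be passed to db.users.insertOne(doc),
// and db.users.findOne({ _id: "alice@example.com" }) would use the _id index.
```

Whether lowercasing is appropriate depends on how the application treats addresses; the point is only that the value stored in _id must be produced the same way on every insert and lookup.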
4
Intermediate: Constraints and data types for custom _id
🤔 Before reading on: Can _id be any data type, or is it limited? Commit to your answer.
Concept: The _id field can be almost any BSON type (arrays, regular expressions, and undefined are not allowed), but it must be unique within the collection and is immutable once set.
You can use strings, numbers, ObjectIds, UUIDs, or even embedded documents as _id, but not arrays, regular expressions, or undefined. Once a document is inserted, you cannot change its _id. If you try to insert a document with an _id that already exists in the collection, MongoDB rejects the insert with a duplicate key error (code E11000).
Result
You must choose stable, unique values for _id to avoid errors and maintain data integrity.
Understanding _id's immutability and uniqueness prevents common errors like duplicate key exceptions.
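The type constraint can be expressed as a small client-side check. This is only a sketch: isAllowedIdValue is a hypothetical helper that encodes the rule that an _id may not be an array, a regular expression, or undefined. The server enforces the rule itself; a check like this just surfaces the mistake earlier.

```javascript
// Sketch: reject values MongoDB does not accept as _id (array, regex, undefined).
// isAllowedIdValue is a hypothetical client-side check, not a MongoDB API.
function isAllowedIdValue(value) {
  if (value === undefined) return false;
  if (Array.isArray(value)) return false;
  if (value instanceof RegExp) return false;
  return true;
}

console.log(isAllowedIdValue("a@b.com"));              // true
console.log(isAllowedIdValue(123));                    // true
console.log(isAllowedIdValue({ region: "eu", n: 1 })); // true (embedded document)
console.log(isAllowedIdValue([1, 2]));                 // false
console.log(isAllowedIdValue(/abc/));                  // false
console.log(isAllowedIdValue(undefined));              // false
```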
5
Advanced: Handling custom _id in application logic
🤔 Before reading on: Should your application always generate _id, or can MongoDB do it? Commit to your answer.
Concept: When using custom _id, your application often needs to generate and manage these IDs carefully.
If you rely on custom _id, your app is responsible for producing unique values, for example from a natural key or a generation scheme like UUIDs. Checking whether an ID exists before inserting is not enough on its own, because another client can insert the same ID between the check and the insert. The unique _id index is the final guard, so your app must handle the duplicate key error it raises. Sometimes you combine several fields into one custom _id to form a composite key.
Result
Your application controls document identity, requiring careful design to avoid conflicts.
Knowing that custom _id shifts responsibility to your app helps you design safer and more reliable data flows.
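One way to handle duplicates in application code is to attempt the insert and interpret the duplicate key error, sketched below. The names insertIfAbsent and FakeCollection are hypothetical; FakeCollection is an in-memory stand-in so the example runs without a server, and it mimics only an insertOne that fails with error code 11000 (MongoDB's duplicate key code). With a real driver, the collection object would come from the driver instead.

```javascript
// Sketch: insert first and treat a duplicate key error as "already exists".
// insertIfAbsent and FakeCollection are hypothetical names for illustration.
const DUPLICATE_KEY = 11000; // MongoDB's duplicate key error code (E11000)

async function insertIfAbsent(collection, doc) {
  try {
    await collection.insertOne(doc);
    return true; // inserted
  } catch (err) {
    if (err && err.code === DUPLICATE_KEY) return false; // _id already taken
    throw err; // anything else is a real failure
  }
}

// In-memory stand-in so the sketch runs without a server. It only mimics
// the one behavior used here: rejecting a repeated _id with code 11000.
class FakeCollection {
  constructor() {
    this.docs = new Map();
  }
  async insertOne(doc) {
    if (this.docs.has(doc._id)) {
      const err = new Error("E11000 duplicate key error");
      err.code = DUPLICATE_KEY;
      throw err;
    }
    this.docs.set(doc._id, doc);
    return { acknowledged: true, insertedId: doc._id };
  }
}

const users = new FakeCollection();
const demo = (async () => {
  const first = await insertIfAbsent(users, { _id: "user1", name: "Alice" });
  const second = await insertIfAbsent(users, { _id: "user1", name: "Bob" });
  console.log(first, second); // true false
  return [first, second];
})();
```

Letting the insert fail and reacting to the error avoids a separate existence query and stays correct when two clients try the same _id at once.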
6
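Expert: Surprising effects of custom _id on sharding and performance
🤔 Before reading on: Do you think custom _id always improves performance? Commit to your answer.
Concept: Custom _id values affect how MongoDB distributes data in sharded clusters and can impact performance unexpectedly.
In sharded setups, _id is sometimes used as the shard key or as part of it. With ranged sharding, monotonically increasing values such as ObjectIds or counters send every new insert to the chunk that holds the highest key range, which creates a write hotspot on one shard. Hashed sharding on _id, or a key whose values are well distributed, spreads writes more evenly. When _id is not the shard key or its prefix, the _id index enforces uniqueness only within each shard, so the application must guarantee uniqueness across the cluster. Finally, large or complex _id values enlarge the _id index, which costs memory and slows lookups.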
Result
Custom _id can improve or degrade performance depending on data distribution and size.
Understanding how _id interacts with sharding and indexing prevents costly performance mistakes in production.
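The effect of key distribution on write placement can be seen in a toy simulation. This is not MongoDB's chunk logic or its hash function: the three fixed ranges, the FNV-1a hash, and all function names are assumptions chosen to keep the sketch short. It counts where 1000 ever-growing keys land under range-based placement versus hash-based placement.

```javascript
// Toy simulation, not MongoDB's actual chunk logic or hash function.
// Three "shards" own fixed key ranges; we count where 1000 new inserts land.
const SHARDS = 3;

// Ranged placement: shard 0 owns keys < 1000, shard 1 owns 1000..1999,
// shard 2 owns everything from 2000 up (the open-ended top range).
function rangedShard(key) {
  if (key < 1000) return 0;
  if (key < 2000) return 1;
  return 2;
}

// FNV-1a, a simple 32-bit string hash used here only as a stand-in.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash;
}

// Hashed placement: the shard is chosen from the hash of the key.
function hashedShard(key) {
  return fnv1a(String(key)) % SHARDS;
}

function countInserts(placement) {
  const counts = new Array(SHARDS).fill(0);
  // New keys keep growing past the existing data, like a counter or timestamp.
  for (let key = 5000; key < 6000; key++) counts[placement(key)]++;
  return counts;
}

const rangedCounts = countInserts(rangedShard);
const hashedCounts = countInserts(hashedShard);
console.log("ranged:", rangedCounts); // every insert lands on the last shard
console.log("hashed:", hashedCounts); // inserts are spread across shards
```

In a real cluster the balancer splits and moves chunks over time, but new inserts with an ever-growing ranged key still arrive at whichever shard currently owns the top range.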
Under the Hood
Every collection has a unique index on _id, created automatically. If a document arrives without an _id, one is generated, usually by the driver before the document is sent and otherwise by the server. An ObjectId consists of a 4-byte timestamp in seconds, a 5-byte random value generated once per process, and a 3-byte incrementing counter; older versions used a machine identifier and process ID in place of the random value. Custom _id values bypass this generation but still must be unique. On insert, the unique index rejects any value that already exists, and the same index makes lookups by _id fast.
Why designed this way?
MongoDB designed _id to guarantee every document has a unique key for fast retrieval and data integrity. The default ObjectId balances uniqueness, size, and generation speed without coordination. Allowing custom _id gives flexibility for applications with existing unique keys or special requirements. This design avoids forcing a single ID scheme on all users.
┌───────────────┐       ┌───────────────┐
│ Insert Document│──────▶│ Check _id     │
│ with or without│       │ uniqueness in │
│ custom _id     │       │ unique index  │
└───────────────┘       └───────────────┘
          │                      │
          │                      │
          ▼                      ▼
┌─────────────────┐      ┌─────────────────┐
│ Generate ObjectId│      │ Reject if _id    │
│ if no custom _id │      │ duplicate exists │
└─────────────────┘      └─────────────────┘
          │                      │
          └──────────────┬───────┘
                         ▼
                ┌─────────────────┐
                │ Store Document   │
                │ with unique _id  │
                └─────────────────┘
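The flow in the diagram can be mimicked with a Map standing in for the unique index. Everything here is a simplified model: insertDocument and the counter-based placeholder ID are hypothetical, and the Map only handles primitive _id values such as strings and numbers.

```javascript
// Sketch of the insert path in the diagram, using a Map as the "unique index".
// insertDocument and the counter-based placeholder id are hypothetical; a real
// deployment generates ObjectIds and keeps the index in the storage engine.
let nextPlaceholderId = 1;

function insertDocument(index, doc) {
  // Step 1: generate an _id only when the caller did not supply one.
  const stored = "_id" in doc
    ? { ...doc }
    : { _id: `auto-${nextPlaceholderId++}`, ...doc };
  // Step 2: reject the insert if the unique index already holds this _id.
  if (index.has(stored._id)) {
    throw new Error(`duplicate key: ${String(stored._id)}`);
  }
  // Step 3: store the document under its unique _id.
  index.set(stored._id, stored);
  return stored._id;
}

const index = new Map();
const customId = insertDocument(index, { _id: "user1", name: "Alice" });
const generatedId = insertDocument(index, { name: "Bob" });
let duplicateRejected = false;
try {
  insertDocument(index, { _id: "user1", name: "Carol" });
} catch (err) {
  duplicateRejected = true;
}
console.log(customId, generatedId, duplicateRejected); // user1 auto-1 true
```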
Myth Busters - 4 Common Misconceptions
Quick: Can you update the _id of an existing document? Commit to yes or no.
Common Belief: You can update the _id field of a document anytime like any other field.
Reality: The _id field is immutable after insertion; you cannot change it.
Why it matters: Trying to update _id causes errors and forces workarounds like deleting and reinserting documents, which can lead to data loss or inconsistency.
Quick: Does using a custom _id always make queries faster? Commit to yes or no.
Common Belief: Custom _id values always improve query speed because they are meaningful.
Reality: Custom _id can improve speed if chosen well, but poorly chosen or large _id values can slow down queries and increase storage.
Why it matters: Assuming custom _id is always better can lead to performance problems and wasted resources.
Quick: Is the _id field required to be an ObjectId? Commit to yes or no.
Common Belief: The _id field must always be an ObjectId type.
Reality: The _id field can be almost any BSON type, such as a string, number, or embedded document, as long as it is unique; only arrays, regular expressions, and undefined are excluded.
Why it matters: Believing _id must be ObjectId limits design choices and prevents using meaningful keys like usernames or emails.
Quick: Does MongoDB automatically generate _id even if you provide one? Commit to yes or no.
Common Belief: MongoDB always generates _id regardless of what you provide.
Reality: If you provide a custom _id, MongoDB uses it and does not generate a new one.
Why it matters: Not knowing this can cause duplicate key errors or confusion about which _id is stored.
Expert Zone
1. Custom _id values affect the size and performance of the _id index, so choosing compact types can save storage and speed up queries.
2. In sharded clusters, the choice of _id affects data distribution when _id is the shard key; monotonically increasing values create write hotspots under ranged sharding, while hashed sharding on _id spreads writes.
3. Using embedded documents as _id is possible but complicates queries, because an equality match depends on exact field order; arrays are not allowed as _id at all.
When NOT to use
Avoid custom _id when your application does not have a natural unique key or when you want MongoDB to handle ID generation for simplicity. Instead, use the default ObjectId or create separate unique indexes on other fields.
Production Patterns
In production, custom _id is often used for natural keys like usernames, emails, or UUIDs to simplify lookups and enforce uniqueness. Some systems use composite keys encoded as strings for _id. Careful design ensures good shard key choices and avoids performance bottlenecks.
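A composite key encoded as a string might look like the sketch below. The helper compositeId, the ":" separator, and the 8-digit zero padding are all assumptions for illustration; the padding is there so that string order matches numeric order within one tenant.

```javascript
// Sketch: encode (tenant, order number) into one string _id.
// compositeId, the ":" separator, and the 8-digit padding are assumptions.
function compositeId(tenant, orderNumber) {
  if (tenant === "" || tenant.includes(":")) {
    throw new Error("tenant must be non-empty and must not contain ':'");
  }
  if (!Number.isInteger(orderNumber) || orderNumber < 0 || orderNumber > 99999999) {
    throw new Error("orderNumber must be an integer between 0 and 99999999");
  }
  // Zero-padding keeps string order aligned with numeric order.
  return `${tenant}:${String(orderNumber).padStart(8, "0")}`;
}

console.log(compositeId("acme", 42));                          // acme:00000042
console.log(compositeId("acme", 9) < compositeId("acme", 10)); // true
```

With this encoding, a range filter on _id such as { $gte: "acme:", $lt: "acme;" } would cover one tenant's documents, since ";" is the character that follows ":" in ASCII.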
Connections
Primary Keys in Relational Databases
Custom _id in MongoDB serves the same role as primary keys in SQL databases.
Understanding primary keys helps grasp why _id must be unique and immutable, linking MongoDB concepts to familiar relational database design.
UUIDs (Universally Unique Identifiers)
UUIDs are often used as custom _id values to guarantee uniqueness across distributed systems.
Knowing UUIDs helps understand how to generate custom _id values that avoid collisions in large-scale applications.
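Generating a UUID for use as _id can be sketched in a few lines. The helper newUserDoc is hypothetical; randomUUID is taken from the global Web Crypto object when the runtime provides it and from Node's built-in crypto module otherwise.

```javascript
// Sketch: generate a random (version 4) UUID to use as a custom _id.
// Uses the global Web Crypto object when present, else Node's crypto module.
const cryptoApi = globalThis.crypto ?? require("node:crypto");

// newUserDoc is a hypothetical helper for illustration.
function newUserDoc(name) {
  return { _id: cryptoApi.randomUUID(), name };
}

const alice = newUserDoc("Alice");
const bob = newUserDoc("Bob");
console.log(alice._id.length);      // 36 (hex digits plus hyphens)
console.log(alice._id !== bob._id); // true
```

The string form is 36 characters long. Drivers can also store UUIDs as a 16-byte BSON binary value, which keeps the _id index smaller; which form fits better depends on how the IDs are exchanged with other systems.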
Hash Tables in Computer Science
The unique _id index maps a key to a document, much as a hash table does.
The analogy explains why _id lookups are fast and why uniqueness is strictly enforced, but the index is actually a B-tree, so lookups take logarithmic time and range queries on _id also work.
Common Pitfalls
#1 Trying to insert two documents with the same custom _id.
Wrong approach: db.collection.insertMany([{_id: 'user1', name: 'Alice'}, {_id: 'user1', name: 'Bob'}])
Correct approach: db.collection.insertMany([{_id: 'user1', name: 'Alice'}, {_id: 'user2', name: 'Bob'}])
Root cause: Misunderstanding that _id must be unique causes duplicate key errors.
#2 Updating the _id field of an existing document.
Wrong approach: db.collection.updateOne({_id: 'user1'}, {$set: {_id: 'user2'}})
Correct approach: db.collection.updateOne({_id: 'user1'}, {$set: {name: 'New Name'}})
Root cause: Believing _id is mutable leads to invalid update operations.
#3 Using large or complex objects as _id without considering index size.
Wrong approach: db.collection.insertOne({_id: {email: 'a@b.com', domain: 'b.com'}, name: 'User'})
Correct approach: db.collection.insertOne({_id: 'a@b.com', name: 'User'})
Root cause: Not realizing that large _id values increase index size and slow queries.
Key Takeaways
The _id field uniquely identifies each MongoDB document and can be set by you or generated automatically.
Custom _id values give you control over document identity but require careful design to ensure uniqueness and performance.
The _id field is immutable and always indexed, making it critical for fast lookups and data integrity.
Choosing the right type and pattern for custom _id affects sharding, storage size, and query speed.
Understanding how _id works under the hood helps avoid common mistakes and optimize your MongoDB applications.

Practice

(1/5)
1.

What is the purpose of the _id field in a MongoDB document?

easy
A. It uniquely identifies each document in a collection.
B. It stores the creation date of the document.
C. It holds the user's login information.
D. It contains the document's size in bytes.

Solution

  1. Step 1: Understand the role of _id in MongoDB

    The _id field is a unique identifier for each document in a collection, ensuring no two documents share the same _id.
  2. Step 2: Compare with other options

    Other options describe unrelated fields or metadata, not the unique identifier role.
  3. Final Answer:

    It uniquely identifies each document in a collection. -> Option A
  4. Quick Check:

    _id = unique document ID [OK]
Hint: Remember: _id means unique ID for each document [OK]
Common Mistakes:
  • Thinking _id stores creation date
  • Confusing _id with user data fields
  • Assuming _id is optional
2.

Which of the following is the correct way to insert a document with a custom _id value in MongoDB?

db.users.insertOne({ _id: 123, name: "Alice" })
easy
A. db.users.insertOne({ id: 123, name: "Alice" })
B. db.users.insertOne({ _id: 123, name: "Alice" })
C. db.users.insertOne({ _id: "name", name: "Alice" })
D. db.users.insertOne({ _id: ObjectId(), name: "Alice" })

Solution

  1. Step 1: Identify correct _id field usage

    The _id field must be named exactly _id to set a custom ID. db.users.insertOne({ _id: 123, name: "Alice" }) uses _id: 123 correctly.
  2. Step 2: Check other options for errors

    db.users.insertOne({ id: 123, name: "Alice" }) uses id instead of _id. db.users.insertOne({ _id: "name", name: "Alice" }) uses a string "name" which is valid but less meaningful here. db.users.insertOne({ _id: ObjectId(), name: "Alice" }) uses default ObjectId, not custom.
  3. Final Answer:

    db.users.insertOne({ _id: 123, name: "Alice" }) -> Option B
  4. Quick Check:

    Custom _id needs exact field name [OK]
Hint: Use exact field name _id for custom IDs [OK]
Common Mistakes:
  • Using id instead of _id
  • Confusing ObjectId() with custom values
  • Using invalid types for _id
3.

Given the following documents inserted into a collection:

db.products.insertMany([ { _id: "p1", name: "Pen" }, { _id: "p2", name: "Pencil" }, { _id: "p3", name: "Eraser" } ])

What will db.products.find({ _id: "p2" }).toArray() return?

medium
A. [{ _id: "p2", name: "Pencil" }]
B. [{ _id: "p1", name: "Pen" }]
C. []
D. Error: Invalid query

Solution

  1. Step 1: Understand the query filter

    The query searches for a document with _id equal to "p2".
  2. Step 2: Match the document in the collection

    The document with _id: "p2" has the name "Pencil" and exists in the collection.
  3. Final Answer:

    [{ _id: "p2", name: "Pencil" }] -> Option A
  4. Quick Check:

    Query by custom _id returns matching document [OK]
Hint: Query by exact _id returns matching document [OK]
Common Mistakes:
  • Expecting multiple documents returned
  • Confusing _id with other fields
  • Assuming query returns error for string _id
4.

Consider this insertion attempt:

db.orders.insertOne({ _id: 101, item: "Book" })
db.orders.insertOne({ _id: 101, item: "Notebook" })

What error will occur and why?

medium
A. SyntaxError due to missing quotes around _id value.
B. TypeError because _id must be a string.
C. No error; both documents inserted successfully.
D. DuplicateKeyError because _id must be unique.

Solution

  1. Step 1: Check uniqueness requirement of _id

    The _id field must be unique in a collection. Both documents use _id: 101.
  2. Step 2: Identify the error caused by duplicate _id

    Inserting the second document with the same _id causes a DuplicateKeyError.
  3. Final Answer:

    DuplicateKeyError because _id must be unique. -> Option D
  4. Quick Check:

    Duplicate _id causes insertion error [OK]
Hint: No two documents can share the same _id [OK]
Common Mistakes:
  • Thinking _id can repeat
  • Confusing syntax error with duplicate key error
  • Assuming _id must be string only
5.

You want to store user profiles with custom _id values based on their email addresses to speed up lookups. Which approach is best?

// Option A
{ _id: ObjectId(), email: "user@example.com", name: "User" }

// Option B
{ email: "user@example.com", name: "User" }

// Option C
{ _id: "user@example.com", name: "User" }

// Option D
{ _id: UUID(), email: "user@example.com", name: "User" }
hard
A. Use default ObjectId and store email separately.
B. Do not use _id, just store email field.
C. Set _id to the email string for direct lookup.
D. Use UUID as _id and email separately.

Solution

  1. Step 1: Understand the goal of custom _id

    The goal is to speed up lookups by using email as the unique identifier.
  2. Step 2: Evaluate options for best fit

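    Setting _id to the email string enables direct lookup through the built-in _id index, making queries fast. Using a default ObjectId or a UUID with a separate email field requires an additional index on email. Leaving _id out does not avoid it: MongoDB generates an ObjectId automatically, so email lookups would still need their own index.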
  3. Final Answer:

    Set _id to the email string for direct lookup. -> Option C
  4. Quick Check:

    Custom _id as email speeds queries [OK]
Hint: Use email as _id for fast direct lookups [OK]
Common Mistakes:
  • Ignoring _id uniqueness requirement
  • Using default IDs when custom IDs help
  • Not indexing email for fast queries