
How MongoDB stores data as documents - Mechanics & Internals

Overview - How MongoDB stores data as documents
What is it?
MongoDB stores data in a format called documents, which are like flexible records that hold information in key-value pairs. Each document is similar to a JSON object, making it easy to represent complex data with nested structures. These documents are grouped into collections, which are like tables in traditional databases but without fixed schemas. This approach allows MongoDB to handle varied and changing data easily.
Why it matters
This document-based storage solves the problem of rigid data structures in traditional databases, letting developers store and retrieve data that changes shape over time without hassle. Without this, applications would struggle to adapt to new data needs quickly, slowing down development and making data harder to manage. It also makes working with data more natural for many modern apps, like those handling user profiles, product catalogs, or logs.
Where it fits
Before learning this, you should understand basic database concepts like tables and rows in relational databases. After this, you can explore how MongoDB queries and indexes work to efficiently find and organize these documents. Later, you might learn about data modeling strategies specific to document databases and how to scale MongoDB for large applications.
Mental Model
Core Idea
MongoDB stores data as flexible, self-contained documents that hold all related information together in a format similar to JSON.
Think of it like...
Imagine a filing cabinet where each folder holds a complete set of papers about one topic, like a person's file with their name, address, and notes all inside. Each folder is independent and can have different types of papers, unlike a spreadsheet where every row must have the same columns.
┌────────────────────────────────────────┐
│ Collection                             │
│ ┌────────────────────────────────────┐ │
│ │ Document 1                         │ │
│ │ {                                  │ │
│ │   "name": "Alice",                 │ │
│ │   "age": 30,                       │ │
│ │   "hobbies": ["reading", "hiking"] │ │
│ │ }                                  │ │
│ └────────────────────────────────────┘ │
│ ┌────────────────────────────────────┐ │
│ │ Document 2                         │ │
│ │ {                                  │ │
│ │   "name": "Bob",                   │ │
│ │   "age": 25,                       │ │
│ │   "pets": {"dog": "Rex"}           │ │
│ │ }                                  │ │
│ └────────────────────────────────────┘ │
└────────────────────────────────────────┘
Build-Up - 7 Steps
1
Foundation: Understanding Documents as Data Units
Concept: Documents are the basic units of data storage in MongoDB, similar to records but more flexible.
In MongoDB, data is stored as documents. Each document is a set of key-value pairs, where keys are strings and values can be many types like strings, numbers, arrays, or even other documents. This structure allows storing complex data in one place without splitting it across multiple tables.
Result
You can store a person's information, including their name, age, and hobbies, all inside one document.
Understanding that documents hold all related data together helps you see why MongoDB is great for flexible and evolving data.
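To make the idea concrete, here is a minimal sketch that models one document as a plain Python dict. The `person` document and the `describe` helper are hypothetical names chosen for illustration; a real application would pass a structure like this to a MongoDB driver.

```python
# A hypothetical "person" document modelled as a Python dict (illustration only).
person = {
    "name": "Alice",                    # string value
    "age": 30,                          # numeric value
    "hobbies": ["reading", "hiking"],   # array value
    "address": {"city": "NY"},          # embedded document
}

def describe(doc):
    """Return a mapping of field name -> Python type name for each top-level field."""
    return {key: type(value).__name__ for key, value in doc.items()}

print(describe(person))
```

All of the person's data lives in one structure, so nothing has to be split across several tables.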
2
Foundation: Collections Group Documents Together
Concept: Documents are grouped into collections, which organize similar documents without enforcing a fixed structure.
A collection in MongoDB is like a folder that holds many documents. Unlike tables in relational databases, collections do not require all documents to have the same fields. This means one document can have a 'hobbies' field while another might have a 'pets' field, and both live happily in the same collection.
Result
You can store different types of documents in one collection without changing the database schema.
Knowing collections are flexible containers helps you design databases that adapt as your data changes.
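The flexibility described above can be sketched by treating a collection as a plain Python list of dicts. This is only a model of the idea, not how MongoDB stores collections; `with_field` and `field_names` are hypothetical helpers.

```python
# A collection modelled as a list of dict documents (illustration only).
people = [
    {"name": "Alice", "age": 30, "hobbies": ["reading", "hiking"]},
    {"name": "Bob", "age": 25, "pets": {"dog": "Rex"}},
]

def with_field(collection, field):
    """Return the documents that contain the given top-level field."""
    return [doc for doc in collection if field in doc]

def field_names(collection):
    """Return the sorted union of top-level field names used in the collection."""
    return sorted({key for doc in collection for key in doc})

print([doc["name"] for doc in with_field(people, "pets")])
print(field_names(people))
```

Both documents sit in the same list even though only one has 'hobbies' and only the other has 'pets'.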
3
Intermediate: Documents Use BSON Format Internally
Before reading on: do you think MongoDB stores documents exactly as JSON text or in a different format? Commit to your answer.
Concept: MongoDB stores documents in BSON, a binary form of JSON that supports more data types and is efficient for storage and speed.
While documents look like JSON, MongoDB actually stores them as BSON (Binary JSON). BSON adds data types that JSON lacks, such as dates, binary data, and distinct 32-bit and 64-bit integers. It also prefixes each document and string with its length, so the database can skip over values without parsing text. This lets MongoDB handle richer data and read or write it quickly.
Result
Documents can include data types like dates and binary files, which JSON alone can't represent well.
Understanding BSON explains how MongoDB balances flexibility with performance and supports rich data types.
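To show what "binary form of JSON" means in practice, here is a minimal sketch of a BSON encoder written with only the Python standard library. It follows the public BSON layout (a length prefix, typed elements, and a terminating null byte) for a small subset of types and is not the encoder that MongoDB drivers actually use. The function names are hypothetical, and datetimes are assumed to be timezone-aware.

```python
import json
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def _cstring(text):
    """Encode a key as a BSON cstring: UTF-8 bytes followed by a null byte."""
    return text.encode("utf-8") + b"\x00"

def _element(key, value):
    """Encode one key-value pair as: type byte + key + payload."""
    name = _cstring(key)
    if isinstance(value, bool):  # checked before int because bool is a subclass of int
        return b"\x08" + name + (b"\x01" if value else b"\x00")
    if isinstance(value, int):
        if -2**31 <= value < 2**31:
            return b"\x10" + name + struct.pack("<i", value)  # 32-bit integer
        return b"\x12" + name + struct.pack("<q", value)      # 64-bit integer
    if isinstance(value, float):
        return b"\x01" + name + struct.pack("<d", value)
    if isinstance(value, str):
        data = value.encode("utf-8") + b"\x00"
        return b"\x02" + name + struct.pack("<i", len(data)) + data
    if isinstance(value, datetime):  # assumption: timezone-aware datetime
        millis = (value - EPOCH) // timedelta(milliseconds=1)
        return b"\x09" + name + struct.pack("<q", millis)
    if isinstance(value, dict):
        return b"\x03" + name + encode_document(value)
    if isinstance(value, list):  # arrays are documents keyed by "0", "1", ...
        as_doc = {str(i): item for i, item in enumerate(value)}
        return b"\x04" + name + encode_document(as_doc)
    raise TypeError(f"unsupported type in this sketch: {type(value).__name__}")

def encode_document(doc):
    """Encode a dict as: int32 total length + elements + terminating null byte."""
    body = b"".join(_element(key, value) for key, value in doc.items())
    return struct.pack("<i", len(body) + 5) + body + b"\x00"

# Plain JSON has no date type, so the standard json module rejects a datetime.
event = {"name": "login", "at": datetime(2024, 1, 1, tzinfo=timezone.utc)}
try:
    json.dumps(event)
    json_ok = True
except TypeError:
    json_ok = False

encoded = encode_document(event)
print(json_ok, len(encoded))
```

The date that `json.dumps` rejects is stored here as a typed 64-bit count of milliseconds since the Unix epoch, which is the kind of extra type information BSON carries.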
4
Intermediate: Nested Documents and Arrays for Complex Data
Before reading on: do you think MongoDB documents can only store simple key-value pairs or also nested structures? Commit to your answer.
Concept: Documents can contain other documents and arrays, allowing complex and hierarchical data to be stored naturally.
MongoDB documents can have fields that are themselves documents or arrays. For example, a 'pets' field can be a document with pet names and types, or a 'hobbies' field can be an array of strings. This nesting lets you model real-world data more closely than flat tables.
Result
You can represent a user with multiple addresses or a product with a list of features inside one document.
Knowing documents can nest data helps you design schemas that reflect real-world relationships without joins.
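Nested fields are usually addressed with dotted paths such as `pets.dog`. The sketch below imitates that idea on a Python dict. `get_path` is a hypothetical helper and is deliberately simpler than MongoDB's own dot notation, which can also match a field inside every element of an array.

```python
def get_path(doc, path):
    """Follow a dotted path such as "pets.dog" or "hobbies.0"; return None if it is missing."""
    current = doc
    for part in path.split("."):
        if isinstance(current, dict) and part in current:
            current = current[part]          # step into an embedded document
        elif isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]     # step into an array by position
        else:
            return None
    return current

# A hypothetical user document with an array of embedded address documents.
user = {
    "name": "Alice",
    "hobbies": ["reading", "hiking"],
    "addresses": [
        {"type": "home", "city": "NY"},
        {"type": "work", "city": "Boston"},
    ],
}

print(get_path(user, "addresses.1.city"))
print(get_path(user, "hobbies.0"))
```

Because both addresses live inside the user document, reading them needs no join against a second table.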
5
Intermediate: Documents Have Unique _id Fields
Concept: Each document has a unique identifier called _id that MongoDB uses to find and manage documents efficiently.
MongoDB automatically adds an '_id' field to every document if you don't provide one. This field uniquely identifies the document within its collection. It can be any unique value except an array, but by default MongoDB uses an ObjectId, a special 12-byte value made of a 4-byte creation timestamp, a 5-byte random value, and a 3-byte counter.
Result
Every document can be quickly found or updated using its _id without scanning the whole collection, because MongoDB automatically creates a unique index on _id.
Understanding the _id field is key to efficient data retrieval and ensures each document is uniquely identifiable.
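The 12-byte ObjectId layout can be sketched with the standard library alone. This is an illustration of the byte layout (4-byte timestamp, 5-byte random value, 3-byte counter), not the generator that drivers ship; `make_object_id` and `creation_time` are hypothetical helper names.

```python
import os
import struct
import time

def make_object_id(timestamp=None, random_bytes=None, counter=0):
    """Build 12 bytes laid out like an ObjectId: 4-byte time, 5-byte random, 3-byte counter."""
    if timestamp is None:
        timestamp = int(time.time())
    if random_bytes is None:
        random_bytes = os.urandom(5)
    if len(random_bytes) != 5:
        raise ValueError("random_bytes must be exactly 5 bytes")
    # All three parts are stored big-endian, so later ids compare greater than earlier ones.
    return struct.pack(">I", timestamp) + random_bytes + (counter % 2**24).to_bytes(3, "big")

def creation_time(object_id_hex):
    """Read the creation time (seconds since the Unix epoch) from a 24-character hex id."""
    raw = bytes.fromhex(object_id_hex)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return struct.unpack(">I", raw[:4])[0]

new_id = make_object_id()
print(new_id.hex(), creation_time(new_id.hex()))
```

Because the timestamp comes first, ids created later sort after ids created earlier, at one-second resolution.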
6
Advanced: Storage Engine Manages Document Persistence
Before reading on: do you think MongoDB stores documents as-is on disk or uses a special storage method? Commit to your answer.
Concept: MongoDB uses a storage engine that manages how documents are saved on disk, balancing speed and durability.
Behind the scenes, MongoDB's storage engine (like WiredTiger) compresses and organizes documents on disk. It writes data in blocks and uses indexes to speed up queries. The engine also handles concurrency so many users can read and write documents safely at the same time.
Result
Documents are stored efficiently and can be retrieved quickly even in large databases.
Knowing about the storage engine reveals how MongoDB achieves fast, reliable document storage beyond just the document format.
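The benefit of block compression is easy to see on data with repeated field names. The sketch below is a rough stand-in: it serializes documents as JSON lines rather than BSON and uses Python's `zlib` module, whereas WiredTiger applies its own block compressors (snappy by default for collection data, with zlib and zstd as options). `build_block` is a hypothetical helper.

```python
import json
import zlib

def build_block(documents):
    """Serialize documents into one byte string, standing in for an on-disk block."""
    return b"\n".join(json.dumps(doc, sort_keys=True).encode("utf-8") for doc in documents)

# 1000 similar documents: the same keys repeat in every one of them.
docs = [{"name": f"user{i}", "age": 20 + i % 50, "city": "NY"} for i in range(1000)]

raw = build_block(docs)
packed = zlib.compress(raw)
restored = zlib.decompress(packed)

print(len(raw), len(packed))
```

The repeated keys compress well, which is why a schema-free format that stores field names in every document does not have to cost as much disk space as it first appears.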
7
Expert: Document Size Limits and Impact on Design
Before reading on: do you think MongoDB documents can be infinitely large or have size limits? Commit to your answer.
Concept: MongoDB limits document size to 16MB, which influences how you design your data and when to split data across documents.
Each MongoDB document can be up to 16 megabytes in size. This limit means very large data, like big images or logs, should be stored differently, such as using GridFS or splitting data into multiple documents. Understanding this limit helps avoid errors and performance issues.
Result
You design your data model to keep documents within size limits, ensuring smooth database operation.
Knowing document size limits guides practical schema design and prevents common pitfalls in production.
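A size check and a chunking step can be sketched as follows. The chunk documents use the `files_id`, `n`, and `data` fields that GridFS chunk documents carry, and 255 KiB is assumed here as the chunk size (GridFS's usual default); the helper names are hypothetical and no database is involved.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024   # the 16 MB document limit described above
CHUNK_BYTES = 255 * 1024                # assumption: GridFS's default chunk size

def fits_in_one_document(encoded_size):
    """Return True if a document of this encoded size is within the limit."""
    return encoded_size <= MAX_DOCUMENT_BYTES

def split_into_chunks(file_id, payload, chunk_bytes=CHUNK_BYTES):
    """Split a byte payload into chunk documents shaped like GridFS chunks."""
    return [
        {"files_id": file_id, "n": n, "data": payload[start:start + chunk_bytes]}
        for n, start in enumerate(range(0, len(payload), chunk_bytes))
    ]

payload = bytes(600 * 1024)             # a hypothetical 600 KiB file
chunks = split_into_chunks("file-1", payload)
print(fits_in_one_document(len(payload)), len(chunks))
```

Each chunk stays far below the limit, and the `n` field records the order needed to reassemble the file.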
Under the Hood
MongoDB converts documents into BSON format, which encodes data types and structure in a compact binary form. The storage engine writes these BSON documents to disk in data files, organizing them in blocks for efficient access. Indexes map keys to document locations, speeding up queries. When you insert or update a document, the engine manages memory, concurrency, and durability to keep data consistent and fast to retrieve.
Why designed this way?
MongoDB was designed to handle flexible, evolving data without the rigid schemas of relational databases. BSON was chosen over plain JSON to support richer data types and faster processing. The storage engine architecture balances speed, concurrency, and durability, enabling MongoDB to serve modern applications that need quick, scalable access to complex data.
┌───────────────┐
│ Application   │
│  sends JSON   │
└──────┬────────┘
       │ converts
       ▼
┌───────────────┐
│ MongoDB Driver│
│ converts JSON │
│ to BSON       │
└──────┬────────┘
       │ stores
       ▼
┌───────────────┐
│ Storage Engine│
│ writes BSON   │
│ to disk files │
└──────┬────────┘
       │ indexes
       ▼
┌───────────────┐
│ Indexes       │
│ map keys to   │
│ document locs │
└───────────────┘
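The last box in the diagram, an index that maps keys to document locations, can be sketched with a dictionary. Real MongoDB indexes are ordered tree structures maintained by the storage engine; this toy version uses list positions as "locations", assumes hashable field values, and its helper names are hypothetical.

```python
from collections import defaultdict

def build_index(documents, field):
    """Map each value of `field` to the positions of the documents that hold it."""
    index = defaultdict(list)
    for position, doc in enumerate(documents):
        if field in doc:
            index[doc[field]].append(position)
    return dict(index)

def find_by(documents, index, value):
    """Look up documents through the index instead of scanning every document."""
    return [documents[position] for position in index.get(value, [])]

# A hypothetical collection stored as a list; the list position plays the role of a disk location.
people = [
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 25},
    {"name": "Carol", "age": 30},
]

age_index = build_index(people, "age")
print(find_by(people, age_index, 30))
```

The lookup touches only the positions listed under the key, which is the reason an indexed query avoids reading the whole collection.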
Myth Busters - 4 Common Misconceptions
Quick: Do you think MongoDB documents must all have the same fields? Commit to yes or no.
Common Belief: All documents in a MongoDB collection must have the same fields like columns in a table.
Reality: Documents in the same collection can have completely different fields and structures.
Why it matters: Believing this limits your schema design and prevents you from using MongoDB's flexibility, leading to unnecessary complexity.
Quick: Do you think MongoDB stores documents as plain JSON text on disk? Commit to yes or no.
Common Belief: MongoDB stores documents exactly as JSON text files on disk.
Reality: MongoDB stores documents in BSON, a binary format optimized for speed and additional data types.
Why it matters: Assuming JSON storage can cause confusion about supported data types and performance characteristics.
Quick: Do you think MongoDB documents can be any size without limits? Commit to yes or no.
Common Belief: MongoDB documents can be infinitely large without any size restrictions.
Reality: Documents have a maximum size of 16MB, requiring careful design for large data.
Why it matters: Ignoring size limits can cause errors and performance problems in production.
Quick: Do you think MongoDB collections enforce data validation by default? Commit to yes or no.
Common Belief: MongoDB collections always enforce strict data validation like relational tables.
Reality: By default, MongoDB collections do not enforce schemas or validation unless explicitly configured.
Why it matters: Assuming validation exists can lead to inconsistent data and bugs if not managed properly.
Expert Zone
1
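MongoDB's BSON format includes a special ObjectId type that encodes creation time, so sorting by _id roughly follows insertion order. Because the values keep increasing, using _id as a range-based shard key can send most new inserts to a single shard.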
2
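The WiredTiger storage engine uses compression and its own in-memory cache, alongside the operating system's filesystem cache, to optimize disk usage and speed, and both can affect performance tuning. (Memory-mapped files belonged to the older MMAPv1 engine, which has been removed.)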
3
Nested documents and arrays can impact index design and query performance, requiring careful schema planning.
When NOT to use
MongoDB document storage is not ideal when strict relational integrity or complex multi-table transactions are required; in such cases, relational databases like PostgreSQL are better. Also, for extremely large binary data, specialized storage like GridFS or external file storage is preferred.
Production Patterns
In production, MongoDB documents are often designed to embed related data to reduce joins, use indexes on frequently queried fields, and leverage schema validation rules for data quality. Sharding distributes a collection's documents across servers according to a shard key, which can be _id or other fields, to scale horizontally.
Connections
JSON Data Format
MongoDB documents are based on JSON-like structures but extend them with BSON.
Understanding JSON helps grasp MongoDB documents, but knowing BSON explains MongoDB's richer data support and performance.
Relational Database Tables
Collections and documents serve a similar role to tables and rows but with more flexibility.
Comparing MongoDB documents to table rows clarifies the tradeoffs between schema rigidity and flexibility.
File System Organization
MongoDB collections and documents resemble folders and files in a file system hierarchy.
Seeing documents as files in folders helps understand how MongoDB organizes and accesses data efficiently.
Common Pitfalls
#1 Trying to store very large data directly inside a single document.
Wrong approach: { "_id": 1, "image": }
Correct approach: Use GridFS or store large files outside MongoDB and reference them by URL or ID.
Root cause: Misunderstanding the 16MB document size limit and how to handle large data.
#2 Assuming all documents in a collection must have the same fields and trying to enforce this manually.
Wrong approach: Inserting documents with different fields and expecting errors or schema enforcement.
Correct approach: Design collections to allow flexible fields or use schema validation rules if strictness is needed.
Root cause: Confusing MongoDB's flexible schema with relational database schemas.
#3 Using JSON strings inside documents to store nested data instead of native nested documents.
Wrong approach: { "name": "Alice", "address": "{\"city\":\"NY\"}" }
Correct approach: { "name": "Alice", "address": { "city": "NY" } }
Root cause: Not leveraging MongoDB's native nested document support, leading to harder queries and updates.
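The cost of pitfall #3 can be seen in a small sketch that reads the city from both shapes of the document. `city_of` is a hypothetical helper; the point is that the stringified version needs an extra parse step in application code before the nested value is reachable.

```python
import json

# The two shapes from pitfall #3, written as Python dicts.
as_string = {"name": "Alice", "address": "{\"city\":\"NY\"}"}
as_nested = {"name": "Alice", "address": {"city": "NY"}}

def city_of(doc):
    """Return the city, parsing the address first if it was stored as a JSON string."""
    address = doc["address"]
    if isinstance(address, str):
        address = json.loads(address)   # extra step: the structure is hidden inside text
    return address["city"]

print(city_of(as_string), city_of(as_nested))
print(isinstance(as_string["address"], dict), isinstance(as_nested["address"], dict))
```

With the string form the database only sees opaque text, so it cannot match or index the city field directly; the nested form keeps that structure visible.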
Key Takeaways
MongoDB stores data as flexible documents that group related information together in a JSON-like format.
Documents are stored in collections without fixed schemas, allowing varied and evolving data structures.
Internally, MongoDB uses BSON, a binary format that supports rich data types and efficient storage.
Each document has a unique _id field that helps MongoDB quickly find and manage data.
Understanding document size limits and storage engine behavior is crucial for designing efficient, scalable MongoDB applications.