Query patterns that cause collection scans in MongoDB - Deep Dive

Overview - Query patterns that cause collection scans
What is it?
In MongoDB, a collection scan happens when the database looks through every document in a collection to find matches for a query. This means MongoDB reads all documents one by one instead of using an index to quickly find the data. Collection scans are slower and use more resources, especially for large collections. Understanding which queries cause collection scans helps improve database speed.
Why it matters
If you don't know which queries trigger collection scans, response times grow with the size of the collection, because each of those queries reads every document. Under heavy load this can make applications lag or time out. Avoiding unnecessary collection scans keeps queries fast and reduces server cost.
Where it fits
Before learning this, you should understand basic MongoDB queries and indexes. After this, you can learn how to create and optimize indexes, and how to analyze query performance using tools like explain().
Mental Model
Core Idea
A collection scan is like searching every page in a book to find a word because there is no index or table of contents to guide you.
Think of it like...
Imagine you want to find a recipe in a cookbook but there is no index or table of contents. You have to flip through every page until you find the recipe. This is what a collection scan is like in a database.
┌───────────────────────────────┐
│          Collection           │
│  ┌───────────────┐            │
│  │ Document 1    │            │
│  │ Document 2    │  ← Scanned │
│  │ Document 3    │  ← One by  │
│  │ ...           │  ← One     │
│  │ Document N    │  ← One     │
│  └───────────────┘            │
└───────────────────────────────┘

No index → scan all documents
Build-Up - 7 Steps
1
Foundation: What is a collection scan?
Concept: Introduce the basic idea of a collection scan in MongoDB.
A collection scan happens when MongoDB checks every document in a collection to find those that match a query. This means no index is used, so the database reads all documents one by one.
Result
MongoDB reads all documents in the collection for the query.
Understanding collection scans is key to knowing why some queries are slow and how indexes help speed them up.
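The idea can be sketched with a toy model in plain JavaScript. This is not how MongoDB is implemented: an in-memory array stands in for the collection, and `collectionScan` and the sample `users` data are hypothetical names made up for illustration.

```javascript
// Toy model only: an array stands in for a collection.
// collectionScan is a hypothetical helper, not a MongoDB API.
function collectionScan(collection, predicate) {
  let docsExamined = 0;
  const matches = [];
  for (const doc of collection) {
    docsExamined += 1; // every document is inspected, match or not
    if (predicate(doc)) matches.push(doc);
  }
  return { matches, docsExamined };
}

// Hypothetical sample data.
const users = [
  { _id: 1, name: "Ana", age: 30 },
  { _id: 2, name: "Ben", age: 41 },
  { _id: 3, name: "Cy", age: 30 },
  { _id: 4, name: "Dee", age: 25 },
];

const scanResult = collectionScan(users, (doc) => doc.age === 30);
console.log(scanResult.matches.length, scanResult.docsExamined);
```

The work done depends on the size of the collection, not on how many documents match.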
2
Foundation: How indexes prevent collection scans
Concept: Explain how indexes help MongoDB find data faster without scanning all documents.
Indexes are like a map or table of contents that point directly to documents matching certain fields. When a query uses an indexed field, MongoDB can jump straight to matching documents instead of scanning everything.
Result
Queries using indexes run faster and avoid collection scans.
Knowing that indexes guide queries helps you understand why some queries avoid collection scans.
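A minimal sketch of the same idea, assuming a simplified index: here a JavaScript `Map` from field value to document positions stands in for MongoDB's ordered index structure, so it only supports equality lookups. `buildIndex`, `indexLookup`, and the sample data are hypothetical names, not MongoDB APIs.

```javascript
// Toy model only: a Map from value -> positions stands in for an index.
// buildIndex and indexLookup are hypothetical helpers.
function buildIndex(collection, field) {
  const index = new Map();
  collection.forEach((doc, position) => {
    const key = doc[field];
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(position);
  });
  return index;
}

function indexLookup(collection, index, value) {
  const positions = index.get(value) || [];
  return {
    matches: positions.map((p) => collection[p]),
    docsExamined: positions.length, // only matching documents are touched
  };
}

// Hypothetical sample data.
const users = [
  { _id: 1, name: "Ana", age: 30 },
  { _id: 2, name: "Ben", age: 41 },
  { _id: 3, name: "Cy", age: 30 },
  { _id: 4, name: "Dee", age: 25 },
];

const ageIndex = buildIndex(users, "age");
const lookup = indexLookup(users, ageIndex, 30);
console.log(lookup.matches.length, lookup.docsExamined);
```

The lookup touches only the documents the index points to, which is why the cost no longer grows with the whole collection.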
3
Intermediate: Queries missing indexes cause scans
🤔 Before reading on: do you think a query on a non-indexed field uses an index or scans the collection? Commit to your answer.
Concept: Queries on fields without indexes cause collection scans.
If you query a field that has no index, MongoDB has no shortcut and must scan every document to find matches. For example, querying { age: 30 } when 'age' is not indexed causes a collection scan.
Result
MongoDB performs a full collection scan for the query.
Recognizing that missing indexes cause scans helps you decide which fields to index.
4
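Intermediate: Using operators that prevent index use
🤔 Before reading on: do you think queries using $not or $regex always use indexes, or do they sometimes cause scans? Commit to your answer.
Concept: Certain query operators prevent MongoDB from using an index efficiently, or at all.
$where runs JavaScript against each document and cannot use an index, so it always forces a collection scan. A case-sensitive $regex anchored with ^ to a literal prefix can use a narrow range of an index on the field; an unanchored or case-insensitive $regex must test every index key, or every document if the field has no index. Negations such as $not, $ne, and $nin, and $exists checks, can use an index but usually match so many entries that the cost approaches a full scan.
Result
MongoDB either scans the collection or walks most of an index, even though an index exists.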
Knowing which operators cause scans helps you write queries that use indexes effectively.
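Why an anchored prefix is cheaper can be sketched with a toy model: a sorted array of strings stands in for the ordered keys of an index on a `name` field. The helpers `lowerBound`, `prefixScan`, and `substringScan` and the sample keys are hypothetical; this illustrates the principle, not MongoDB's actual regex handling.

```javascript
// Toy model only: sorted strings stand in for ordered index keys.
const sortedKeys = ["camera", "charger", "phone", "phone case", "smartphone", "tablet"];

// Binary search for the first position whose key is >= target.
function lowerBound(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (keys[mid] < target) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Anchored prefix (like /^phone/): jump to the start of the range and
// stop at the first key that no longer starts with the prefix.
function prefixScan(keys, prefix) {
  const matches = [];
  let keysExamined = 0;
  for (let i = lowerBound(keys, prefix); i < keys.length; i++) {
    keysExamined += 1;
    if (!keys[i].startsWith(prefix)) break; // left the matching range
    matches.push(keys[i]);
  }
  return { matches, keysExamined };
}

// Unanchored pattern (like /phone/): a match can sit anywhere in the
// ordering, so every key has to be tested.
function substringScan(keys, needle) {
  const matches = keys.filter((k) => k.includes(needle));
  return { matches, keysExamined: keys.length };
}

const anchored = prefixScan(sortedKeys, "phone");
const unanchored = substringScan(sortedKeys, "phone");
console.log(anchored.keysExamined, unanchored.keysExamined);
```

Note that the two patterns also return different results: the anchored form does not find "smartphone", so anchoring is only a fix when prefix matching is what you actually need.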
5
Intermediate: Queries on array fields and scans
🤔 Before reading on: do you think querying inside arrays always uses indexes or sometimes causes scans? Commit to your answer.
Concept: Queries on array fields cause scans if the field is not indexed or the index does not suit the query.
When you index a field that holds arrays, MongoDB builds a multikey index with one entry per array element, so queries for individual elements can use it. Without an index on the array field, searching for a value inside the array scans the whole collection.
Result
Collection scan happens for array element queries without indexes.
Understanding array indexing helps avoid scans on complex data types.
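The "one entry per array element" idea can be sketched as follows. This is a toy model under simplifying assumptions (a `Map` instead of an ordered index, equality lookups only); `buildMultikeyIndex` and the `posts` data are hypothetical names.

```javascript
// Toy model only: each array element gets its own index entry,
// which is the basic idea behind a multikey index.
function buildMultikeyIndex(collection, field) {
  const index = new Map();
  collection.forEach((doc, position) => {
    const value = doc[field];
    // A Set avoids indexing the same document twice for a repeated element.
    const keys = Array.isArray(value) ? new Set(value) : [value];
    for (const key of keys) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(position);
    }
  });
  return index;
}

// Hypothetical sample data.
const posts = [
  { _id: 1, tags: ["mongodb", "indexes"] },
  { _id: 2, tags: ["performance"] },
  { _id: 3, tags: ["mongodb", "performance"] },
];

const tagIndex = buildMultikeyIndex(posts, "tags");
const hits = (tagIndex.get("mongodb") || []).map((p) => posts[p]._id);
console.log(hits);
```

With the index in place, finding posts tagged "mongodb" is a single key lookup instead of opening every document and walking its array.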
6
Advanced: Impact of query shape on index use
🤔 Before reading on: do you think a compound index helps every query that touches any of its fields, or only some of them? Commit to your answer.
Concept: Which fields a query filters on determines whether an index can be used or a scan happens.
MongoDB can use a compound index when the query constrains a prefix of the index key pattern, that is, its leading field or fields. The order in which fields are written inside the query document does not matter, and extra filter fields that are not in the index do not by themselves force a scan; they are checked after the documents are fetched. What does hurt is omitting the leading field: a compound index on {a:1, b:1} won't help a query only on b.
Result
Query may cause collection scan if index does not match query shape.
Knowing how query shape affects index use helps design better indexes and queries.
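The prefix rule can be written down as a small check. This is a deliberately simplified sketch: it looks only at which fields appear in the filter and ignores sorts, range bounds, and everything else the real query planner weighs. `usablePrefixLength` and `canUseIndex` are hypothetical helpers.

```javascript
// Toy model only: counts how many leading index fields the query constrains.
// indexFields is the ordered list of fields in the index key pattern.
function usablePrefixLength(indexFields, query) {
  let length = 0;
  for (const field of indexFields) {
    if (!Object.prototype.hasOwnProperty.call(query, field)) break;
    length += 1;
  }
  return length;
}

// Simplification: treat the index as usable if at least its first field is constrained.
function canUseIndex(indexFields, query) {
  return usablePrefixLength(indexFields, query) > 0;
}

const orderIndex = ["customerId", "status"]; // models { customerId: 1, status: 1 }

console.log(canUseIndex(orderIndex, { status: "shipped" }));                  // false: leading field missing
console.log(canUseIndex(orderIndex, { status: "shipped", customerId: 123 })); // true: order inside the query is irrelevant
console.log(canUseIndex(orderIndex, { customerId: 123, total: 50 }));         // true: extra field does not disqualify the index
```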
7
Expert: Hidden document reads from non-covered queries and projections
🤔 Before reading on: do you think projecting fields that are not in the index adds work for the query, or not? Commit to your answer.
Concept: Even a query that uses an index still reads documents from the collection when it returns fields that are not in the index.
If a query uses an index but requests fields the index does not contain, the query is not covered: MongoDB adds a FETCH stage that loads each matching document from the collection. This is not a collection scan, because only matching documents are read, but when the index is not selective the number of fetches can approach the size of the collection and slow the query unexpectedly.
Result
Query uses index but still reads documents, causing extra work.
Understanding covered queries and projections helps you avoid unnecessary document fetches and improves performance.
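The basic coverage rule can be expressed as a check on field names. This sketch assumes a single index, top-level fields, and an inclusion-style projection; it ignores further restrictions that apply in practice (for example around array fields). `isCovered` is a hypothetical helper, not a MongoDB API.

```javascript
// Toy model only: a query is treated as covered when every filtered field
// and every returned field is part of the same index.
function isCovered(indexFields, query, projection) {
  const inIndex = (field) => indexFields.includes(field);
  const queryCovered = Object.keys(query).every(inIndex);
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  // _id is returned by default, so it must be excluded unless the index contains it.
  const idExcluded = projection._id === 0;
  if (!idExcluded && !inIndex("_id")) return false;
  return queryCovered && returned.length > 0 && returned.every(inIndex);
}

const nameAgeIndex = ["lastName", "age"]; // models { lastName: 1, age: 1 }

console.log(isCovered(nameAgeIndex, { lastName: "Smith" }, { lastName: 1, age: 1, _id: 0 }));   // true
console.log(isCovered(nameAgeIndex, { lastName: "Smith" }, { lastName: 1, email: 1, _id: 0 })); // false: email is not in the index
console.log(isCovered(nameAgeIndex, { lastName: "Smith" }, { lastName: 1 }));                   // false: _id is returned implicitly
```

The third case is the one that most often surprises people: forgetting `_id: 0` is enough to bring the document fetch back.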
Under the Hood
When MongoDB receives a query, it checks if an index can be used by matching the query fields to index keys. If no suitable index exists, MongoDB performs a collection scan by reading every document sequentially. Collection scans read all documents from disk or memory, which is slow. Indexes store pointers to documents sorted by key values, allowing quick lookups. Some query operators or shapes prevent index use, forcing scans. Even with indexes, if requested fields are not in the index, MongoDB fetches full documents, adding overhead.
Why designed this way?
MongoDB was designed to be flexible and support many query types. Indexes speed up common queries but cannot cover every possible query shape or operator. Collection scans act as a fallback to ensure queries always return correct results, even if slow. This design balances speed and flexibility. Alternatives like forcing index use or rejecting queries would limit MongoDB's usability.
┌───────────────┐       ┌───────────────┐
│   Query       │──────▶│ Index Check   │
└───────────────┘       └───────────────┘
                              │
               ┌──────────────┴──────────────┐
               │                             │
       ┌───────────────┐             ┌───────────────┐
       │ Index Found   │             │ No Index      │
       └───────────────┘             └───────────────┘
               │                             │
       ┌───────────────┐             ┌───────────────┐
       │ Use Index     │             │ Collection    │
       │ Lookup        │             │ Scan (Full)   │
       └───────────────┘             └───────────────┘
               │                             │
       ┌───────────────┐             ┌───────────────┐
       │ Fetch Docs if │             │ Read All Docs │
       │ Needed        │             │ Sequentially  │
       └───────────────┘             └───────────────┘
Myth Busters - 4 Common Misconceptions
Quick: Does adding an index on any field guarantee no collection scans? Commit yes or no.
Common Belief: Adding an index on a field always prevents collection scans for queries on that field.
Reality: Indexes only prevent collection scans if the query uses the index correctly and the query shape matches the index. Some operators or query patterns still cause scans.
Why it matters: Assuming indexes always prevent scans can lead to unexpected slow queries and wasted effort on indexing.
Quick: Do you think $regex queries always use indexes? Commit yes or no.
Common Belief: Regex queries always use indexes efficiently if the field is indexed.
Reality: A regex narrows the index range only when it is case-sensitive and anchored to a fixed prefix. Otherwise MongoDB has to test every index key, or every document if there is no index, which costs about as much as a scan.
Why it matters: Misusing regex can cause slow queries and high resource use.
Quick: Does projecting fewer fields always speed up queries and avoid scans? Commit yes or no.
Common Belief: Projecting fewer fields always avoids reading documents from the collection.
Reality: If any projected field is not in the index, MongoDB still fetches each matching document in full and applies the projection afterwards. Only a covered query, where every filtered and returned field is in the index, skips the fetch.
Why it matters: Incorrect projection assumptions can hide performance problems.
Quick: Do you think queries on array fields always use indexes? Commit yes or no.
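Common Belief: Queries on array fields always use indexes efficiently if the array field is indexed.
Reality: An index on an array field is multikey and does serve element queries, but some query shapes, such as several range conditions on the array that are not wrapped in $elemMatch, use looser index bounds and examine many more entries. With no index on the field, MongoDB scans the collection.
Why it matters: Overlooking array indexing leads to slow queries on common data types.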
Expert Zone
1
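An index is not automatically the cheaper plan. For very small collections, or for predicates with low selectivity that match most of the collection, an index scan followed by a fetch of nearly every document can cost as much as a collection scan. Check the chosen plan with explain() rather than assuming.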
2
Compound indexes require queries to use the prefix fields in order to avoid scans; missing the leading fields in queries causes scans even if later fields are indexed.
3
The presence of sparse or partial indexes affects whether queries use indexes or fall back to scans, depending on query predicates.
When NOT to use
Avoid relying on collection scans for large collections or production workloads. Instead, create appropriate indexes, rewrite queries to use index-friendly operators, or use aggregation pipelines with indexes. For complex queries, consider denormalization or caching to reduce scan needs.
Production Patterns
In production, developers monitor query plans with explain() to detect scans. They create compound and multikey indexes tailored to query patterns. They avoid operators like $where or unanchored regex in hot paths. Covered queries are used to minimize document fetches. Query shape and index design are iteratively refined based on real usage.
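Checking an explain() result for scans can be automated by walking the plan tree. The sketch below assumes the commonly seen layout in which each plan node has a `stage` name and children under `inputStage` or `inputStages`; some server versions nest the tree differently (for example under `winningPlan.queryPlan`), so pass whichever sub-tree your output contains. `findStages` is a hypothetical helper, and `sampleExplain` is a hand-written illustration, not captured output.

```javascript
// Recursively collect every plan node whose stage name matches.
// findStages is a hypothetical helper, not part of any MongoDB driver.
function findStages(plan, stageName, found = []) {
  if (!plan || typeof plan !== "object") return found;
  if (plan.stage === stageName) found.push(plan);
  if (plan.inputStage) findStages(plan.inputStage, stageName, found);
  for (const child of plan.inputStages || []) {
    findStages(child, stageName, found);
  }
  return found;
}

// Hand-written sample in the shape of an explain() result (illustrative only).
const sampleExplain = {
  queryPlanner: {
    namespace: "shop.orders",
    winningPlan: {
      stage: "SORT",
      inputStage: {
        stage: "COLLSCAN",
        filter: { status: { $eq: "shipped" } },
        direction: "forward",
      },
    },
  },
};

const collScans = findStages(sampleExplain.queryPlanner.winningPlan, "COLLSCAN");
console.log(collScans.length > 0 ? "collection scan detected" : "no collection scan");
```

In mongosh the input would come from something like `db.orders.find({ status: 'shipped' }).explain()`; the same walker can look for `IXSCAN` or `FETCH` stages.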
Connections
Indexing
Builds-on
Understanding collection scans clarifies why indexing is crucial for database performance and how indexes guide queries.
Algorithmic Search
Same pattern
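Collection scans are like linear search, while index lookups are like searching a sorted tree (MongoDB indexes are B-trees). This is a fundamental computer science tradeoff: spend extra storage and write-time work maintaining a sorted structure in exchange for much faster lookups.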
Library Cataloging Systems
Analogy in a different field
Just as libraries use catalogs to avoid scanning every book, databases use indexes to avoid scanning every document, showing how organizing information efficiently is a universal challenge.
Common Pitfalls
#1 Querying a non-indexed field expecting fast results
Wrong approach: db.users.find({ lastName: 'Smith' })
Correct approach: db.users.createIndex({ lastName: 1 }); db.users.find({ lastName: 'Smith' })
Root cause: Not creating an index on the queried field causes MongoDB to scan the entire collection.
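#2 Using unanchored regex causing scans
Wrong approach: db.products.find({ name: { $regex: 'phone' } })
Correct approach: db.products.find({ name: { $regex: '^phone' } }) // assumes an index on name; matches only names that start with "phone"
Root cause: A regex without a fixed, case-sensitive prefix cannot narrow the index range, so MongoDB must test every index key or, without an index, every document.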
#3 Query shape mismatch with compound index
Wrong approach: db.orders.find({ status: 'shipped' }) // compound index on { customerId: 1, status: 1 }
Correct approach: db.orders.find({ customerId: 123, status: 'shipped' })
Root cause: Queries must use the leading fields of a compound index to avoid scans. If you need to filter by status alone, add an index whose first field is status.
Key Takeaways
Collection scans happen when MongoDB cannot use an index and must check every document.
Indexes act like shortcuts that let MongoDB find data quickly without scanning everything.
Query patterns, operators, and shapes affect whether indexes are used or scans happen.
Avoiding collection scans improves database speed and resource use, especially on large collections.
Understanding collection scans helps you design better indexes and write efficient queries.