MongoDB · How-To · Beginner · 2 min read

MongoDB Query to Find Duplicate Documents Easily

Use the MongoDB aggregation pipeline: a $group stage groups documents by the fields you want to check for duplicates and counts each group with $sum: 1, then a $match stage keeps only groups whose count is greater than 1. For example: db.collection.aggregate([{ $group: { _id: { field1: "$field1", field2: "$field2" }, count: { $sum: 1 } } }, { $match: { count: { $gt: 1 } } }]).
📋

Examples

Input:  [{_id: 1, name: 'Alice', age: 25}, {_id: 2, name: 'Bob', age: 30}, {_id: 3, name: 'Alice', age: 25}]
Output: [{ _id: { name: 'Alice', age: 25 }, count: 2 }]

Input:  [{_id: 1, email: 'a@example.com'}, {_id: 2, email: 'b@example.com'}, {_id: 3, email: 'a@example.com'}, {_id: 4, email: 'c@example.com'}]
Output: [{ _id: { email: 'a@example.com' }, count: 2 }]

Input:  [{_id: 1, username: 'user1'}, {_id: 2, username: 'user2'}]
Output: []
🧠

How to Think About It

To find duplicates, group documents by the fields that define duplication using $group. Then count how many documents fall into each group. Finally, filter groups where the count is more than one using $match to get only duplicates.
📐

Algorithm

1. Group documents by the fields to check duplicates on using $group.
2. Count the number of documents in each group.
3. Filter groups where count is greater than 1 to find duplicates.
4. Return the grouped fields and their counts.
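The four steps above can be sketched in plain JavaScript (Node.js), assuming the documents are already in memory. This is only a simulation of what $group and $match do server-side; the function and variable names are illustrative:

```javascript
// Simulate the duplicate-finding pipeline: group by the chosen
// fields, count each group, and keep only groups with count > 1.
function findDuplicates(docs, fields) {
  const counts = new Map();
  for (const doc of docs) {
    // Build a group key from the chosen fields, like $group's _id.
    const key = JSON.stringify(fields.map(f => doc[f]));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Keep only groups with count > 1, like the $match stage.
  return [...counts.entries()]
    .filter(([, count]) => count > 1)
    .map(([key, count]) => ({ _id: JSON.parse(key), count }));
}

const docs = [
  { _id: 1, name: 'Alice', age: 25 },
  { _id: 2, name: 'Bob', age: 30 },
  { _id: 3, name: 'Alice', age: 25 },
];
console.log(findDuplicates(docs, ['name', 'age']));
// → [ { _id: [ 'Alice', 25 ], count: 2 } ]
```

Running this against the first example collection reports the Alice/25 pair twice, matching the aggregation output.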
💻

Code

mongodb
db.collection.aggregate([
  { $group: { _id: { field1: "$field1", field2: "$field2" }, count: { $sum: 1 } } },
  { $match: { count: { $gt: 1 } } }
])
Output
[
  { "_id" : { "field1" : "value1", "field2" : "value2" }, "count" : 3 },
  { "_id" : { "field1" : "value3", "field2" : "value4" }, "count" : 2 }
]
🔍

Dry Run

Let's trace a collection with documents having fields field1 and field2 to find duplicates.

Step 1: Group documents

Group documents by field1 and field2, counting how many times each combination appears.

Step 2: Filter duplicates

Keep only groups where count is greater than 1.

The $group stage produces one row per unique combination:

_id                                     count
{field1: 'value1', field2: 'value2'}    3
{field1: 'value3', field2: 'value4'}    2
{field1: 'value5', field2: 'value6'}    1

The $match stage then drops the last row (count 1), leaving only the duplicate groups.
💡

Why This Works

Step 1: Grouping documents

The $group stage groups documents by the specified fields, creating buckets for each unique combination.

Step 2: Counting documents

Inside each group, $sum: 1 counts how many documents belong to that group.

Step 3: Filtering duplicates

The $match stage filters groups to keep only those with a count greater than 1, which means duplicates exist.
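To act on the duplicates (for example, to delete all but one copy), a common extension is to also collect the _id of every document in each group with $push. The collection and field names below are placeholders; adapt them to your schema:

```mongodb
db.collection.aggregate([
  {
    $group: {
      _id: { field1: "$field1", field2: "$field2" },
      count: { $sum: 1 },
      ids: { $push: "$_id" }   // collect the _id of every document in the group
    }
  },
  { $match: { count: { $gt: 1 } } }
])
```

Each result now lists which documents share the duplicated values, so you could keep ids[0] and remove the rest.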

🔄

Alternative Approaches

Using distinct and manual counting
mongodb
const values = db.collection.distinct('field1');
values.forEach(val => {
  const count = db.collection.countDocuments({ field1: val });
  if(count > 1) print(val + ' is duplicated');
});
This method is simpler but less efficient on large collections, because it runs one countDocuments query for every distinct value.
Using Map-Reduce
mongodb
db.collection.mapReduce(
  function() { emit(this.field1, 1); },
  function(key, values) { return Array.sum(values); },
  { query: {}, out: { inline: 1 } }
);
Map-Reduce can find duplicates but is slower and more complex than aggregation; MongoDB has deprecated mapReduce since version 5.0 in favor of the aggregation pipeline.

Complexity: O(n) time, O(k) space

Time Complexity

The aggregation scans all documents once, so time is proportional to the number of documents n.

Space Complexity

Space depends on the number of unique groups k created by the grouping fields.

Which Approach is Fastest?

Aggregation is faster and more efficient than distinct with multiple queries or Map-Reduce for large datasets.

Approach                     Time     Space   Best For
Aggregation with $group      O(n)     O(k)    Large datasets, efficient duplicate detection
Distinct + countDocuments    O(n*m)   O(1)    Small datasets, simple scripts (m = number of distinct values)
Map-Reduce                   O(n)     O(k)    Complex processing, legacy support
💡
Choose the fields you group by carefully: they define what counts as a duplicate.
⚠️
Beginners often forget to filter groups with count > 1, so they get all groups including unique ones.