MongoDB Query to Find Duplicate Documents Easily
Use $group to group documents by the fields you want to check for duplicates, then filter the groups with $match, keeping only those whose count is greater than 1. For example:

db.collection.aggregate([{ $group: { _id: { field1: "$field1", field2: "$field2" }, count: { $sum: 1 } } }, { $match: { count: { $gt: 1 } } }])

Examples
How to Think About It
First, group documents by the fields of interest using $group. Then count how many documents fall into each group with $sum: 1. Finally, filter the groups with $match, keeping only those where the count is greater than one; these are the duplicates.

Algorithm
Code
db.collection.aggregate([
{ $group: { _id: { field1: "$field1", field2: "$field2" }, count: { $sum: 1 } } },
{ $match: { count: { $gt: 1 } } }
])

Dry Run
Let's trace the pipeline over a small collection whose documents have fields field1 and field2.
Group documents
Group documents by field1 and field2, counting how many times each combination appears.
Filter duplicates
Keep only groups where count is greater than 1. In the grouped output below, only the first two rows survive the $match stage:
| _id | count |
|---|---|
| {field1: 'value1', field2: 'value2'} | 3 |
| {field1: 'value3', field2: 'value4'} | 2 |
| {field1: 'value5', field2: 'value6'} | 1 |
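The same two stages can be sketched in plain JavaScript over an in-memory array. The documents below are hypothetical, chosen to reproduce the dry-run counts above:

```javascript
// Hypothetical documents mirroring the dry run: three value1/value2 copies,
// two value3/value4 copies, and one unique value5/value6 document.
const docs = [
  { field1: 'value1', field2: 'value2' },
  { field1: 'value1', field2: 'value2' },
  { field1: 'value1', field2: 'value2' },
  { field1: 'value3', field2: 'value4' },
  { field1: 'value3', field2: 'value4' },
  { field1: 'value5', field2: 'value6' },
];

// Stage 1: $group — bucket documents by the (field1, field2) pair and count.
const counts = new Map();
for (const d of docs) {
  const key = JSON.stringify({ field1: d.field1, field2: d.field2 });
  counts.set(key, (counts.get(key) || 0) + 1);
}

// Stage 2: $match — keep only groups whose count exceeds 1.
const duplicates = [...counts.entries()]
  .filter(([, count]) => count > 1)
  .map(([key, count]) => ({ _id: JSON.parse(key), count }));

console.log(duplicates);
```

The value5/value6 group is dropped, leaving the two duplicate groups with counts 3 and 2, just as in the table.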
Why This Works
Step 1: Grouping documents
The $group stage groups documents by the specified fields, creating buckets for each unique combination.
Step 2: Counting documents
Inside each group, $sum: 1 counts how many documents belong to that group.
Step 3: Filtering duplicates
The $match stage filters groups to keep only those with a count greater than 1, which means duplicates exist.
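In practice you often want to know which documents are duplicated, not just how many. The $group stage can additionally collect document _ids with a $push accumulator (e.g. ids: { $push: "$_id" }). A minimal plain-JavaScript sketch of that idea, with made-up _id values:

```javascript
// Hypothetical documents: _ids 1 and 2 share the same field1/field2 pair.
const docs = [
  { _id: 1, field1: 'a', field2: 'b' },
  { _id: 2, field1: 'a', field2: 'b' },
  { _id: 3, field1: 'c', field2: 'd' },
];

// In-memory analogue of
// { $group: { _id: {...}, count: { $sum: 1 }, ids: { $push: "$_id" } } }.
const groups = new Map();
for (const d of docs) {
  const key = `${d.field1}|${d.field2}`;
  if (!groups.has(key)) groups.set(key, { count: 0, ids: [] });
  const g = groups.get(key);
  g.count += 1;
  g.ids.push(d._id);
}

// Keep only duplicated groups, as $match: { count: { $gt: 1 } } would,
// then flatten out the _ids of every document in a duplicate group.
const dupIds = [...groups.values()]
  .filter(g => g.count > 1)
  .flatMap(g => g.ids);

console.log(dupIds);
```

Having the _ids in hand is what makes a follow-up cleanup (keeping one copy and removing the rest) possible.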
Alternative Approaches
Distinct + countDocuments:

const values = db.collection.distinct('field1');
values.forEach(val => {
  const count = db.collection.countDocuments({ field1: val });
  if (count > 1) print(val + ' is duplicated');
});
Map-Reduce (deprecated since MongoDB 5.0; prefer the aggregation pipeline):

db.collection.mapReduce(
  function() { emit(this.field1, 1); },
  function(key, values) { return Array.sum(values); },
  { query: {}, out: { inline: 1 } }
);
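To see where the extra cost of the distinct-based approach comes from, here is the same idea in plain JavaScript over hypothetical in-memory data: each distinct value triggers its own full pass over the documents, which is the source of the O(n*m) time below.

```javascript
// Hypothetical data: 'x' appears twice, 'y' once.
const docs = [{ field1: 'x' }, { field1: 'x' }, { field1: 'y' }];

// "distinct": one pass to collect the unique field1 values.
const values = [...new Set(docs.map(d => d.field1))];

// "countDocuments" per value: one full scan of the n documents for each
// of the m distinct values — m scans in total.
const duplicated = values.filter(
  v => docs.filter(d => d.field1 === v).length > 1
);

console.log(duplicated);
```

The aggregation pipeline avoids the repeated scans by counting every group in a single pass.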
Complexity: O(n) time, O(k) space
Time Complexity
The aggregation scans all documents once, so time is proportional to the number of documents n.
Space Complexity
Space depends on the number of unique groups k created by the grouping fields.
Which Approach is Fastest?
For large datasets, a single aggregation pipeline is faster than issuing one countDocuments query per distinct value, and more efficient than Map-Reduce.
| Approach | Time | Space | Best For |
|---|---|---|---|
| Aggregation with $group | O(n) | O(k) | Large datasets, efficient duplicate detection |
| Distinct + countDocuments | O(n*m), m = distinct values | O(m) | Small datasets, simple scripts |
| Map-Reduce | O(n) | O(k) | Complex processing, legacy support |