0
0
SQLquery~30 mins

Finding duplicates efficiently in SQL - Mini Project: Build & Apply

Choose your learning style9 modes available
Finding duplicates efficiently
📖 Scenario: You work in a small library database. The librarian suspects that some books have been entered twice by mistake. Your job is to find these duplicate book entries by their title and author.
🎯 Goal: Create a SQL query that finds duplicate books by title and author in the books table.
📋 What You'll Learn
Create a books table with columns id, title, and author.
Insert the given book entries into the books table.
Write a query to find duplicate books by title and author.
Show the title, author, and the count of duplicates as duplicate_count.
💡 Why This Matters
🌍 Real World
Finding duplicates in databases helps keep data clean and accurate, which is important in libraries, stores, and many other places.
💼 Career
Database administrators and data analysts often need to find and fix duplicate data to ensure reports and operations are correct.
Progress0 / 4 steps
1
Create the books table and insert data
Create a table called books with columns id (integer), title (text), and author (text). Then insert these exact rows: (1, 'The Great Gatsby', 'F. Scott Fitzgerald'), (2, '1984', 'George Orwell'), (3, 'The Great Gatsby', 'F. Scott Fitzgerald'), (4, 'To Kill a Mockingbird', 'Harper Lee'), (5, '1984', 'George Orwell').
SQL
Need a hint?

Use CREATE TABLE to make the table and INSERT INTO to add rows.

2
Add a variable for minimum duplicate count
Create a variable called min_duplicates and set it to 2. This will help us find entries that appear at least twice.
SQL
Need a hint?

Use \set min_duplicates 2 to define a variable in psql or your SQL environment.

3
Write the query to find duplicates
Write a SQL query that selects title, author, and counts how many times each pair appears as duplicate_count from the books table. Group by title and author. Use HAVING to only keep groups where the count is greater than or equal to min_duplicates.
SQL
Need a hint?

Use GROUP BY and HAVING COUNT(*) >= :min_duplicates to find duplicates.

4
Complete the query with ordering
Add an ORDER BY clause to the query to sort the results by duplicate_count in descending order, so the most duplicated books appear first.
SQL
Need a hint?

Use ORDER BY duplicate_count DESC to sort duplicates from most to least.