How to Find Duplicate Values in a SQL Table | Tutorial by Chartio

First, we’ll write a simple query to verify whether duplicates exist in the table. Here, a duplicate is any combination of username and email that appears in more than one row. For our example, the query looks like this:

SELECT username, email, COUNT(*)
FROM users
GROUP BY username, email
HAVING COUNT(*) > 1

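HAVING is important here because, unlike WHERE, HAVING filters on aggregate functions. WHERE is applied to individual rows before they are grouped, while HAVING is applied to each group after COUNT(*) has been computed.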
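The difference between the two clauses can be checked with a small sketch. It uses Python's built-in sqlite3 module and an in-memory users table; the table layout and the sample row for "Anna" are assumptions made for illustration, and other database engines may word the error differently.

```python
import sqlite3

# In-memory database with a hypothetical users table (layout is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, username, email) VALUES (?, ?, ?)",
    [
        (1, "Pete", "pete@example.com"),
        (3, "Anna", "anna@example.com"),  # hypothetical non-duplicate row
        (6, "Pete", "pete@example.com"),
    ],
)

# WHERE is evaluated per row, before grouping, so it cannot use COUNT(*).
try:
    conn.execute(
        "SELECT username, email FROM users "
        "WHERE COUNT(*) > 1 GROUP BY username, email"
    )
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# HAVING is evaluated per group, after COUNT(*) has been computed.
having_rows = conn.execute(
    "SELECT username, email, COUNT(*) FROM users "
    "GROUP BY username, email HAVING COUNT(*) > 1"
).fetchall()

print("WHERE attempt failed:", where_error is not None)
print("HAVING result:", having_rows)
```

In SQLite the WHERE version is rejected when the statement is prepared, while the HAVING version returns only the group that occurs more than once.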

If any rows are returned, that means we have duplicates. In this example, our results look like this:

USERNAME  EMAIL                COUNT
Pete      pete@example.com     2
Jessica   jessica@example.com  2
Miles     miles@example.com    2
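To try this check end to end, here is a minimal sketch using Python's built-in sqlite3 module. The ids of the duplicated rows follow the tutorial's example data; the table definition, the extra non-duplicate rows, and the helper name find_duplicate_groups are assumptions made for illustration.

```python
import sqlite3


def find_duplicate_groups(conn):
    """Return (username, email, count) for every pair that occurs more than once."""
    return conn.execute(
        """
        SELECT username, email, COUNT(*)
        FROM users
        GROUP BY username, email
        HAVING COUNT(*) > 1
        """
    ).fetchall()


# Hypothetical in-memory table mirroring the example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, username, email) VALUES (?, ?, ?)",
    [
        (1, "Pete", "pete@example.com"),
        (2, "Miles", "miles@example.com"),
        (3, "Anna", "anna@example.com"),  # hypothetical non-duplicate row
        (6, "Pete", "pete@example.com"),
        (9, "Miles", "miles@example.com"),
        (12, "Jessica", "jessica@example.com"),
        (13, "Jessica", "jessica@example.com"),
    ],
)

duplicate_groups = find_duplicate_groups(conn)
has_duplicates = len(duplicate_groups) > 0  # any returned row means duplicates exist

for username, email, count in duplicate_groups:
    print(username, email, count)
```

Because the query has no ORDER BY, the order of the returned groups is not guaranteed and may differ between database engines.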

In the previous step, our query returned a list of duplicates. Now, we want to return the entire record for each duplicate row.

To accomplish this, we’ll need to select the entire table and join that to our duplicate rows. Our query looks like this:

SELECT a.*
FROM users a
JOIN (SELECT username, email, COUNT(*) AS cnt
      FROM users
      GROUP BY username, email
      HAVING COUNT(*) > 1) b
  ON a.username = b.username
 AND a.email = b.email
ORDER BY a.email

If you look closely, you’ll see that this query is not so complicated. The outer SELECT returns every column of the users table, and the inner join restricts those rows to the username and email pairs found by our initial query, which appears here as a subquery. Because we’re joining the table to a result derived from itself, it’s necessary to use aliases (here, we’re using a and b) to label the two versions.

Here is what our results look like for this query:

ID  USERNAME  EMAIL
12  Jessica   jessica@example.com
13  Jessica   jessica@example.com
2   Miles     miles@example.com
9   Miles     miles@example.com
1   Pete      pete@example.com
6   Pete      pete@example.com

Because this result set includes all of the row ids, we can use it to help us deduplicate the rows later.
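As a rough illustration of how the ids could feed a later deduplication step, the sketch below runs the join query with Python's built-in sqlite3 module and groups the ids by username and email. The table definition, the non-duplicate row, the cnt alias, the extra a.id sort key, and the rule of keeping the lowest id in each group are all assumptions; the tutorial does not say which copy should survive.

```python
import sqlite3
from collections import defaultdict

# Hypothetical in-memory table mirroring the example data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users (id, username, email) VALUES (?, ?, ?)",
    [
        (1, "Pete", "pete@example.com"),
        (2, "Miles", "miles@example.com"),
        (3, "Anna", "anna@example.com"),  # hypothetical non-duplicate row
        (6, "Pete", "pete@example.com"),
        (9, "Miles", "miles@example.com"),
        (12, "Jessica", "jessica@example.com"),
        (13, "Jessica", "jessica@example.com"),
    ],
)

# The join query from the text. The cnt alias gives the derived column a name,
# and a.id is added to ORDER BY (an assumption) so the output is deterministic.
full_rows = conn.execute(
    """
    SELECT a.*
    FROM users a
    JOIN (SELECT username, email, COUNT(*) AS cnt
          FROM users
          GROUP BY username, email
          HAVING COUNT(*) > 1) b
      ON a.username = b.username
     AND a.email = b.email
    ORDER BY a.email, a.id
    """
).fetchall()

# Collect the ids that share each (username, email) pair.
ids_by_key = defaultdict(list)
for row_id, username, email in full_rows:
    ids_by_key[(username, email)].append(row_id)

# Assumed policy: keep the lowest id per group and mark the rest for removal.
ids_to_remove = sorted(i for ids in ids_by_key.values() for i in ids[1:])

print(dict(ids_by_key))
print("candidate ids to remove:", ids_to_remove)
```

Nothing is deleted here; the list of candidate ids is only printed so it can be reviewed before any rows are actually removed.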

Source: How to Find Duplicate Values in a SQL Table | Tutorial by Chartio

How to Find Duplicate Values in a SQL Table | Tutorial by Chartio was last modified: December 28th, 2021 by Jovan Stosic
