Sunday, December 7, 2008

SQL: Find duplicate rows in a table (with a primary key)

SQL: Find duplicate rows in a table (with a primary key)

Question:

I have a table of city names and need to find duplicate entries. The table does have a primary key called CITY_ID, so the duplicates will have different CITY_ID values but identical CITY_NAME values.

Answer:

If you indeed have a primary key then you need two cursor instances (c1, c2) as the following example shows. The query requires that ID #1 is smaller than ID #2 otherwise all pairs would be returned twice (2,3) and (3,2) or, if you don't even require that c1 <> c2, .. well, try that out for yourself.

If you do not have a primary key defined, see the other tip mentioned in the 'See Also' box.

Note:
In Microsoft SQL-Server, you can use the HAVING clause, as shown at the bottom.




// return all pairs of city IDs that have the same city name (Single)

select c1.city_id, c2.city_id, c1.city_name
from cities c1, cities c2
where c1.city_id < c2.city_id and c1.city_name = c2.city_name


// Another version (Single)

SELECT c1.city_id, c2.city_id, c1.city_name
FROM cities tb_member AS c1 INNER JOIN tb_member AS c2
ON c1.city_id < c2.city_id
AND c1.city_name = c2.city_name


// Another version (Multiple)

SELECT c1.city_id, c2.city_id, c1.city_name
FROM cities tb_member AS c1 INNER JOIN tb_member AS c2
ON (c1.city_id < c2.city_id OR c1.city_id > c2.city_id)
AND c1.city_name = c2.city_name


// Another version (Multiple) (good)

SELECT c1.city_id, c2.city_id, c1.city_name
FROM cities tb_member AS c1 INNER JOIN tb_member AS c2
ON c1.city_id <> c2.city_id
AND c1.city_name = c2.city_name


// version for Micrsoft's MSSQL Server
// make use of the HAVING clause

select city_name
from areas
group by city_name
having count(*) > 1

No comments: