Question
Formatted question description: https://leetcode.ca/all/196.html
Write a SQL query to delete all duplicate email entries in a table named Person, keeping only the row with the smallest Id for each email.
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Id is the primary key column for this table.
For example, after running your query, the above Person table should have the following rows:
+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+
Algorithm
Delete every row whose email also appears on another row with a smaller Id.
Alternatively, group the rows by email, use MIN(Id) to pick the smallest Id in each group, and then delete the complement set (every row whose Id is not one of those minimums).
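To see the grouping approach on the sample data, its two halves can be run as plain SELECTs before anything is deleted. This is a minimal sketch that assumes a MySQL 8.0 or SQLite session; the column types (INT, VARCHAR(255)) are assumptions, since the problem only names the Id and Email columns.

```sql
-- Sample data from the question (column types are assumed).
CREATE TABLE Person (Id INT PRIMARY KEY, Email VARCHAR(255));
INSERT INTO Person (Id, Email) VALUES
  (1, 'john@example.com'),
  (2, 'bob@example.com'),
  (3, 'john@example.com');

-- Rows to keep: the smallest Id in each Email group.
SELECT MIN(Id) AS KeepId, Email
FROM Person
GROUP BY Email
ORDER BY KeepId;
-- returns (1, john@example.com), (2, bob@example.com)

-- The complement: every other row is a duplicate that would be deleted.
SELECT Id, Email
FROM Person
WHERE Id NOT IN (SELECT MIN(Id) FROM Person GROUP BY Email);
-- returns (3, john@example.com)
```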
Possible Pitfalls
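In the self-join condition p1.Id > p2.Id, the operator must be >, not !=, because:
- we keep the smallest-Id row of each duplicate group, so a row should be deleted only when a duplicate with a smaller Id exists;
- with !=, every row in a duplicate group matches some other row, so all duplicates would be removed, including the smallest-Id row that is supposed to be kept.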
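The difference between > and != can be checked by previewing which p1 rows each self-join condition matches, without deleting anything. Again a sketch, assuming MySQL 8.0 or SQLite and the sample data from the question with assumed column types.

```sql
-- Sample data from the question (column types are assumed).
CREATE TABLE Person (Id INT PRIMARY KEY, Email VARCHAR(255));
INSERT INTO Person (Id, Email) VALUES
  (1, 'john@example.com'),
  (2, 'bob@example.com'),
  (3, 'john@example.com');

-- With >, only the higher-Id copy of each duplicate pair is matched as p1.
SELECT DISTINCT p1.Id
FROM Person p1
JOIN Person p2 ON p1.Email = p2.Email AND p1.Id > p2.Id
ORDER BY p1.Id;
-- returns 3

-- With !=, each copy matches the other, so both copies are matched as p1.
SELECT DISTINCT p1.Id
FROM Person p1
JOIN Person p2 ON p1.Email = p2.Email AND p1.Id != p2.Id
ORDER BY p1.Id;
-- returns 1, 3
```

On this data, > selects only Id 3, while != selects both john@example.com rows, which is why a DELETE using != would also remove Id 1.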
Code
SQL
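-- Approach 1: keep the smallest Id per Email, delete everything else.
DELETE FROM Person
WHERE Id NOT IN (
    -- MySQL rejects a DELETE whose subquery reads the target table directly
    -- (error 1093); wrapping it in the derived table p works around that.
    SELECT Id
    FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) p
);

-- Approach 2 (MySQL multi-table DELETE): self-join Person and delete
-- p1 whenever a row p2 with the same Email has a smaller Id.
DELETE p1.*
FROM Person p1
JOIN Person p2
  ON p1.Email = p2.Email AND p1.Id > p2.Id;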