Question

Formatted question description: https://leetcode.ca/all/196.html

Write a SQL query to delete all duplicate email entries in a table named Person, keeping for each email only the row with the smallest Id.

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+
Id is the primary key column for this table.
For example, after running your query, the above Person table should have the following rows:

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

Algorithm

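Delete every row whose Email also appears in another row with a smaller Id.

Alternatively, group the rows by Email, use the MIN aggregate to find the smallest Id in each group, and then delete the complement set, that is, every row whose Id is not one of those minimums.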
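
The grouping step can be previewed without deleting anything. The following is a minimal sketch, assuming an engine that supports common table expressions (for example MySQL 8.0 or later); the CTE supplies the sample rows from the question inline under the name Person, and the KeepId alias is only illustrative.

```sql
-- Sample rows from the question, supplied inline so no real table is touched.
WITH Person (Id, Email) AS (
    SELECT 1, 'john@example.com' UNION ALL
    SELECT 2, 'bob@example.com'  UNION ALL
    SELECT 3, 'john@example.com'
)
-- One row per Email, carrying the smallest Id in that group.
-- KeepId is an illustrative alias, not part of the original solution.
SELECT Email, MIN(Id) AS KeepId
FROM Person
GROUP BY Email
ORDER BY KeepId;
```

For the sample data this returns Id 1 for john@example.com and Id 2 for bob@example.com, so the only row outside the kept set is Id 3.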

Possible Pitfalls

In the self-join solution, the join condition is p1.Id > p2.Id.

The comparison must be >, not !=, because

  1. we keep the row with the smallest Id in each group of duplicates
  2. with !=, every row that has a duplicate would be deleted, including the smallest-Id row that we expect to keep
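
To see the difference concretely, the two conditions can be compared with a SELECT in place of the DELETE. This is a sketch under the same assumptions as the question's sample data, supplied inline through a CTE (which requires CTE support, for example MySQL 8.0 or later); the JoinCondition and MatchedId column names are made up for the illustration.

```sql
-- Sample rows from the question, supplied inline so no real table is touched.
WITH Person (Id, Email) AS (
    SELECT 1, 'john@example.com' UNION ALL
    SELECT 2, 'bob@example.com'  UNION ALL
    SELECT 3, 'john@example.com'
)
-- Rows that a DELETE with ">" would remove: only Id 3.
SELECT 'p1.Id > p2.Id' AS JoinCondition, p1.Id AS MatchedId
FROM Person p1
JOIN Person p2 ON p1.Email = p2.Email AND p1.Id > p2.Id
UNION ALL
-- Rows that a DELETE with "!=" would remove: Ids 1 and 3,
-- so no john@example.com row would survive.
SELECT 'p1.Id != p2.Id', p1.Id
FROM Person p1
JOIN Person p2 ON p1.Email = p2.Email AND p1.Id != p2.Id;
```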

Code

SQL

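-- Keep the smallest Id per Email and delete every other row.
-- The extra derived table (p) is needed in MySQL, which does not allow a
-- subquery to select directly from the table being deleted from.
DELETE FROM Person
WHERE Id NOT IN (
    SELECT Id
    FROM (SELECT MIN(Id) AS Id FROM Person GROUP BY Email) p
);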

-- Alternative: self-join on Email

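-- MySQL multi-table DELETE: remove each row p1 that shares its Email
-- with a row p2 that has a smaller Id.
DELETE p1.*
FROM
    Person p1,
    Person p2
WHERE
    p1.Email = p2.Email AND p1.Id > p2.Id;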
