182. Duplicate Emails
Description
Table: Person
+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.
Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output:
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.
Solutions
Solution 1: Group By + Having
We can use the GROUP BY clause to group the rows by the email field, and then use the HAVING clause to keep only the email addresses that appear more than once.
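The grouping approach can be tried locally with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (the query uses only syntax common to both), and the helper name duplicate_emails_group_by is hypothetical, introduced only for this illustration.

```python
import sqlite3


def duplicate_emails_group_by(rows):
    """Return the emails that occur more than once, via GROUP BY + HAVING.

    `rows` is a list of (id, email) tuples. SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT email FROM Person GROUP BY email HAVING COUNT(*) > 1"
        )
        # Sort so the result is deterministic; the problem allows any order.
        return sorted(row[0] for row in cursor)
    finally:
        conn.close()


# The rows from Example 1.
example_rows = [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")]
print(duplicate_emails_group_by(example_rows))  # ['a@b.com']
```

Each group yields at most one output row, so no DISTINCT is needed with this approach.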
Solution 2: Self-Join
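We can use a self-join to join the Person table with itself, and then keep the row pairs where the id is different but the email is the same. Because every duplicated email matches at least two such pairs, DISTINCT is needed to report each email once.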
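To see why the self-join needs deduplication, the sketch below runs the join with and without DISTINCT. As an assumption, it uses Python's built-in sqlite3 module in place of MySQL, and the helper name duplicate_emails_self_join is hypothetical.

```python
import sqlite3


def duplicate_emails_self_join(rows):
    """Return (raw, distinct) email lists produced by the self-join.

    `rows` is a list of (id, email) tuples. SQLite stands in for MySQL here.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
        )
        conn.executemany("INSERT INTO Person (id, email) VALUES (?, ?)", rows)
        # Pair every row with every other row that shares its email.
        join = (
            "FROM Person p1 JOIN Person p2 ON p1.email = p2.email "
            "WHERE p1.id <> p2.id"
        )
        raw = [r[0] for r in conn.execute("SELECT p1.email " + join)]
        distinct = sorted(
            r[0] for r in conn.execute("SELECT DISTINCT p1.email " + join)
        )
        return raw, distinct
    finally:
        conn.close()


# The rows from Example 1: ids 1 and 3 share an email.
raw, distinct = duplicate_emails_self_join(
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")]
)
print(len(raw))   # 2, one row for the pair (1, 3) and one for (3, 1)
print(distinct)   # ['a@b.com']
```

An email that appears k times produces k * (k - 1) joined rows, so the join does more work than grouping as duplicates grow.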
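```python
import pandas as pd


def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    # duplicated() marks every repeat of an email after its first occurrence.
    results = person.loc[person.duplicated(subset=["email"]), ["email"]]
    # An email seen three or more times is marked more than once; report it once.
    return results.drop_duplicates()
```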
```sql
# Write your MySQL query statement below

-- Solution 1: GROUP BY + HAVING
SELECT Email
FROM Person
GROUP BY Email
HAVING COUNT(*) > 1;

-- Solution 2: self-join
SELECT DISTINCT p1.Email
FROM Person p1
JOIN Person p2 ON p1.Email = p2.Email
WHERE p1.Id <> p2.Id;

-- Solution 1 again, grouping by column position
SELECT email
FROM Person
GROUP BY 1
HAVING COUNT(1) > 1;
```