Welcome to Subscribe On Youtube
3586. Find COVID Recovery Patients
Description
Table: patients
+-++ \| patient_id \| int \| \| patient_name\| varchar \| \| age \| int \| +-++ \| Column Name \| Type \| +-++ test_id is the unique identifier for this table. Each row represents a COVID test result. The result can be Positive, Negative, or Inconclusive.
Write a solution to find patients who have recovered from COVID - patients who tested positive but later tested negative.
- A patient is considered recovered if they have at least one Positive test followed by at least one Negative test on a later date
- Calculate the recovery time in days as the difference between the first positive test and the first negative test after that positive test
- Only include patients who have both positive and negative test results
Return the result table ordered by recovery_time
in ascending order, then by patient_name
in ascending order.
The result format is in the following example.
Example:
Input:
patients table:
++--+--+ \| 1 \| Alice Smith \| 28 \| \| 2 \| Bob Johnson \| 35 \| \| 3 \| Carol Davis \| 42 \| \| 4 \| David Wilson \| 31 \| \| 5 \| Emma Brown \| 29 \| ++--++++--+ \| test_id \| patient_id \| test_date \| result \| ++++--+
Output:
++--++ \| patient_id \| patient_name \| age \| recovery_time \| ++--++ \| 1 \| Alice Smith \| 28 \| 10 \| \| 3 \| Carol Davis \| 42 \| 10 \| \| 2 \| Bob Johnson \| 35 \| 11 \| ++--++
Explanation:
- Alice Smith (patient_id = 1):
- First positive test: 2023-01-15
- First negative test after positive: 2023-01-25
- Recovery time: 25 - 15 = 10 days
- Bob Johnson (patient_id = 2):
- First positive test: 2023-02-01
- Inconclusive test on 2023-02-05 (ignored for recovery calculation)
- First negative test after positive: 2023-02-12
- Recovery time: 12 - 1 = 11 days
- Carol Davis (patient_id = 3):
- Had negative test on 2023-01-20 (before positive test)
- First positive test: 2023-02-10
- First negative test after positive: 2023-02-20
- Recovery time: 20 - 10 = 10 days
- Patients not included:
- David Wilson (patient_id = 4): Only has positive tests, no negative test after positive
- Emma Brown (patient_id = 5): Only has negative tests, never tested positive
Output table is ordered by recovery_time in ascending order, and then by patient_name in ascending order.
Solutions
Solution 1: Group Statistics + Equi-join
We can first find the date of the first positive test for each patient and record this in table first_positive. Next, we can find the date of the first negative test for each patient after their first positive test in the covid_tests table, and record this in table first_negative_after_positive. Finally, we join these two tables with the patients table, calculate the recovery time, and sort according to requirements.
-
import pandas as pd def find_covid_recovery_patients( patients: pd.DataFrame, covid_tests: pd.DataFrame ) -> pd.DataFrame: covid_tests["test_date"] = pd.to_datetime(covid_tests["test_date"]) pos = ( covid_tests[covid_tests["result"] == "Positive"] .groupby("patient_id", as_index=False)["test_date"] .min() ) pos.rename(columns={"test_date": "first_positive_date"}, inplace=True) neg = covid_tests.merge(pos, on="patient_id") neg = neg[ (neg["result"] == "Negative") & (neg["test_date"] > neg["first_positive_date"]) ] neg = neg.groupby("patient_id", as_index=False)["test_date"].min() neg.rename(columns={"test_date": "first_negative_date"}, inplace=True) df = pos.merge(neg, on="patient_id") df["recovery_time"] = ( df["first_negative_date"] - df["first_positive_date"] ).dt.days out = df.merge(patients, on="patient_id")[ ["patient_id", "patient_name", "age", "recovery_time"] ] return out.sort_values(by=["recovery_time", "patient_name"]).reset_index(drop=True)
-
# Write your MySQL query statement below WITH first_positive AS ( SELECT patient_id, MIN(test_date) AS first_positive_date FROM covid_tests WHERE result = 'Positive' GROUP BY patient_id ), first_negative_after_positive AS ( SELECT t.patient_id, MIN(t.test_date) AS first_negative_date FROM covid_tests t JOIN first_positive p ON t.patient_id = p.patient_id AND t.test_date > p.first_positive_date WHERE t.result = 'Negative' GROUP BY t.patient_id ) SELECT p.patient_id, p.patient_name, p.age, DATEDIFF(n.first_negative_date, f.first_positive_date) AS recovery_time FROM first_positive f JOIN first_negative_after_positive n ON f.patient_id = n.patient_id JOIN patients p ON p.patient_id = f.patient_id ORDER BY recovery_time ASC, patient_name ASC;