Welcome to Subscribe On Youtube

3657. Find Loyal Customers

Description

Table: customer_transactions

+++
\| transaction_id   \| int     \|
\| customer_id      \| int     \|
\| transaction_date \| date    \|
\| amount           \| decimal \|
\| transaction_type \| varchar \|
++-+-++--++
\| transaction_id \| customer_id \| transaction_date \| amount \| transaction_type \|
+-+-++--++

Output:

+-+
\| customer_id \|
+-+
\| 101         \|
\| 104         \|
+-+

Explanation:

  • Customer 101:
    • Purchase transactions: 4 (IDs: 1, 2, 3, 4) 
    • Refund transactions: 0
    • Refund rate: 0/4 = 0% (less than 20%) 
    • Active period: Jan 5 to Feb 20 = 46 days (at least 30 days) 
    • Qualifies as loyal 
  • Customer 102:
    • Purchase transactions: 3 (IDs: 5, 6, 9) 
    • Refund transactions: 2 (IDs: 7, 8)
    • Refund rate: 2/5 = 40% (exceeds 20%) 
    • Not loyal 
  • Customer 103:
    • Purchase transactions: 3 (IDs: 10, 11, 12) 
    • Refund transactions: 0
    • Refund rate: 0/3 = 0% (less than 20%) 
    • Active period: Jan 1 to Jan 3 = 2 days (less than 30 days) 
    • Not loyal 
  • Customer 104:
    • Purchase transactions: 5 (IDs: 13, 14, 15, 16, 17) 
    • Refund transactions: 1 (ID: 18)
    • Refund rate: 1/6 = 16.67% (less than 20%) 
    • Active period: Jan 1 to Mar 15 = 73 days (at least 30 days) 
    • Qualifies as loyal 

The result table is ordered by customer_id in ascending order.

</div>

Solutions

Solution 1

  • import pandas as pd
    
    
    def find_loyal_customers(customer_transactions: pd.DataFrame) -> pd.DataFrame:
        customer_transactions["transaction_date"] = pd.to_datetime(
            customer_transactions["transaction_date"]
        )
        grouped = customer_transactions.groupby("customer_id")
        agg_df = grouped.agg(
            total_transactions=("transaction_type", "size"),
            refund_count=("transaction_type", lambda x: (x == "refund").sum()),
            min_date=("transaction_date", "min"),
            max_date=("transaction_date", "max"),
        ).reset_index()
        agg_df["date_diff"] = (agg_df["max_date"] - agg_df["min_date"]).dt.days
        agg_df["refund_ratio"] = agg_df["refund_count"] / agg_df["total_transactions"]
        result = (
            agg_df[
                (agg_df["total_transactions"] >= 3)
                & (agg_df["refund_ratio"] < 0.2)
                & (agg_df["date_diff"] >= 30)
            ][["customer_id"]]
            .sort_values("customer_id")
            .reset_index(drop=True)
        )
        return result
    
    
  • # Write your MySQL query statement below
    SELECT customer_id
    FROM customer_transactions
    GROUP BY 1
    HAVING
        COUNT(1) >= 3
        AND SUM(transaction_type = 'refund') / COUNT(1) < 0.2
        AND DATEDIFF(MAX(transaction_date), MIN(transaction_date)) >= 30
    ORDER BY 1;
    
    

All Problems

All Solutions