Welcome to Subscribe On Youtube

3716. Find Churn Risk Customers

Description

Table: subscription_events

+++
\| event_id         \| int     \|
\| user_id          \| int     \|
\| event_date       \| date    \|
\| event_type       \| varchar \|
\| plan_name        \| varchar \|
\| monthly_amount   \| decimal \|
++-+--+-+--+-+--+-+--++-+--++-+--++-----+

Explanation:

  • User 501:
    • Currently active: Last event is downgrade to basic (not cancelled) 
    • Has downgrades: Yes, 2 downgrades in history 
    • Current revenue (9.99) vs max (29.99): 9.99/29.99 = 33.3% (less than 50%) 
    • Days as subscriber: Jan 1 to Mar 20 = 79 days (at least 60) 
    • Result: Churn Risk Customer
  • User 502:
    • Currently active: Last event is downgrade to basic (not cancelled) 
    • Has downgrades: Yes, 1 downgrade in history 
    • Current revenue (9.99) vs max (29.99): 9.99/29.99 = 33.3% (less than 50%) 
    • Days as subscriber: Jan 5 to Mar 15 = 70 days (at least 60) 
    • Result: Churn Risk Customer
  • User 503:
    • Currently active: Last event is upgrade to premium (not cancelled) 
    • Has downgrades: No downgrades in history 
    • Result: Not at-risk (no downgrade history)
  • User 504:
    • Currently active: Last event is cancel
    • Result: Not at-risk (subscription cancelled)
  • User 505:
    • Currently active: Last event is 'upgrade' to standard (not cancelled) 
    • Has downgrades: No downgrades in history 
    • Result: Not at-risk (no downgrade history)
  • User 506:
    • Currently active: Last event is downgrade to basic (not cancelled) 
    • Has downgrades: Yes, 1 downgrade in history 
    • Current revenue (9.99) vs max (29.99): 9.99/29.99 = 33.3% (less than 50%) 
    • Days as subscriber: Jan 20 to Mar 10 = 50 days (less than 60) 
    • Result: Not at-risk (insufficient subscription duration)

Result table is ordered by days_as_subscriber DESC, then user_id ASC.

Note: days_as_subscriber is calculated from the first event date to the last event date for each user.

</div>

Solutions

Solution 1: Grouping Statistics + Join + Conditional Filtering

We first use a window function to get the last record for each user sorted by event date and event ID in descending order, obtaining the latest event information for each user. Then, we group and aggregate the subscription history information for each user, including the subscription start date, last event date, historical maximum subscription fee, and the number of downgrade events. Finally, we join the latest event information with the historical statistics and filter according to the conditions specified in the problem to get the list of customers at risk of churn.

  • import pandas as pd
    
    
    def find_churn_risk_customers(subscription_events: pd.DataFrame) -> pd.DataFrame:
        subscription_events["event_date"] = pd.to_datetime(
            subscription_events["event_date"]
        )
        subscription_events = subscription_events.sort_values(
            ["user_id", "event_date", "event_id"]
        )
        last_events = (
            subscription_events.groupby("user_id")
            .tail(1)[["user_id", "event_type", "plan_name", "monthly_amount"]]
            .rename(
                columns={
                    "event_type": "last_event_type",
                    "plan_name": "current_plan",
                    "monthly_amount": "current_monthly_amount",
                }
            )
        )
    
        agg_df = (
            subscription_events.groupby("user_id")
            .agg(
                start_date=("event_date", "min"),
                last_event_date=("event_date", "max"),
                max_historical_amount=("monthly_amount", "max"),
                downgrade_count=("event_type", lambda x: (x == "downgrade").sum()),
            )
            .reset_index()
        )
    
        merged = pd.merge(agg_df, last_events, on="user_id", how="inner")
        merged["days_as_subscriber"] = (
            merged["last_event_date"] - merged["start_date"]
        ).dt.days
    
        result = merged[
            (merged["last_event_type"] != "cancel")
            & (merged["downgrade_count"] >= 1)
            & (merged["current_monthly_amount"] < 0.5 * merged["max_historical_amount"])
            & (merged["days_as_subscriber"] >= 60)
        ][
            [
                "user_id",
                "current_plan",
                "current_monthly_amount",
                "max_historical_amount",
                "days_as_subscriber",
            ]
        ]
    
        result = result.sort_values(
            ["days_as_subscriber", "user_id"], ascending=[False, True]
        ).reset_index(drop=True)
        return result
    
    
  • WITH
        user_with_last_event AS (
            SELECT
                s.*,
                ROW_NUMBER() OVER (
                    PARTITION BY user_id
                    ORDER BY event_date DESC, event_id DESC
                ) AS rn
            FROM subscription_events s
        ),
        user_history AS (
            SELECT
                user_id,
                MIN(event_date) AS start_date,
                MAX(event_date) AS last_event_date,
                MAX(monthly_amount) AS max_historical_amount,
                SUM(
                    CASE
                        WHEN event_type = 'downgrade' THEN 1
                        ELSE 0
                    END
                ) AS downgrade_count
            FROM subscription_events
            GROUP BY user_id
        ),
        latest_event AS (
            SELECT
                user_id,
                event_type AS last_event_type,
                plan_name AS current_plan,
                monthly_amount AS current_monthly_amount
            FROM user_with_last_event
            WHERE rn = 1
        )
    SELECT
        l.user_id,
        l.current_plan,
        l.current_monthly_amount,
        h.max_historical_amount,
        DATEDIFF(h.last_event_date, h.start_date) AS days_as_subscriber
    FROM
        latest_event l
        JOIN user_history h ON l.user_id = h.user_id
    WHERE
        l.last_event_type <> 'cancel'
        AND h.downgrade_count >= 1
        AND l.current_monthly_amount < 0.5 * h.max_historical_amount
        AND DATEDIFF(h.last_event_date, h.start_date) >= 60
    ORDER BY days_as_subscriber DESC, l.user_id ASC;
    
    

All Problems

All Solutions