Welcome to Subscribe On Youtube
3716. Find Churn Risk Customers
Description
Table: subscription_events
+++ \| event_id \| int \| \| user_id \| int \| \| event_date \| date \| \| event_type \| varchar \| \| plan_name \| varchar \| \| monthly_amount \| decimal \| ++-+--+-+--+-+--+-+--++-+--++-+--++-----+
Explanation:
- User 501:
- Currently active: Last event is downgrade to basic (not cancelled)
- Has downgrades: Yes, 2 downgrades in history
- Current revenue (9.99) vs max (29.99): 9.99/29.99 = 33.3% (less than 50%)
- Days as subscriber: Jan 1 to Mar 20 = 79 days (at least 60)
- Result: Churn Risk Customer
- User 502:
- Currently active: Last event is downgrade to basic (not cancelled)
- Has downgrades: Yes, 1 downgrade in history
- Current revenue (9.99) vs max (29.99): 9.99/29.99 = 33.3% (less than 50%)
- Days as subscriber: Jan 5 to Mar 15 = 70 days (at least 60)
- Result: Churn Risk Customer
- User 503:
- Currently active: Last event is upgrade to premium (not cancelled)
- Has downgrades: No downgrades in history
- Result: Not at-risk (no downgrade history)
- User 504:
- Currently active: Last event is cancel
- Result: Not at-risk (subscription cancelled)
- User 505:
- Currently active: Last event is 'upgrade' to standard (not cancelled)
- Has downgrades: No downgrades in history
- Result: Not at-risk (no downgrade history)
- User 506:
- Currently active: Last event is downgrade to basic (not cancelled)
- Has downgrades: Yes, 1 downgrade in history
- Current revenue (9.99) vs max (29.99): 9.99/29.99 = 33.3% (less than 50%)
- Days as subscriber: Jan 20 to Mar 10 = 50 days (less than 60)
- Result: Not at-risk (insufficient subscription duration)
Result table is ordered by days_as_subscriber DESC, then user_id ASC.
Note: days_as_subscriber is calculated from the first event date to the last event date for each user.
</div>
Solutions
Solution 1: Grouping Statistics + Join + Conditional Filtering
We first use a window function to get the last record for each user sorted by event date and event ID in descending order, obtaining the latest event information for each user. Then, we group and aggregate the subscription history information for each user, including the subscription start date, last event date, historical maximum subscription fee, and the number of downgrade events. Finally, we join the latest event information with the historical statistics and filter according to the conditions specified in the problem to get the list of customers at risk of churn.
-
import pandas as pd def find_churn_risk_customers(subscription_events: pd.DataFrame) -> pd.DataFrame: subscription_events["event_date"] = pd.to_datetime( subscription_events["event_date"] ) subscription_events = subscription_events.sort_values( ["user_id", "event_date", "event_id"] ) last_events = ( subscription_events.groupby("user_id") .tail(1)[["user_id", "event_type", "plan_name", "monthly_amount"]] .rename( columns={ "event_type": "last_event_type", "plan_name": "current_plan", "monthly_amount": "current_monthly_amount", } ) ) agg_df = ( subscription_events.groupby("user_id") .agg( start_date=("event_date", "min"), last_event_date=("event_date", "max"), max_historical_amount=("monthly_amount", "max"), downgrade_count=("event_type", lambda x: (x == "downgrade").sum()), ) .reset_index() ) merged = pd.merge(agg_df, last_events, on="user_id", how="inner") merged["days_as_subscriber"] = ( merged["last_event_date"] - merged["start_date"] ).dt.days result = merged[ (merged["last_event_type"] != "cancel") & (merged["downgrade_count"] >= 1) & (merged["current_monthly_amount"] < 0.5 * merged["max_historical_amount"]) & (merged["days_as_subscriber"] >= 60) ][ [ "user_id", "current_plan", "current_monthly_amount", "max_historical_amount", "days_as_subscriber", ] ] result = result.sort_values( ["days_as_subscriber", "user_id"], ascending=[False, True] ).reset_index(drop=True) return result -
WITH user_with_last_event AS ( SELECT s.*, ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY event_date DESC, event_id DESC ) AS rn FROM subscription_events s ), user_history AS ( SELECT user_id, MIN(event_date) AS start_date, MAX(event_date) AS last_event_date, MAX(monthly_amount) AS max_historical_amount, SUM( CASE WHEN event_type = 'downgrade' THEN 1 ELSE 0 END ) AS downgrade_count FROM subscription_events GROUP BY user_id ), latest_event AS ( SELECT user_id, event_type AS last_event_type, plan_name AS current_plan, monthly_amount AS current_monthly_amount FROM user_with_last_event WHERE rn = 1 ) SELECT l.user_id, l.current_plan, l.current_monthly_amount, h.max_historical_amount, DATEDIFF(h.last_event_date, h.start_date) AS days_as_subscriber FROM latest_event l JOIN user_history h ON l.user_id = h.user_id WHERE l.last_event_type <> 'cancel' AND h.downgrade_count >= 1 AND l.current_monthly_amount < 0.5 * h.max_historical_amount AND DATEDIFF(h.last_event_date, h.start_date) >= 60 ORDER BY days_as_subscriber DESC, l.user_id ASC;