Welcome to Subscribe On Youtube
3497. Analyze Subscription Conversion
Description
Table: UserActivity
+++ \| user_id \| int \| \| activity_date \| date \| \| activity_type \| varchar \| \| activity_duration\| int \| ++++-+ \| user_id \| activity_date \| activity_type \| activity_duration \| ++++-+
Output:
++--+-+ \| 1 \| 45.00 \| 76.67 \| \| 3 \| 70.00 \| 63.33 \| \| 4 \| 37.50 \| 45.00 \| +----+
Explanation:
- User 1:
- Had 3 days of free trial with durations of 45, 30, and 60 minutes.
- Average trial duration: (45 + 30 + 60) / 3 = 45.00 minutes.
- Had 3 days of paid subscription with durations of 75, 90, and 65 minutes.
- Average paid duration: (75 + 90 + 65) / 3 = 76.67 minutes.
- User 2:
- Had 3 days of free trial with durations of 55, 25, and 50 minutes.
- Average trial duration: (55 + 25 + 50) / 3 = 43.33 minutes.
- Did not convert to a paid subscription (only had free_trial and cancelled activities).
- Not included in the output because they didn't convert to paid.
- User 3:
- Had 3 days of free trial with durations of 70, 60, and 80 minutes.
- Average trial duration: (70 + 60 + 80) / 3 = 70.00 minutes.
- Had 3 days of paid subscription with durations of 50, 55, and 85 minutes.
- Average paid duration: (50 + 55 + 85) / 3 = 63.33 minutes.
- User 4:
- Had 2 days of free trial with durations of 40 and 35 minutes.
- Average trial duration: (40 + 35) / 2 = 37.50 minutes.
- Had 1 day of paid subscription with duration of 45 minutes before cancelling.
- Average paid duration: 45.00 minutes.
The result table only includes users who converted from free trial to paid subscription (users 1, 3, and 4), and is ordered by user_id in ascending order.
</div>
Solutions
Solution 1: Grouping + Conditional Filtering + Equi-Join
First, we filter the data in the table to exclude all records where activity_type
is equal to cancelled
. Then, we group the remaining data by user_id
and activity_type
, calculate the duration duration
for each group, and store the results in table T
.
Next, we filter table T
to extract records where activity_type
is free_trial
and paid
, storing them in tables F
and P
, respectively. Finally, we perform an equi-join on these two tables using user_id
, filter the required fields as per the problem statement, and sort the results to produce the final output.
-
import pandas as pd def analyze_subscription_conversion(user_activity: pd.DataFrame) -> pd.DataFrame: df = user_activity[user_activity["activity_type"] != "cancelled"] df_grouped = ( df.groupby(["user_id", "activity_type"])["activity_duration"] .mean() .add(0.0001) .round(2) .reset_index() ) df_free_trial = ( df_grouped[df_grouped["activity_type"] == "free_trial"] .rename(columns={"activity_duration": "trial_avg_duration"}) .drop(columns=["activity_type"]) ) df_paid = ( df_grouped[df_grouped["activity_type"] == "paid"] .rename(columns={"activity_duration": "paid_avg_duration"}) .drop(columns=["activity_type"]) ) result = df_free_trial.merge(df_paid, on="user_id", how="inner").sort_values( "user_id" ) return result
-
# Write your MySQL query statement below WITH T AS ( SELECT user_id, activity_type, ROUND(SUM(activity_duration) / COUNT(1), 2) duration FROM UserActivity WHERE activity_type != 'cancelled' GROUP BY user_id, activity_type ), F AS ( SELECT user_id, duration trial_avg_duration FROM T WHERE activity_type = 'free_trial' ), P AS ( SELECT user_id, duration paid_avg_duration FROM T WHERE activity_type = 'paid' ) SELECT user_id, trial_avg_duration, paid_avg_duration FROM F JOIN P USING (user_id) ORDER BY 1;