Welcome to Subscribe On Youtube

3497. Analyze Subscription Conversion

Description

Table: UserActivity

+++
\| user_id          \| int     \|
\| activity_date    \| date    \|
\| activity_type    \| varchar \|
\| activity_duration\| int     \|
++++-+
\| user_id \| activity_date \| activity_type \| activity_duration \|
++++-+

Output:

++--+-+
\| 1       \| 45.00              \| 76.67             \|
\| 3       \| 70.00              \| 63.33             \|
\| 4       \| 37.50              \| 45.00             \|
+----+

Explanation:

  • User 1:
    • Had 3 days of free trial with durations of 45, 30, and 60 minutes.
    • Average trial duration: (45 + 30 + 60) / 3 = 45.00 minutes.
    • Had 3 days of paid subscription with durations of 75, 90, and 65 minutes.
    • Average paid duration: (75 + 90 + 65) / 3 = 76.67 minutes.
  • User 2:
    • Had 3 days of free trial with durations of 55, 25, and 50 minutes.
    • Average trial duration: (55 + 25 + 50) / 3 = 43.33 minutes.
    • Did not convert to a paid subscription (only had free_trial and cancelled activities).
    • Not included in the output because they didn't convert to paid.
  • User 3:
    • Had 3 days of free trial with durations of 70, 60, and 80 minutes.
    • Average trial duration: (70 + 60 + 80) / 3 = 70.00 minutes.
    • Had 3 days of paid subscription with durations of 50, 55, and 85 minutes.
    • Average paid duration: (50 + 55 + 85) / 3 = 63.33 minutes.
  • User 4:
    • Had 2 days of free trial with durations of 40 and 35 minutes.
    • Average trial duration: (40 + 35) / 2 = 37.50 minutes.
    • Had 1 day of paid subscription with duration of 45 minutes before cancelling.
    • Average paid duration: 45.00 minutes.

The result table only includes users who converted from free trial to paid subscription (users 1, 3, and 4), and is ordered by user_id in ascending order.

</div>

Solutions

Solution 1: Grouping + Conditional Filtering + Equi-Join

First, we filter the data in the table to exclude all records where activity_type is equal to cancelled. Then, we group the remaining data by user_id and activity_type, calculate the duration duration for each group, and store the results in table T.

Next, we filter table T to extract records where activity_type is free_trial and paid, storing them in tables F and P, respectively. Finally, we perform an equi-join on these two tables using user_id, filter the required fields as per the problem statement, and sort the results to produce the final output.

  • import pandas as pd
    
    
    def analyze_subscription_conversion(user_activity: pd.DataFrame) -> pd.DataFrame:
        df = user_activity[user_activity["activity_type"] != "cancelled"]
    
        df_grouped = (
            df.groupby(["user_id", "activity_type"])["activity_duration"]
            .mean()
            .add(0.0001)
            .round(2)
            .reset_index()
        )
    
        df_free_trial = (
            df_grouped[df_grouped["activity_type"] == "free_trial"]
            .rename(columns={"activity_duration": "trial_avg_duration"})
            .drop(columns=["activity_type"])
        )
    
        df_paid = (
            df_grouped[df_grouped["activity_type"] == "paid"]
            .rename(columns={"activity_duration": "paid_avg_duration"})
            .drop(columns=["activity_type"])
        )
    
        result = df_free_trial.merge(df_paid, on="user_id", how="inner").sort_values(
            "user_id"
        )
    
        return result
    
    
  • # Write your MySQL query statement below
    WITH
        T AS (
            SELECT user_id, activity_type, ROUND(SUM(activity_duration) / COUNT(1), 2) duration
            FROM UserActivity
            WHERE activity_type != 'cancelled'
            GROUP BY user_id, activity_type
        ),
        F AS (
            SELECT user_id, duration trial_avg_duration
            FROM T
            WHERE activity_type = 'free_trial'
        ),
        P AS (
            SELECT user_id, duration paid_avg_duration
            FROM T
            WHERE activity_type = 'paid'
        )
    SELECT user_id, trial_avg_duration, paid_avg_duration
    FROM
        F
        JOIN P USING (user_id)
    ORDER BY 1;
    
    

All Problems

All Solutions