
3056. Snaps Analysis

Description

Table: Activities

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| activity_id   | int     |
| user_id       | int     |
| activity_type | enum    |
| time_spent    | decimal |
+---------------+---------+
activity_id is the column of unique values for this table.
activity_type is an ENUM (category) type of ('send', 'open'). 
This table contains activity id, user id, activity type and time spent.

Table: Age

+-------------+------+
| Column Name | Type |
+-------------+------+
| user_id     | int  |
| age_bucket  | enum |
+-------------+------+
user_id is the column of unique values for this table.
age_bucket is an ENUM (category) type of ('21-25', '26-30', '31-35'). 
This table contains user id and age group.

Write a solution that calculates, for each age group, the percentage of total time spent on sending snaps and the percentage spent on opening snaps. Percentages should be rounded to 2 decimal places.
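Written as a formula, the two output columns for one age group are shown below. The symbols $S_{\text{send}}$ and $S_{\text{open}}$ are notation introduced here and are not part of the problem statement. They stand for the total `time_spent` on `'send'` and `'open'` activities by all users in that age group.

```latex
$$
\text{send\_perc} = \operatorname{round}\!\left(100 \cdot \frac{S_{\text{send}}}{S_{\text{send}} + S_{\text{open}}},\ 2\right),
\qquad
\text{open\_perc} = \operatorname{round}\!\left(100 \cdot \frac{S_{\text{open}}}{S_{\text{send}} + S_{\text{open}}},\ 2\right)
$$
```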

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Activities table:
+-------------+---------+---------------+------------+
| activity_id | user_id | activity_type | time_spent |
+-------------+---------+---------------+------------+
| 7274        | 123     | open          | 4.50       |
| 2425        | 123     | send          | 3.50       |
| 1413        | 456     | send          | 5.67       |
| 2536        | 456     | open          | 3.00       |
| 8564        | 456     | send          | 8.24       |
| 5235        | 789     | send          | 6.24       |
| 4251        | 123     | open          | 1.25       |
| 1435        | 789     | open          | 5.25       |
+-------------+---------+---------------+------------+
Age table:
+---------+------------+
| user_id | age_bucket |
+---------+------------+
| 123     | 31-35      |
| 789     | 21-25      |
| 456     | 26-30      |
+---------+------------+
Output: 
+------------+-----------+-----------+
| age_bucket | send_perc | open_perc |
+------------+-----------+-----------+
| 31-35      | 37.84     | 62.16     |
| 26-30      | 82.26     | 17.74     |
| 21-25      | 54.31     | 45.69     |
+------------+-----------+-----------+
Explanation: 
For age group 31-35:
  - There is only one user belonging to this group with the user ID 123.
  - The total time spent on sending snaps by this user is 3.50, and the time spent on opening snaps is 4.50 + 1.25 = 5.75.
  - The overall time spent by this user is 3.50 + 5.75 = 9.25.
  - Therefore, the sending snap percentage will be (3.50 / 9.25) * 100 = 37.84, and the opening snap percentage will be (5.75 / 9.25) * 100 = 62.16.
For age group 26-30: 
  - There is only one user belonging to this group with the user ID 456. 
  - The total time spent on sending snaps by this user is 5.67 + 8.24 = 13.91, and the time spent on opening snaps is 3.00. 
  - The overall time spent by this user is 13.91 + 3.00 = 16.91. 
  - Therefore, the sending snap percentage will be (13.91 / 16.91) * 100 = 82.26, and the opening snap percentage will be (3.00 / 16.91) * 100 = 17.74.
For age group 21-25: 
  - There is only one user belonging to this group with the user ID 789. 
  - The total time spent on sending snaps by this user is 6.24, and the time spent on opening snaps is 5.25. 
  - The overall time spent by this user is 6.24 + 5.25 = 11.49. 
  - Therefore, the sending snap percentage will be (6.24 / 11.49) * 100 = 54.31, and the opening snap percentage will be (5.25 / 11.49) * 100 = 45.69.
All percentages in the output table are rounded to two decimal places.
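The arithmetic in the explanation above can be reproduced in a few lines of plain Python. This is a small sketch for checking the numbers. The helper name `percentages` is hypothetical, and it assumes `time_spent` values are handled as floats rather than DECIMAL.

```python
# Per-age-group totals taken directly from the example's explanation.
totals = {
    "31-35": {"send": 3.50, "open": 4.50 + 1.25},
    "26-30": {"send": 5.67 + 8.24, "open": 3.00},
    "21-25": {"send": 6.24, "open": 5.25},
}


def percentages(send: float, open_: float) -> tuple[float, float]:
    """Hypothetical helper: return (send_perc, open_perc), each rounded to 2 places."""
    total = send + open_
    return round(100 * send / total, 2), round(100 * open_ / total, 2)


for bucket, t in totals.items():
    print(bucket, *percentages(t["send"], t["open"]))
# 31-35 37.84 62.16
# 26-30 82.26 17.74
# 21-25 54.31 45.69
```

Python's `round()` operates on binary floats and breaks exact ties toward even. MySQL's `ROUND()` on DECIMAL values rounds half away from zero. The two could therefore disagree on a value that lands exactly on a .xx5 boundary, but none of the example values do.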

Solutions

Solution 1: Equi-Join + Group By Summation

We can perform an equi-join of the Activities and Age tables on user_id, group the joined rows by age_bucket, and then compute, for each age group, the share of total time spent on sending and on opening snaps.

  • import pandas as pd
    
    
    def snap_analysis(activities: pd.DataFrame, age: pd.DataFrame) -> pd.DataFrame:
        # Equi-join on user_id (an inner join, like SQL's JOIN ... USING).
        merged_df = pd.merge(activities, age, on="user_id")
        # Total time spent per (age_bucket, activity_type) pair.
        total_time_per_age_activity = (
            merged_df.groupby(["age_bucket", "activity_type"])["time_spent"]
            .sum()
            .reset_index()
        )
        # One row per age_bucket, one column per activity_type.
        pivot_df = total_time_per_age_activity.pivot(
            index="age_bucket", columns="activity_type", values="time_spent"
        )
        # Make sure both columns exist even if no group ever sends (or opens),
        # then treat a missing total as 0.
        pivot_df = pivot_df.reindex(columns=["send", "open"]).fillna(0).reset_index()
        total = pivot_df["send"] + pivot_df["open"]
        pivot_df["send_perc"] = (100 * pivot_df["send"] / total).round(2)
        pivot_df["open_perc"] = (100 * pivot_df["open"] / total).round(2)
        return pivot_df[["age_bucket", "send_perc", "open_perc"]]
    
    
  • # Write your MySQL query statement below
    SELECT
        age_bucket,
        ROUND(100 * SUM(IF(activity_type = 'send', time_spent, 0)) / SUM(time_spent), 2) AS send_perc,
        ROUND(100 * SUM(IF(activity_type = 'open', time_spent, 0)) / SUM(time_spent), 2) AS open_perc
    FROM
        Activities
        JOIN Age USING (user_id)
    GROUP BY 1;
    
    

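For comparison, the same equi-join and group-by summation can be sketched without pandas or SQL. `snap_analysis_plain` is a hypothetical name introduced here. The sketch assumes rows arrive as tuples in the column order of the two tables and that `time_spent` is a float rather than DECIMAL. It is an illustration of the approach, not the reference solution.

```python
from collections import defaultdict


def snap_analysis_plain(activities, age):
    """Hypothetical plain-Python sketch of Solution 1.

    activities: iterable of (activity_id, user_id, activity_type, time_spent)
    age:        iterable of (user_id, age_bucket)
    Returns a list of (age_bucket, send_perc, open_perc) tuples.
    """
    # Equi-join on user_id: look up each activity's age_bucket by user_id.
    bucket_of = {user_id: bucket for user_id, bucket in age}

    # Group by age_bucket, summing time_spent separately for 'send' and 'open'.
    sums = defaultdict(lambda: {"send": 0.0, "open": 0.0})
    for _activity_id, user_id, activity_type, time_spent in activities:
        bucket = bucket_of.get(user_id)
        if bucket is None:  # inner join: drop users with no Age row
            continue
        sums[bucket][activity_type] += time_spent

    result = []
    for bucket, s in sums.items():
        total = s["send"] + s["open"]
        if total == 0:
            # Avoid ZeroDivisionError; the MySQL query would yield NULL here.
            result.append((bucket, None, None))
            continue
        result.append(
            (bucket, round(100 * s["send"] / total, 2), round(100 * s["open"] / total, 2))
        )
    return result


# Example 1 input, as tuples.
activities = [
    (7274, 123, "open", 4.50), (2425, 123, "send", 3.50),
    (1413, 456, "send", 5.67), (2536, 456, "open", 3.00),
    (8564, 456, "send", 8.24), (5235, 789, "send", 6.24),
    (4251, 123, "open", 1.25), (1435, 789, "open", 5.25),
]
age = [(123, "31-35"), (789, "21-25"), (456, "26-30")]
print(sorted(snap_analysis_plain(activities, age)))
# [('21-25', 54.31, 45.69), ('26-30', 82.26, 17.74), ('31-35', 37.84, 62.16)]
```

The dictionary lookup plays the role of the join, and the `defaultdict` of per-type sums plays the role of `SUM(IF(...))` grouped by `age_bucket`.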