Welcome to Subscribe On Youtube

3673. Find Zombie Sessions

Description

Table: app_events

++-+
\| event_id         \| int      \|
\| user_id          \| int      \|
\| event_timestamp  \| datetime \|
\| event_type       \| varchar  \|
\| session_id       \| varchar  \|
\| event_value      \| int      \|
++-++++-+
\| event_id \| user_id \| event_timestamp     \| event_type \| session_id \| event_value \|
++-++++-+

Output:

+++--+--+
\| S001       \| 201     \| 35                       \| 6            \|
++-----+

Explanation:

  • Session S001 (User 201):
    • Duration: 10:00:00 to 10:35:00 = 35 minutes (more than 30) 
    • Scroll events: 6 (at least 5) 
    • Click events: 0
    • Click-to-scroll ratio: 0/6 = 0.00 (less than 0.20) 
    • Purchases: 0 (no purchases) 
    • S001 is a zombie session (meets all criteria)
  • Session S002 (User 202):
    • Duration: 11:00:00 to 11:20:00 = 20 minutes (less than 30) 
    • Has a purchase event 
    • S002 is not a zombie session 
  • Session S003 (User 203):
    • Duration: 12:00:00 to 13:00:00 = 60 minutes (more than 30) 
    • Scroll events: 5 (at least 5) 
    • Click events: 1
    • Click-to-scroll ratio: 1/5 = 0.20 (not less than 0.20) 
    • Purchases: 0 (no purchases) 
    • S003 is not a zombie session (click-to-scroll ratio equals 0.20, needs to be less)
  • Session S004 (User 204):
    • Duration: 14:00:00 to 14:12:00 = 12 minutes (less than 30) 
    • Scroll events: 2 (less than 5) 
    • S004  is not a zombie session 

The result table is ordered by scroll_count in descending order, then by session_id in ascending order.

</div>

Solutions

Solution 1: Grouped Aggregation

We can group the sessions by session_id, calculate the session duration, the number of scroll events, click events, and purchase events for each session, then filter according to the conditions given in the problem. Finally, we sort by the number of scroll events in descending order and by session ID in ascending order.

  • import pandas as pd
    
    
    def find_zombie_sessions(app_events: pd.DataFrame) -> pd.DataFrame:
        if not pd.api.types.is_datetime64_any_dtype(app_events["event_timestamp"]):
            app_events["event_timestamp"] = pd.to_datetime(app_events["event_timestamp"])
    
        grouped = app_events.groupby(["session_id", "user_id"])
    
        result = grouped.agg(
            session_duration_minutes=(
                "event_timestamp",
                lambda x: (x.max() - x.min()).total_seconds() // 60,
            ),
            scroll_count=("event_type", lambda x: (x == "scroll").sum()),
            click_count=("event_type", lambda x: (x == "click").sum()),
            purchase_count=("event_type", lambda x: (x == "purchase").sum()),
        ).reset_index()
    
        result = result[
            (result["session_duration_minutes"] >= 30)
            & (result["click_count"] / result["scroll_count"] < 0.2)
            & (result["purchase_count"] == 0)
            & (result["scroll_count"] >= 5)
        ]
    
        result = result.sort_values(
            by=["scroll_count", "session_id"], ascending=[False, True]
        ).reset_index(drop=True)
    
        return result[["session_id", "user_id", "session_duration_minutes", "scroll_count"]]
    
    
  • # Write your MySQL query statement below
    SELECT
        session_id,
        user_id,
        TIMESTAMPDIFF(MINUTE, MIN(event_timestamp), MAX(event_timestamp)) session_duration_minutes,
        SUM(event_type = 'scroll') scroll_count
    FROM app_events
    GROUP BY session_id
    HAVING
        session_duration_minutes >= 30
        AND SUM(event_type = 'click') / SUM(event_type = 'scroll') < 0.2
        AND SUM(event_type = 'purchase') = 0
        AND SUM(event_type = 'scroll') >= 5
    ORDER BY scroll_count DESC, session_id;
    
    

All Problems

All Solutions