3060. User Activities within Time Bounds
Description
Table: Sessions
+---------------+----------+
| Column Name   | Type     |
+---------------+----------+
| user_id       | int      |
| session_start | datetime |
| session_end   | datetime |
| session_id    | int      |
| session_type  | enum     |
+---------------+----------+
session_id is the column of unique values for this table.
session_type is an ENUM (category) type of (Viewer, Streamer).
This table contains user id, session start, session end, session id and session type.
Write a solution to find the users who have had at least one consecutive session of the same type (either 'Viewer' or 'Streamer') with a maximum gap of 12 hours between sessions.
Return the result table ordered by user_id in ascending order.
The result format is in the following example.
Example:
Input:
Sessions table:
+---------+---------------------+---------------------+------------+--------------+
| user_id | session_start       | session_end         | session_id | session_type |
+---------+---------------------+---------------------+------------+--------------+
| 101     | 2023-11-01 08:00:00 | 2023-11-01 09:00:00 | 1          | Viewer       |
| 101     | 2023-11-01 10:00:00 | 2023-11-01 11:00:00 | 2          | Streamer     |
| 102     | 2023-11-01 13:00:00 | 2023-11-01 14:00:00 | 3          | Viewer       |
| 102     | 2023-11-01 15:00:00 | 2023-11-01 16:00:00 | 4          | Viewer       |
| 101     | 2023-11-02 09:00:00 | 2023-11-02 10:00:00 | 5          | Viewer       |
| 102     | 2023-11-02 12:00:00 | 2023-11-02 13:00:00 | 6          | Streamer     |
| 101     | 2023-11-02 13:00:00 | 2023-11-02 14:00:00 | 7          | Streamer     |
| 102     | 2023-11-02 16:00:00 | 2023-11-02 17:00:00 | 8          | Viewer       |
| 103     | 2023-11-01 08:00:00 | 2023-11-01 09:00:00 | 9          | Viewer       |
| 103     | 2023-11-02 20:00:00 | 2023-11-02 23:00:00 | 10         | Viewer       |
| 103     | 2023-11-03 09:00:00 | 2023-11-03 10:00:00 | 11         | Viewer       |
+---------+---------------------+---------------------+------------+--------------+

Output:
+---------+
| user_id |
+---------+
| 102     |
| 103     |
+---------+

Explanation:
- User ID 101 will not be included in the final output as they do not have any consecutive sessions of the same session type.
- User ID 102 will be included in the final output as they had two viewer sessions with session IDs 3 and 4, respectively, and the time gap between them was less than 12 hours.
- User ID 103 participated in two viewer sessions with a gap of less than 12 hours between them, identified by session IDs 10 and 11. Therefore, user 103 will be included in the final output.
Output table is ordered by user_id in increasing order.
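The gaps mentioned in the explanation can be recomputed directly from the timestamps in the example. The snippet below is a small sketch in plain Python (standard library only); the helper name gap_hours is introduced here purely for illustration and is not part of the original solution.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def gap_hours(prev_end: str, next_start: str) -> float:
    """Hours between the end of one session and the start of the next."""
    delta = datetime.strptime(next_start, FMT) - datetime.strptime(prev_end, FMT)
    return delta.total_seconds() / 3600


# User 102, Viewer sessions 3 and 4.
gap_102 = gap_hours("2023-11-01 14:00:00", "2023-11-01 15:00:00")
# User 103, Viewer sessions 10 and 11.
gap_103 = gap_hours("2023-11-02 23:00:00", "2023-11-03 09:00:00")
# User 101, Viewer sessions 1 and 5, then Streamer sessions 2 and 7.
gap_101_viewer = gap_hours("2023-11-01 09:00:00", "2023-11-02 09:00:00")
gap_101_streamer = gap_hours("2023-11-01 11:00:00", "2023-11-02 13:00:00")

print(gap_102, gap_103, gap_101_viewer, gap_101_streamer)
```

Users 102 and 103 come out at 1 and 10 hours, both within the 12-hour limit. For user 101, even pairing sessions of the same type gives gaps of 24 and 26 hours, so they would be excluded under that reading as well.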
Solutions
Solution 1: Window Function + Time Function
First, we use the LAG window function to find the end time of the previous session of the same type for each user, denoted as prev_session_end. Then we use the TIMESTAMPDIFF function to calculate the time difference between the start time of the current session and the end time of the previous session. If the time difference is less than or equal to 12 hours, then this user meets the requirements of the problem. A user's first session of a given type has no previous session, so its prev_session_end is NULL and the comparison filters that row out.
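To make the partition-then-look-back idea concrete, here is a minimal sketch in plain Python that mimics what LAG does inside each (user_id, session_type) partition. The function name users_within_gap, the tuple layout, and the sample rows are all assumptions made for this illustration; it compares exact time differences rather than reproducing TIMESTAMPDIFF.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def users_within_gap(sessions, max_gap=timedelta(hours=12)):
    """Return sorted user ids having two same-type sessions at most max_gap apart.

    sessions: iterable of (user_id, session_start, session_end, session_type)
    tuples with datetime start/end values (an assumed layout for this sketch).
    """
    # Equivalent of PARTITION BY user_id, session_type.
    partitions = defaultdict(list)
    for user_id, start, end, session_type in sessions:
        partitions[(user_id, session_type)].append((start, end))

    matched = set()
    for (user_id, _), rows in partitions.items():
        rows.sort(key=lambda r: r[1])  # ORDER BY session_end
        # Pair each row with its predecessor, as LAG(session_end) would.
        for (_, prev_end), (start, _) in zip(rows, rows[1:]):
            if start - prev_end <= max_gap:
                matched.add(user_id)
                break
    return sorted(matched)


def ts(text):
    """Parse a 'YYYY-MM-DD HH:MM' string (helper for the sample data)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


# Hypothetical rows: user 1 has one session of each type, user 2 has two
# Viewer sessions one hour apart.
sample = [
    (1, ts("2023-11-01 08:00"), ts("2023-11-01 09:00"), "Viewer"),
    (1, ts("2023-11-01 10:00"), ts("2023-11-01 11:00"), "Streamer"),
    (2, ts("2023-11-01 13:00"), ts("2023-11-01 14:00"), "Viewer"),
    (2, ts("2023-11-01 15:00"), ts("2023-11-01 16:00"), "Viewer"),
]
print(users_within_gap(sample))  # [2]
```

The first row of each partition never appears as the second element of a pair, which plays the same role as the NULL that LAG returns when there is no previous row.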
Python (pandas):

import pandas as pd


def user_activities(sessions: pd.DataFrame) -> pd.DataFrame:
    # Order each user's sessions chronologically.
    sessions = sessions.sort_values(by=["user_id", "session_start"])
    # End time of the previous session of the same user and type (NaT if none).
    sessions["prev_session_end"] = sessions.groupby(["user_id", "session_type"])[
        "session_end"
    ].shift(1)
    # Keep sessions that start within 12 hours of the previous session's end.
    sessions_filtered = sessions[
        sessions["session_start"] - sessions["prev_session_end"]
        <= pd.Timedelta(hours=12)
    ]
    return pd.DataFrame({"user_id": sessions_filtered["user_id"].unique()})
MySQL:

# Write your MySQL query statement below
WITH
    T AS (
        SELECT
            user_id,
            session_start,
            -- End time of the previous session of the same user and type.
            LAG(session_end) OVER (
                PARTITION BY user_id, session_type
                ORDER BY session_end
            ) AS prev_session_end
        FROM Sessions
    )
SELECT DISTINCT user_id
FROM T
WHERE TIMESTAMPDIFF(HOUR, prev_session_end, session_start) <= 12
ORDER BY user_id;
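One detail that may be worth checking is how the two versions treat gaps slightly above 12 hours. To my understanding, MySQL's TIMESTAMPDIFF returns an integer number of whole units, so a gap of 12 hours 30 minutes counts as 12 hours, whereas the pandas version compares the exact difference. The sketch below emulates that truncation in plain Python; timestampdiff_hours is a hypothetical helper written for this illustration, not a MySQL or pandas API.

```python
from datetime import datetime


def timestampdiff_hours(start: datetime, end: datetime) -> int:
    """Whole hours from start to end, truncated toward zero.

    A sketch of how TIMESTAMPDIFF(HOUR, start, end) is assumed to behave.
    """
    seconds = (end - start).total_seconds()
    return int(seconds / 3600)  # int() truncates toward zero


prev_end = datetime(2023, 11, 1, 9, 0, 0)
next_start = datetime(2023, 11, 1, 21, 30, 0)  # 12 h 30 min later

truncated = timestampdiff_hours(prev_end, next_start)
exact = (next_start - prev_end).total_seconds() / 3600
print(truncated, exact)
```

With these timestamps the truncated value satisfies a `<= 12` test while the exact value does not, so the SQL query and the pandas function could disagree on such a row. If a strict 12-hour bound is intended, comparing in a finer unit, for example TIMESTAMPDIFF(SECOND, ...) against 43200, would be one way to align them.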