Welcome to Subscribe On Youtube

3089. Find Bursty Behavior

Description

Table: Posts

+-------------+---------+
\| Column Name \| Type    \|
+-------------+---------+
\| post_id     \| int     \|
\| user_id     \| int     \|
\| post_date   \| date    \|
+-------------+---------+
post_id is the primary key (column with unique values) for this table.
Each row of this table contains post_id, user_id, and post_date.

Write a solution to find users who demonstrate bursty behavior in their posting patterns during February 2024. Bursty behavior is defined as any period of 7 consecutive days where a user's posting frequency is at least twice to their average weekly posting frequency for February 2024.

Note: Only include the dates from February 1 to February 28 in your analysis, which means you should count February as having exactly 4 weeks.

Return the result table orderd by user_id in ascending order.

The result format is in the following example.

 

Example:

Input:

Posts table:

+---------+---------+------------+
\| post_id \| user_id \| post_date  \|
+---------+---------+------------+
\| 1       \| 1       \| 2024-02-27 \|
\| 2       \| 5       \| 2024-02-06 \|
\| 3       \| 3       \| 2024-02-25 \|
\| 4       \| 3       \| 2024-02-14 \|
\| 5       \| 3       \| 2024-02-06 \|
\| 6       \| 2       \| 2024-02-25 \|
+---------+---------+------------+

Output:

+---------+----------------+------------------+
\| user_id \| max_7day_posts \| avg_weekly_posts \|
+---------+----------------+------------------+
\| 1       \| 1              \| 0.2500           \|
\| 2       \| 1              \| 0.2500           \|
\| 5       \| 1              \| 0.2500           \|
+---------+----------------+------------------+

Explanation:

  • User 1: Made only 1 post in February, resulting in an average of 0.25 posts per week and a max of 1 post in any 7-day period.
  • User 2: Also made just 1 post, with the same average and max 7-day posting frequency as User 1.
  • User 5: Like Users 1 and 2, User 5 made only 1 post throughout February, leading to the same average and max 7-day posting metrics.
  • User 3: Although User 3 made more posts than the others (3 posts), they did not reach twice the average weekly posts in their consecutive 7-day window, so they are not listed in the output.

Note: Output table is ordered by user_id in ascending order.

Solutions

Solution 1: Self-Join + Group Count

We can use self-join to connect the Posts table with itself. The connection condition is p1.user_id = p2.user_id and p2.post_date is between p1.post_date and 6 days after p1.post_date. Then we group the connection results by p1.user_id and p1.post_id to count the number of posts for each user within 7 days of each day. We save this result in table P.

Next, we count the average number of posts per week for each user in February 2024 and save it in table T. Note that we need to find records where post_date is between 2024-02-01 and 2024-02-28, group the records by user_id, then count the number of posts for each user, and finally divide by 4 to get the average number of posts per week. We save this result in table T.

Finally, we connect tables P and T with the condition P.user_id = T.user_id, then group by user_id to count the maximum number of posts within 7 days for each user. We then filter out records that meet the condition max_7day_posts >= avg_weekly_posts * 2 to get the result. Note that we need to sort in ascending order by user_id.

  • import pandas as pd
    
    
    def find_bursty_behavior(posts: pd.DataFrame) -> pd.DataFrame:
        # Subquery P
        p1 = pd.merge(posts, posts, on="user_id", suffixes=("_1", "_2"))
        p1 = p1[
            p1["post_date_2"].between(
                p1["post_date_1"], p1["post_date_1"] + pd.Timedelta(days=6)
            )
        ]
        p1 = p1.groupby(["user_id", "post_id_1"]).size().reset_index(name="cnt")
    
        # Subquery T
        t = posts[
            (posts["post_date"] >= "2024-02-01") & (posts["post_date"] <= "2024-02-28")
        ]
        t = t.groupby("user_id").size().div(4).reset_index(name="avg_weekly_posts")
    
        # Joining P and T
        merged_df = pd.merge(p1, t, on="user_id", how="inner")
    
        # Filtering
        filtered_df = merged_df[merged_df["cnt"] >= merged_df["avg_weekly_posts"] * 2]
    
        # Aggregating
        result_df = (
            filtered_df.groupby("user_id")
            .agg({"cnt": "max", "avg_weekly_posts": "first"})
            .reset_index()
        )
        result_df.columns = ["user_id", "max_7day_posts", "avg_weekly_posts"]
    
        # Sorting
        result_df.sort_values(by="user_id", inplace=True)
    
        return result_df
    
    
  • # Write your MySQL query statement below
    WITH
        P AS (
            SELECT p1.user_id AS user_id, COUNT(1) AS cnt
            FROM
                Posts AS p1
                JOIN Posts AS p2
                    ON p1.user_id = p2.user_id
                    AND p2.post_date BETWEEN p1.post_date AND DATE_ADD(p1.post_date, INTERVAL 6 DAY)
            GROUP BY p1.user_id, p1.post_id
        ),
        T AS (
            SELECT user_id, COUNT(1) / 4 AS avg_weekly_posts
            FROM Posts
            WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
            GROUP BY 1
        )
    SELECT user_id, MAX(cnt) AS max_7day_posts, avg_weekly_posts
    FROM
        P
        JOIN T USING (user_id)
    GROUP BY 1
    HAVING max_7day_posts >= avg_weekly_posts * 2
    ORDER BY 1;
    
    

All Problems

All Solutions