Welcome to Subscribe On Youtube
3832. Find Users with Persistent Behavior Patterns
Description
Table: activity
+--++ \| user_id \| int \| \| action_date \| date \| \| action \| varchar \| +--++-+--+ \| user_id \| action_date \| action \| ++-+--+
Output:
++++ \| user_id \| action \| streak_length \| start_date \| end_date \| ++++ \| 3 \| view \| 7 \| 2024-01-01 \| 2024-01-07 \| \| 1 \| login \| 5 \| 2024-01-01 \| 2024-01-05 \| ++++
Explanation:
- User 1:
- Performed
loginfrom 2024-01-01 to 2024-01-05 on consecutive days - Each day has exactly one action, and the action is the same
- Streak length = 5 (meets minimum requirement)
- The action changes on 2024-01-06, ending the streak
- Performed
- User 2:
- Performed
clickfor only 4 consecutive days - Does not meet the minimum streak length of 5
- Excluded from the result
- Performed
- User 3:
- Performed
viewfor 7 consecutive days - This is the longest valid sequence for this user
- Included in the result
- Performed
The Results table is ordered by streak_length in descending order, then by user_id in ascending order
</div>
Solutions
Solution 1: Filtering + Grouping + Aggregation
We first need to filter user dates with only a single action per day, then identify consecutive intervals among these dates, and finally aggregate these intervals to calculate the streak length and filter records that meet the criteria.
-
import pandas as pd def find_behaviorally_stable_users(activity: pd.DataFrame) -> pd.DataFrame: activity['action_date'] = pd.to_datetime(activity['action_date']) # Filter users with only a single action per day df = activity.assign( cnt=activity.groupby(['user_id', 'action_date'])['action'].transform('count') ) df = df[df['cnt'] == 1].sort_values(['user_id', 'action', 'action_date']) # Identify consecutive intervals df['rn'] = df.groupby(['user_id', 'action'])['action_date'].rank(method='first') df['grp'] = df['action_date'] - pd.to_timedelta(df['rn'], unit='D') # Aggregate streaks streaks = ( df.groupby(['user_id', 'action', 'grp']) .agg( streak_length=('action_date', 'count'), start_date=('action_date', 'min'), end_date=('action_date', 'max'), ) .reset_index() ) # Filter and get the longest streak for each user res = streaks[streaks['streak_length'] >= 5].sort_values( ['streak_length', 'user_id'], ascending=[False, True] ) return res.groupby('user_id').head(1)[ ['user_id', 'action', 'streak_length', 'start_date', 'end_date'] ] -
# Write your MySQL query statement below WITH daily_counts AS ( -- Step 1: Filter user dates with exactly one record per day (meeting the requirement of "exactly one action per day") SELECT user_id, action_date, action, COUNT(*) OVER (PARTITION BY user_id, action_date) AS cnt FROM activity ), filtered_activity AS ( -- Step 2: Filter out data with multiple actions on the same day SELECT user_id, action_date, action FROM daily_counts WHERE cnt = 1 ), streak_groups AS ( -- Step 3: Group consecutive dates using the method of subtracting row number from date SELECT user_id, action, action_date, DATE_SUB( action_date, INTERVAL ROW_NUMBER() OVER ( PARTITION BY user_id, action ORDER BY action_date ) DAY ) AS grp FROM filtered_activity ), streak_summary AS ( -- Step 4: Calculate the length of each consecutive segment and only keep records with length >= 5 SELECT user_id, action, COUNT(*) AS streak_length, MIN(action_date) AS start_date, MAX(action_date) AS end_date, -- Sort different streaks for each user to facilitate getting the maximum value later ROW_NUMBER() OVER ( PARTITION BY user_id ORDER BY COUNT(*) DESC ) AS rnk FROM streak_groups GROUP BY user_id, action, grp HAVING streak_length >= 5 ) -- Step 5: Extract the longest record for each qualified user and sort SELECT user_id, action, streak_length, start_date, end_date FROM streak_summary WHERE rnk = 1 ORDER BY streak_length DESC, user_id ASC;