Welcome to Subscribe On Youtube
3793. Find Users with High Token Usage
Description
Table: prompts
+-++ \| user_id \| int \| \| prompt \| varchar \| \| tokens \| int \| +-++--+--+ \| user_id \| prompt \| tokens \| ++--+--+
Output:
+++ \| user_id \| prompt_count \| avg_tokens \| +++ \| 3 \| 4 \| 237.5 \| \| 1 \| 3 \| 133.33 \| +++
Explanation:
- User 1:
- Total prompts = 3
- Average tokens = (120 + 80 + 200) / 3 = 133.33
- Has a prompt with 200 tokens, which is greater than the average
- Included in the result
- User 2:
- Total prompts = 2 (less than the required minimum)
- Excluded from the result
- User 3:
- Total prompts = 4
- Average tokens = (300 + 250 + 180 + 220) / 4 = 237.5
- Has prompts with 300 and 250 tokens, both greater than the average
- Included in the result
The Results table is ordered by avg_tokens in descending order, then by user_id in ascending order
</div>
Solutions
Solution 1: Group By Statistics
We first group the prompts by user_id and calculate for each user the total number of prompts prompt_count, the average tokens avg_tokens, and the maximum tokens max_tokens. Then we filter users who meet the criteria, i.e., those who have submitted at least 3 prompts and have at least one prompt with tokens greater than their average token usage. Finally, we sort the results by avg_tokens in descending order and by user_id in ascending order.
-
import pandas as pd def find_users_with_high_tokens(prompts: pd.DataFrame) -> pd.DataFrame: df = prompts.groupby("user_id", as_index=False).agg( prompt_count=("user_id", "size"), avg_tokens=("tokens", "mean"), max_tokens=("tokens", "max"), ) df["avg_tokens"] = df["avg_tokens"].round(2) df = df[(df["prompt_count"] >= 3) & (df["max_tokens"] > df["avg_tokens"])] df = ( df.sort_values(["avg_tokens", "user_id"], ascending=[False, True]) .loc[:, ["user_id", "prompt_count", "avg_tokens"]] .reset_index(drop=True) ) return df -
# Write your MySQL query statement below SELECT user_id, COUNT(1) AS prompt_count, ROUND(AVG(tokens), 2) AS avg_tokens FROM prompts GROUP BY user_id HAVING prompt_count >= 3 AND MAX(tokens) > avg_tokens ORDER BY avg_tokens DESC, user_id;