Welcome to Subscribe On Youtube

3793. Find Users with High Token Usage

Description

Table: prompts

+-++
\| user_id     \| int     \|
\| prompt      \| varchar \|
\| tokens      \| int     \|
+-++--+--+
\| user_id \| prompt                   \| tokens \|
++--+--+

Output:

+++
\| user_id \| prompt_count  \| avg_tokens \|
+++
\| 3       \| 4             \| 237.5      \|
\| 1       \| 3             \| 133.33     \|
+++

Explanation:

  • User 1:
    • Total prompts = 3
    • Average tokens = (120 + 80 + 200) / 3 = 133.33
    • Has a prompt with 200 tokens, which is greater than the average
    • Included in the result
  • User 2:
    • Total prompts = 2 (less than the required minimum)
    • Excluded from the result
  • User 3:
    • Total prompts = 4
    • Average tokens = (300 + 250 + 180 + 220) / 4 = 237.5
    • Has prompts with 300 and 250 tokens, both greater than the average
    • Included in the result

The Results table is ordered by avg_tokens in descending order, then by user_id in ascending order

</div>

Solutions

Solution 1: Group By Statistics

We first group the prompts by user_id and calculate for each user the total number of prompts prompt_count, the average tokens avg_tokens, and the maximum tokens max_tokens. Then we filter users who meet the criteria, i.e., those who have submitted at least 3 prompts and have at least one prompt with tokens greater than their average token usage. Finally, we sort the results by avg_tokens in descending order and by user_id in ascending order.

  • import pandas as pd
    
    
    def find_users_with_high_tokens(prompts: pd.DataFrame) -> pd.DataFrame:
        df = prompts.groupby("user_id", as_index=False).agg(
            prompt_count=("user_id", "size"),
            avg_tokens=("tokens", "mean"),
            max_tokens=("tokens", "max"),
        )
    
        df["avg_tokens"] = df["avg_tokens"].round(2)
    
        df = df[(df["prompt_count"] >= 3) & (df["max_tokens"] > df["avg_tokens"])]
    
        df = (
            df.sort_values(["avg_tokens", "user_id"], ascending=[False, True])
            .loc[:, ["user_id", "prompt_count", "avg_tokens"]]
            .reset_index(drop=True)
        )
    
        return df
    
    
  • # Write your MySQL query statement below
    SELECT
        user_id,
        COUNT(1) AS prompt_count,
        ROUND(AVG(tokens), 2) AS avg_tokens
    FROM prompts
    GROUP BY user_id
    HAVING prompt_count >= 3 AND MAX(tokens) > avg_tokens
    ORDER BY avg_tokens DESC, user_id;
    
    

All Problems

All Solutions