3793. Find Users with High Token Usage

Description

Table: prompts

+-++
\| user_id     \| int     \|
\| prompt      \| varchar \|
\| tokens      \| int     \|
+-++--+--+
\| user_id \| prompt                   \| tokens \|
++--+--+

Output:

+++
\| user_id \| prompt_count  \| avg_tokens \|
+++
\| 3       \| 4             \| 237.5      \|
\| 1       \| 3             \| 133.33     \|
+++

Explanation:

User 1:
- Total prompts = 3
- Average tokens = (120 + 80 + 200) / 3 = 133.33
- Has a prompt with 200 tokens, which is greater than the average
- Included in the result
User 2:
- Total prompts = 2 (less than the required minimum)
- Excluded from the result
User 3:
- Total prompts = 4
- Average tokens = (300 + 250 + 180 + 220) / 4 = 237.5
- Has prompts with 300 and 250 tokens, both greater than the average
- Included in the result

The Results table is ordered by avg_tokens in descending order, then by user_id in ascending order

</div>

Solutions

Solution 1: Group By Statistics

We first group the prompts by user_id and calculate for each user the total number of prompts prompt_count, the average tokens avg_tokens, and the maximum tokens max_tokens. Then we filter users who meet the criteria, i.e., those who have submitted at least 3 prompts and have at least one prompt with tokens greater than their average token usage. Finally, we sort the results by avg_tokens in descending order and by user_id in ascending order.

Python
SQL

import pandas as pd


def find_users_with_high_tokens(prompts: pd.DataFrame) -> pd.DataFrame:
    df = prompts.groupby("user_id", as_index=False).agg(
        prompt_count=("user_id", "size"),
        avg_tokens=("tokens", "mean"),
        max_tokens=("tokens", "max"),
    )

    df["avg_tokens"] = df["avg_tokens"].round(2)

    df = df[(df["prompt_count"] >= 3) & (df["max_tokens"] > df["avg_tokens"])]

    df = (
        df.sort_values(["avg_tokens", "user_id"], ascending=[False, True])
        .loc[:, ["user_id", "prompt_count", "avg_tokens"]]
        .reset_index(drop=True)
    )

    return df

# Write your MySQL query statement below
SELECT
    user_id,
    COUNT(1) AS prompt_count,
    ROUND(AVG(tokens), 2) AS avg_tokens
FROM prompts
GROUP BY user_id
HAVING prompt_count >= 3 AND MAX(tokens) > avg_tokens
ORDER BY avg_tokens DESC, user_id;

3793 - Find Users with High Token Usage

3793. Find Users with High Token Usage

Description

Solutions

Solution 1: Group By Statistics

All Problems

All Solutions

Welcome to Subscribe On Youtube

3793. Find Users with High Token Usage

Description

Solutions

Solution 1: Group By Statistics

All Problems

All Solutions