Welcome to Subscribe On Youtube

3554. Find Category Recommendation Pairs

Description

Table: ProductPurchases

+-++
\| Column Name \| Type \| 
+-++
\| user_id     \| int  \|
\| product_id  \| int  \|
\| quantity    \| int  \|
+-++
(user_id, product_id) is the unique identifier for this table. 
Each row represents a purchase of a product by a user in a specific quantity.

Table: ProductInfo

+-++
\| product_id  \| int     \|
\| category    \| varchar \|
\| price       \| decimal \|
+-+++++++-+
\| category1   \| category2   \| customer_count \|
+-+-+-+

Explanation:

  • Books-Clothing:
    • User 1 purchased products from Books (102) and Clothing (201)
    • User 2 purchased products from Books (102, 103) and Clothing (201)
    • User 5 purchased products from Books (102, 103) and Clothing (201, 202)
    • Total: 3 customers purchased from both categories
  • Books-Electronics:
    • User 1 purchased products from Books (102) and Electronics (101)
    • User 2 purchased products from Books (102, 103) and Electronics (101)
    • User 3 purchased products from Books (103) and Electronics (101)
    • Total: 3 customers purchased from both categories
  • Clothing-Electronics:
    • User 1 purchased products from Clothing (201) and Electronics (101)
    • User 2 purchased products from Clothing (201) and Electronics (101)
    • User 4 purchased products from Clothing (201) and Electronics (101)
    • Total: 3 customers purchased from both categories
  • Electronics-Sports:
    • User 1 purchased products from Electronics (101) and Sports (301)
    • User 3 purchased products from Electronics (101) and Sports (301)
    • User 4 purchased products from Electronics (101) and Sports (301)
    • Total: 3 customers purchased from both categories
  • Other category pairs like Clothing-Sports (only 2 customers: Users 1 and 4) and Books-Kitchen (only 1 customer: User 3) have fewer than 3 shared customers and are not included in the result.

The result is ordered by customer_count in descending order. Since all pairs have the same customer_count of 3, they are ordered by category1 (then category2) in ascending order.

</div>

Solutions

Solution 1: Join + Group Aggregation

First, we join the ProductPurchases table and the ProductInfo table on product_id to obtain a user_category table consisting of user_id and category. Next, we self-join the user_category table to get all category pairs purchased by each user. Finally, we group these category pairs, count the number of users for each pair, and filter out the pairs with at least 3 users.

Lastly, we sort the final result by customer count in descending order, then by category1 in ascending order, and then by category2 in ascending order.

  • import pandas as pd
    
    
    def find_category_recommendation_pairs(
        product_purchases: pd.DataFrame, product_info: pd.DataFrame
    ) -> pd.DataFrame:
        df = product_purchases[["user_id", "product_id"]].merge(
            product_info[["product_id", "category"]], on="product_id", how="inner"
        )
        user_category = df.drop_duplicates(subset=["user_id", "category"])
        pair_per_user = (
            user_category.merge(user_category, on="user_id")
            .query("category_x < category_y")
            .rename(columns={"category_x": "category1", "category_y": "category2"})
        )
        pair_counts = (
            pair_per_user.groupby(["category1", "category2"])["user_id"]
            .nunique()
            .reset_index(name="customer_count")
        )
        result = (
            pair_counts.query("customer_count >= 3")
            .sort_values(
                ["customer_count", "category1", "category2"], ascending=[False, True, True]
            )
            .reset_index(drop=True)
        )
        return result
    
    
  • # Write your MySQL query statement below
    WITH
        user_category AS (
            SELECT DISTINCT
                user_id,
                category
            FROM
                ProductPurchases
                JOIN ProductInfo USING (product_id)
        ),
        pair_per_user AS (
            SELECT
                a.user_id,
                a.category AS category1,
                b.category AS category2
            FROM
                user_category AS a
                JOIN user_category AS b ON a.user_id = b.user_id AND a.category < b.category
        )
    SELECT category1, category2, COUNT(DISTINCT user_id) AS customer_count
    FROM pair_per_user
    GROUP BY 1, 2
    HAVING customer_count >= 3
    ORDER BY 3 DESC, 1, 2;
    
    

All Problems

All Solutions