Welcome to Subscribe On Youtube
3554. Find Category Recommendation Pairs
Description
Table: ProductPurchases
+-++ \| Column Name \| Type \| +-++ \| user_id \| int \| \| product_id \| int \| \| quantity \| int \| +-++ (user_id, product_id) is the unique identifier for this table. Each row represents a purchase of a product by a user in a specific quantity.
Table: ProductInfo
+-++ \| product_id \| int \| \| category \| varchar \| \| price \| decimal \| +-+++++++-+ \| category1 \| category2 \| customer_count \| +-+-+-+
Explanation:
- Books-Clothing:
- User 1 purchased products from Books (102) and Clothing (201)
- User 2 purchased products from Books (102, 103) and Clothing (201)
- User 5 purchased products from Books (102, 103) and Clothing (201, 202)
- Total: 3 customers purchased from both categories
- Books-Electronics:
- User 1 purchased products from Books (102) and Electronics (101)
- User 2 purchased products from Books (102, 103) and Electronics (101)
- User 3 purchased products from Books (103) and Electronics (101)
- Total: 3 customers purchased from both categories
- Clothing-Electronics:
- User 1 purchased products from Clothing (201) and Electronics (101)
- User 2 purchased products from Clothing (201) and Electronics (101)
- User 4 purchased products from Clothing (201) and Electronics (101)
- Total: 3 customers purchased from both categories
- Electronics-Sports:
- User 1 purchased products from Electronics (101) and Sports (301)
- User 3 purchased products from Electronics (101) and Sports (301)
- User 4 purchased products from Electronics (101) and Sports (301)
- Total: 3 customers purchased from both categories
- Other category pairs like Clothing-Sports (only 2 customers: Users 1 and 4) and Books-Kitchen (only 1 customer: User 3) have fewer than 3 shared customers and are not included in the result.
The result is ordered by customer_count in descending order. Since all pairs have the same customer_count of 3, they are ordered by category1 (then category2) in ascending order.
</div>
Solutions
Solution 1: Join + Group Aggregation
First, we join the ProductPurchases
table and the ProductInfo
table on product_id
to obtain a user_category
table consisting of user_id
and category
. Next, we self-join the user_category
table to get all category pairs purchased by each user. Finally, we group these category pairs, count the number of users for each pair, and filter out the pairs with at least 3 users.
Lastly, we sort the final result by customer count in descending order, then by category1
in ascending order, and then by category2
in ascending order.
-
import pandas as pd def find_category_recommendation_pairs( product_purchases: pd.DataFrame, product_info: pd.DataFrame ) -> pd.DataFrame: df = product_purchases[["user_id", "product_id"]].merge( product_info[["product_id", "category"]], on="product_id", how="inner" ) user_category = df.drop_duplicates(subset=["user_id", "category"]) pair_per_user = ( user_category.merge(user_category, on="user_id") .query("category_x < category_y") .rename(columns={"category_x": "category1", "category_y": "category2"}) ) pair_counts = ( pair_per_user.groupby(["category1", "category2"])["user_id"] .nunique() .reset_index(name="customer_count") ) result = ( pair_counts.query("customer_count >= 3") .sort_values( ["customer_count", "category1", "category2"], ascending=[False, True, True] ) .reset_index(drop=True) ) return result
-
# Write your MySQL query statement below WITH user_category AS ( SELECT DISTINCT user_id, category FROM ProductPurchases JOIN ProductInfo USING (product_id) ), pair_per_user AS ( SELECT a.user_id, a.category AS category1, b.category AS category2 FROM user_category AS a JOIN user_category AS b ON a.user_id = b.user_id AND a.category < b.category ) SELECT category1, category2, COUNT(DISTINCT user_id) AS customer_count FROM pair_per_user GROUP BY 1, 2 HAVING customer_count >= 3 ORDER BY 3 DESC, 1, 2;