Welcome to Subscribe On Youtube
3764. Most Common Course Pairs
Description
Table: course_completions
+-++ \| user_id \| int \| \| course_id \| int \| \| course_name \| varchar \| \| completion_date \| date \| \| course_rating \| int \| +-++--++--++--+---+
Explanation:
- User 1: Completed 6 courses with average rating 4.5 (qualifies as top performer)
- User 2: Completed 5 courses with average rating 4.4 (qualifies as top performer)
- User 3: Completed 5 courses but average rating is 2.8 (does not qualify)
- User 4: Completed only 3 courses (does not qualify)
- Course Pairs Among Top Performers:
- User 1: Python Basics → SQL Fundamentals → JavaScript → React Basics → Node.js → Docker
- User 2: Python Basics → React Basics → Node.js → Docker → AWS Fundamentals
- Most common transitions: Node.js → Docker (2 times), React Basics → Node.js (2 times)
Results are ordered by transition_count in descending order, then by first_course in ascending order, and then by second_course in ascending order.
</div>
Solutions
Solution 1: Grouping and Counting
We first filter out all top students, denoted as top_students, i.e., students who have completed at least 5 courses with an average rating of at least 4. Then for each top student, we sort by completion time and find all consecutive course pairs, denoted as course_pairs. Finally, we group and count all course pairs, calculate the occurrence count of each course pair, and output the results sorted as required.
-
import pandas as pd def topLearnerCourseTransitions(course_completions: pd.DataFrame) -> pd.DataFrame: grp = course_completions.groupby("user_id") top_students = grp.filter( lambda df: df.shape[0] >= 5 and df["course_rating"].mean() >= 4 )["user_id"].unique() df = course_completions[course_completions["user_id"].isin(top_students)].copy() df = df.sort_values(["user_id", "completion_date"]) df["second_course"] = df.groupby("user_id")["course_name"].shift(-1) df["first_course"] = df["course_name"] pairs = df[df["second_course"].notna()][["first_course", "second_course"]] result = ( pairs.groupby(["first_course", "second_course"]) .size() .reset_index(name="transition_count") .sort_values( ["transition_count", "first_course", "second_course"], ascending=[False, True, True], key=lambda col: col.str.lower() if col.dtype == "object" else col, ) .reset_index(drop=True) ) return result -
# Write your MySQL query statement below WITH top_students AS ( SELECT user_id FROM course_completions GROUP BY user_id HAVING COUNT(1) >= 5 AND AVG(course_rating) >= 4 ), course_pairs AS ( SELECT course_name AS first_course, LEAD(course_name) OVER ( PARTITION BY user_id ORDER BY completion_date ) second_course FROM top_students JOIN course_completions USING (user_id) ) SELECT *, COUNT(1) transition_count FROM course_pairs WHERE second_course IS NOT NULL GROUP BY 1, 2 ORDER BY 3 DESC, 1, 2;