Welcome to Subscribe On Youtube

3764. Most Common Course Pairs

Description

Table: course_completions

+-++
\| user_id           \| int     \|
\| course_id         \| int     \|
\| course_name       \| varchar \|
\| completion_date   \| date    \|
\| course_rating     \| int     \|
+-++--++--++--+---+

Explanation:

  • User 1: Completed 6 courses with average rating 4.5 (qualifies as top performer)
  • User 2: Completed 5 courses with average rating 4.4 (qualifies as top performer)
  • User 3: Completed 5 courses but average rating is 2.8 (does not qualify)
  • User 4: Completed only 3 courses (does not qualify)
  • Course Pairs Among Top Performers:
    • User 1: Python Basics → SQL Fundamentals → JavaScript → React Basics → Node.js → Docker
    • User 2: Python Basics → React Basics → Node.js → Docker → AWS Fundamentals
    • Most common transitions: Node.js → Docker (2 times), React Basics → Node.js (2 times)

Results are ordered by transition_count in descending order, then by first_course in ascending order, and then by second_course in ascending order.

</div>

Solutions

Solution 1: Grouping and Counting

We first filter out all top students, denoted as top_students, i.e., students who have completed at least 5 courses with an average rating of at least 4. Then for each top student, we sort by completion time and find all consecutive course pairs, denoted as course_pairs. Finally, we group and count all course pairs, calculate the occurrence count of each course pair, and output the results sorted as required.

  • import pandas as pd
    
    
    def topLearnerCourseTransitions(course_completions: pd.DataFrame) -> pd.DataFrame:
        grp = course_completions.groupby("user_id")
        top_students = grp.filter(
            lambda df: df.shape[0] >= 5 and df["course_rating"].mean() >= 4
        )["user_id"].unique()
    
        df = course_completions[course_completions["user_id"].isin(top_students)].copy()
        df = df.sort_values(["user_id", "completion_date"])
        df["second_course"] = df.groupby("user_id")["course_name"].shift(-1)
        df["first_course"] = df["course_name"]
    
        pairs = df[df["second_course"].notna()][["first_course", "second_course"]]
    
        result = (
            pairs.groupby(["first_course", "second_course"])
            .size()
            .reset_index(name="transition_count")
            .sort_values(
                ["transition_count", "first_course", "second_course"],
                ascending=[False, True, True],
                key=lambda col: col.str.lower() if col.dtype == "object" else col,
            )
            .reset_index(drop=True)
        )
    
        return result
    
    
  • # Write your MySQL query statement below
    WITH
        top_students AS (
            SELECT user_id
            FROM course_completions
            GROUP BY user_id
            HAVING COUNT(1) >= 5 AND AVG(course_rating) >= 4
        ),
        course_pairs AS (
            SELECT
                course_name AS first_course,
                LEAD(course_name) OVER (
                    PARTITION BY user_id
                    ORDER BY completion_date
                ) second_course
            FROM
                top_students
                JOIN course_completions USING (user_id)
        )
    SELECT
        *,
        COUNT(1) transition_count
    FROM course_pairs
    WHERE second_course IS NOT NULL
    GROUP BY 1, 2
    ORDER BY 3 DESC, 1, 2;
    
    

All Problems

All Solutions