Conversation

@tareknaser
Collaborator

Description

As discussed, this PR adds a CI workflow that runs experimental evaluations on new exam submissions, helping contributors validate their exams before merging.

It is triggered manually via workflow_dispatch on PRs targeting main.

It detects new or modified exams, runs the evaluation using anthropic/claude-haiku-4-5 for both testing and LLM-as-judge, and then posts a comment on the PR with the results and instructions for inspecting them in full.

I tested this workflow on a private repository.
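For reviewers who want the shape of the workflow before opening the diff, here is a minimal sketch. The file layout, the exams/ directory, the run_eval.py script, and its flags are placeholders rather than the actual contents of this PR; only the workflow_dispatch trigger, the anthropic/claude-haiku-4-5 model, and the PR-comment step mirror the description above.

```yaml
# Hypothetical sketch of the workflow described above; paths, script
# names, and flags are assumptions, not the contents of this PR.
# Dispatch with: gh workflow run <workflow-file> -f pr_number=<N>
name: Experimental exam evaluation

on:
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to evaluate"
        required: true

permissions:
  contents: read
  pull-requests: write

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      # Check out the PR head so the new/modified exams are present.
      - name: Fetch PR branch
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh pr checkout "${{ inputs.pr_number }}"

      # Detect exam files added or modified relative to main.
      - name: Detect changed exams
        run: |
          git diff --name-only --diff-filter=AM origin/main...HEAD -- 'exams/' \
            > changed_exams.txt
          cat changed_exams.txt

      # Run the evaluation with Claude Haiku 4.5 as both the test model
      # and the LLM-as-judge ("run_eval.py" is a placeholder name).
      - name: Run evaluation
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python run_eval.py \
            --model anthropic/claude-haiku-4-5 \
            --judge-model anthropic/claude-haiku-4-5 \
            --exams-file changed_exams.txt \
            --output results.md

      # Post the results back to the PR as a comment.
      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('results.md', 'utf8');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: Number('${{ inputs.pr_number }}'),
              body,
            });
```

Since workflow_dispatch runs against a ref rather than a PR, the sketch takes the PR number as an input and checks out the PR head with gh pr checkout before diffing against main.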

Signed-off-by: Tarek <tareknaser360@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a GitHub Actions workflow that enables experimental validation of course exam submissions. The workflow is manually triggered via workflow_dispatch and runs evaluations on new or modified exam files using Claude Haiku 4.5 for both testing and judging, then posts results as PR comments.

Changes:

  • Adds a new GitHub Actions workflow for experimental course exam validation
  • Implements automatic detection of new/modified exams in PRs
  • Provides automated feedback via PR comments with evaluation results
