Conversation

@tareknaser
Collaborator

Description

As discussed, this PR adds a CI workflow that runs experimental evaluations on new exam submissions, helping contributors validate their exams before merging.

It is triggered manually via workflow_dispatch on PRs targeting main.

It detects new or modified exams, runs the evaluation using anthropic/claude-haiku-4-5 for both testing and LLM-as-judge, and then posts a comment on the PR with the results and instructions for inspecting them in full.

I tested this workflow on a private repository.
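For reviewers who want the shape of the workflow before opening the diff, here is a minimal sketch. The file layout, the exams/ directory, the run_eval.py script, and its flags are placeholders rather than the actual contents of this PR; only the workflow_dispatch trigger, the anthropic/claude-haiku-4-5 model, and the PR-comment step mirror the description above.

```yaml
# Hypothetical sketch of the workflow described above; paths, script
# names, and flags are assumptions, not the contents of this PR.
# Dispatch with: gh workflow run <workflow-file> -f pr_number=<N>
name: Experimental exam evaluation

on:
  workflow_dispatch:
    inputs:
      pr_number:
        description: "PR number to evaluate"
        required: true

permissions:
  contents: read
  pull-requests: write

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      # Check out the PR head so the new/modified exams are present.
      - name: Fetch PR branch
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh pr checkout "${{ inputs.pr_number }}"

      # Detect exam files added or modified relative to main.
      - name: Detect changed exams
        run: |
          git diff --name-only --diff-filter=AM origin/main...HEAD -- 'exams/' \
            > changed_exams.txt
          cat changed_exams.txt

      # Run the evaluation with Claude Haiku 4.5 as both the test model
      # and the LLM-as-judge ("run_eval.py" is a placeholder name).
      - name: Run evaluation
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          python run_eval.py \
            --model anthropic/claude-haiku-4-5 \
            --judge-model anthropic/claude-haiku-4-5 \
            --exams-file changed_exams.txt \
            --output results.md

      # Post the results back to the PR as a comment.
      - name: Comment on PR
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const body = fs.readFileSync('results.md', 'utf8');
            await github.rest.issues.createComment({
              owner: context.repo.owner,
              repo: context.repo.repo,
              issue_number: Number('${{ inputs.pr_number }}'),
              body,
            });
```

Since workflow_dispatch runs against a ref rather than a PR, the sketch takes the PR number as an input and checks out the PR head with gh pr checkout before diffing against main.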

Signed-off-by: Tarek <tareknaser360@gmail.com>
Contributor

Copilot AI left a comment

Pull request overview

This PR adds a GitHub Actions workflow that enables experimental validation of course exam submissions. The workflow is manually triggered via workflow_dispatch and runs evaluations on new or modified exam files using Claude Haiku 4.5 for both testing and judging, then posts results as PR comments.

Changes:

  • Adds a new GitHub Actions workflow for experimental course exam validation
  • Implements automatic detection of new/modified exams in PRs
  • Provides automated feedback via PR comments with evaluation results
