GitHub Actions workflows, composite actions, and scripts that power the EE-Bench datapoint pipeline. All automation lives in this repository (dpaia/infrastructure) and operates on source PRs in dpaia/* repos and the dataset repository (dpaia/dataset).
The pipeline turns source PRs into validated datapoints through four stages, plus an on-demand export and a scheduled sweep, tracked via two kinds of GitHub Projects:
- Eval Type projects (one per eval type, e.g. "Code Generation") — track source PRs through Review → Verified → Done
- Dataset Metadata project — tracks generated datapoint PRs through In Progress → Done
Source PR in dpaia/* repo
               │  Team marks status → Review
               ▼
┌─────────────────────────────────────────┐
│  Stage 1: Verify Source                 │
│  verify-source_v2.yml                   │
│  Generate datapoint + run validation    │
│  Post check result on source PR         │
└──────────────┬──────────────────────────┘
               │  Review team sets status → Verified
               ▼
┌─────────────────────────────────────────┐
│  Stage 2: Generate Datapoint            │
│  generate-datapoint_v2.yml              │
│  Create PR in dpaia/dataset             │
│  Track in Dataset Metadata project      │
└──────────────┬──────────────────────────┘
               │  Dataset PR created
               ▼
┌─────────────────────────────────────────┐
│  Stage 3: Validate Datapoint            │
│  validate-datapoint_v2.yml              │
│  Run validation on dataset PR contents  │
│  Post result comment on dataset PR      │
└──────────────┬──────────────────────────┘
               │  Dataset PR merged
               ▼
┌─────────────────────────────────────────┐
│  Stage 4: Post-Merge                    │
│  on-datapoint-merged_v2.yml             │
│  Set both projects to Done              │
│  Close source PR                        │
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│  Export (on demand)                     │
│  export-dataset-v2.yml                  │
│  Bulk export datapoints as folders/JSONL│
└─────────────────────────────────────────┘

┌─────────────────────────────────────────┐
│  Sweep (scheduled)                      │
│  sweep-pipeline-v2.yml                  │
│  Detect and repair pipeline gaps        │
└─────────────────────────────────────────┘
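The status-driven hand-offs in the diagram above can be summarized as a lookup from project status to workflow file. This is a minimal sketch for orientation only: `WORKFLOW_FOR_STATUS` and `workflow_for_status` are hypothetical names and not part of the bot's code; only the statuses and workflow file names come from this document.

```python
from typing import Optional

# Eval type project status -> workflow the bot dispatches for the source PR.
# Statuses without an entry (e.g. "Done") trigger no dispatch in this sketch.
WORKFLOW_FOR_STATUS = {
    "Review": "verify-source_v2.yml",
    "Verified": "generate-datapoint_v2.yml",
}


def workflow_for_status(status: str) -> Optional[str]:
    """Return the workflow file to dispatch for a status, or None."""
    return WORKFLOW_FOR_STATUS.get(status)
```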
| Guide | Audience | Description |
|---|---|---|
| Contribution Guide | Developers | How to create source PRs with .ee-bench/ configuration |
| Code Review Guide | Review team | How to validate datapoints and manage the project board |
| Evaluation Guide | Evaluation team | How to export datasets and run validation |
Datapoints in dpaia/dataset are organized as <eval_type>/<repo>/<instance_id>:
dpaia/dataset/
├── codegen/
│   ├── spectre.console/
│   │   ├── spectreconsole__spectre.console-1708.json   # flat JSON (all content inlined)
│   │   └── spectreconsole__spectre.console-1708/       # structured folder
│   │       ├── datapoint.json
│   │       ├── environment/
│   │       ├── eval/
│   │       └── verify/
│   └── spring-boot-microshop/
│       └── ...
└── debugging/
    └── ...
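As an illustration of the layout above, the following sketch derives the repository-relative paths for one datapoint. The function name `datapoint_paths` and the dictionary keys are hypothetical; the path scheme itself follows the tree shown above.

```python
from pathlib import PurePosixPath
from typing import Dict


def datapoint_paths(eval_type: str, repo: str, instance_id: str) -> Dict[str, str]:
    """Build the flat-JSON and structured-folder paths for a datapoint."""
    base = PurePosixPath(eval_type) / repo
    return {
        "flat_json": str(base / f"{instance_id}.json"),            # all content inlined
        "folder": str(base / instance_id),                         # structured folder
        "datapoint_json": str(base / instance_id / "datapoint.json"),
    }
```

For example, `datapoint_paths("codegen", "spectre.console", "spectreconsole__spectre.console-1708")` yields paths under `codegen/spectre.console/`.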
File: .github/workflows/verify-source_v2.yml
Purpose: Generates a datapoint from a source PR and runs validation to verify the PR is suitable for the benchmark.
Trigger:
- Automatic: Bot dispatches when source PR status changes to "Review" in an eval type project
- Manual: `workflow_dispatch`
Inputs: organization, repository, pr_number, eval_type, run_key, eval_project_number
What it does:
- If `eval_project_number` is set: finds the PR in the eval project and sets Verification="Pending"
- Runs the export script to generate a datapoint from the source PR
- Runs `validate.sh` against the generated datapoint
- If `eval_project_number` is set: sets Verification="Passed" or "Failed" based on the result
- Posts a comment on the source PR with pass/fail result, test summary, and failed test details
- Uploads validation logs and result JSON as artifacts
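The result-handling steps above can be sketched as a function that turns a validation result into a Verification value and a comment body. The result shape (`status`, `summary`, `failed_tests`) and the comment wording are assumptions made for illustration; the real result JSON and comment format may differ.

```python
from typing import Tuple


def summarize_verification(result: dict) -> Tuple[str, str]:
    """Map a validation result to (Verification value, PR comment text).

    Assumed (hypothetical) result keys: "status", "summary", "failed_tests".
    """
    verification = "Passed" if result.get("status") == "passed" else "Failed"
    lines = [f"Datapoint verification: {verification}"]
    if result.get("summary"):
        lines.append(f"Test summary: {result['summary']}")
    failed = result.get("failed_tests") or []
    if failed:
        lines.append("Failed tests:")
        lines.extend(f"- {name}" for name in failed)
    return verification, "\n".join(lines)
```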
File: .github/workflows/generate-datapoint_v2.yml
Purpose: Generates a datapoint from a verified source PR and creates a PR in the dataset repository.
Trigger:
- Automatic: Bot dispatches when source PR status changes to "Verified" in an eval type project
- Manual: `workflow_dispatch`
Inputs: organization, repository, pr_number, eval_type, run_key, dataset_repo, export_script, dataset_project_number
What it does:
- Runs the export script to generate a datapoint (flat JSON + structured folder)
- Creates/updates a PR in `dpaia/dataset` with the datapoint files under `<eval_type>/<repo>/<instance_id>`
- Adds the source PR to the Dataset Metadata project with status "In Progress"
- Increments the Version field and records the source commit SHA
- Posts a comment on the source PR linking to the dataset PR
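The version bookkeeping step can be pictured roughly as below. This is a sketch under assumptions: the field name `Source Commit` is hypothetical (this document names only the Version field), and starting at version 1 when no value exists yet is a guess rather than documented behavior.

```python
from typing import Any, Dict


def bump_version(fields: Dict[str, Any], source_sha: str) -> Dict[str, Any]:
    """Return a copy of the project fields with Version incremented.

    Assumes a missing or empty Version counts as 0, so the first run yields 1.
    """
    updated = dict(fields)
    updated["Version"] = int(fields.get("Version") or 0) + 1
    updated["Source Commit"] = source_sha  # hypothetical field name
    return updated
```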
File: .github/workflows/validate-datapoint_v2.yml
Purpose: Validates a datapoint PR in the dataset repository.
Trigger:
- Automatic: Bot dispatches when a PR is opened/updated in `dpaia/dataset`
- Manual: `workflow_dispatch`
Inputs: organization, repository, pr_number, eval_type, run_key, source_organization, source_repository, source_pr_number, dataset_project_number
What it does:
- Checks out the dataset PR branch
- Detects the instance directory from changed files
- Runs `validate.sh` against the datapoint
- On success: auto-merges the dataset PR (squash merge with mergeability retry loop)
- On failure + `dataset_project_number` set: finds the source PR in the Dataset Metadata project and sets Status="Failed"
- Posts a comment on the dataset PR with pass/fail result
- Uploads validation logs as artifacts
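One way to detect the instance directory from a PR's changed files is sketched below. It assumes every datapoint file sits under `<eval_type>/<repo>/<instance_id>` (either as the flat `.json` file or inside the structured folder) and that a dataset PR touches exactly one instance. `detect_instance_dir` is a hypothetical name, and the workflow's actual logic may differ.

```python
from typing import Iterable, Optional


def detect_instance_dir(changed_files: Iterable[str]) -> Optional[str]:
    """Return the single <eval_type>/<repo>/<instance_id> touched, else None."""
    candidates = set()
    for path in changed_files:
        parts = path.split("/")
        if len(parts) < 3:
            continue  # not deep enough to belong to a datapoint
        instance = parts[2]
        # The flat JSON file shares its stem with the structured folder.
        if len(parts) == 3 and instance.endswith(".json"):
            instance = instance[: -len(".json")]
        candidates.add("/".join(parts[:2] + [instance]))
    # Ambiguous (several instances) or empty change sets yield None.
    return candidates.pop() if len(candidates) == 1 else None
```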
File: .github/workflows/on-datapoint-merged_v2.yml
Purpose: Finalizes the pipeline after a dataset PR is merged — updates project statuses and closes the source PR.
Trigger:
- Automatic: Bot dispatches when a PR is merged in `dpaia/dataset`
- Manual: `workflow_dispatch`
Inputs: organization, repository, pr_number, run_key, eval_project_map, dataset_project_number
What it does:
- Parses metadata from the merged dataset PR body (instance_id, eval_type, source_pr, source_repo)
- Sets the Dataset Metadata project status to "Done" and updates the Data field with a permalink
- Sets the eval type project status to "Done" for the source PR
- Closes the source PR with a comment (not merged)
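The metadata parsing step might look like the sketch below. The PR body format is not specified in this document, so the example assumes plain `key: value` lines; that format and the name `parse_pr_metadata` are hypothetical.

```python
import re
from typing import Dict

METADATA_KEYS = ("instance_id", "eval_type", "source_pr", "source_repo")


def parse_pr_metadata(body: str) -> Dict[str, str]:
    """Extract known metadata keys from lines of the form 'key: value'."""
    found = {}
    for key in METADATA_KEYS:
        match = re.search(rf"^{key}:\s*(\S+)\s*$", body, flags=re.MULTILINE)
        if match:
            found[key] = match.group(1)
    return found
```

Keys that are absent from the body are simply omitted from the result, so a caller can check for missing metadata before updating any project.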
File: .github/workflows/export-dataset-v2.yml
Purpose: Bulk exports datapoints from the dataset repository as a downloadable artifact.
Trigger: Manual (workflow_dispatch)
Inputs: eval_type, search_query, format (folders or jsonl), output_name, organization, dataset_repo
What it does:
- Checks out the dataset repository
- Resolves instance IDs — either from a GitHub PR search query or by scanning the filesystem
- Exports each instance as folder or JSONL format
- Generates a manifest with metadata (eval type, count, commit SHA, timestamps)
- Uploads the export as a GitHub Actions artifact (retained 30 days)
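The manifest and JSONL steps can be illustrated as follows. The manifest key names and the helper names `build_manifest` and `to_jsonl` are assumptions; this document states only which kinds of metadata the manifest carries.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def build_manifest(eval_type: str, instance_ids: List[str], commit_sha: str) -> Dict:
    """Assemble export metadata (key names are hypothetical)."""
    return {
        "eval_type": eval_type,
        "count": len(instance_ids),
        "commit_sha": commit_sha,
        "exported_at": datetime.now(timezone.utc).isoformat(),
        "instances": sorted(instance_ids),
    }


def to_jsonl(datapoints: Iterable[Dict]) -> str:
    """Serialize datapoints as JSONL: one JSON object per line."""
    return "".join(json.dumps(dp, sort_keys=True) + "\n" for dp in datapoints)
```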
File: .github/workflows/sweep-pipeline-v2.yml
Script: .github/python/sweep_pipeline.py
Purpose: Detects and repairs inconsistencies in the pipeline state. Acts as a safety net for dropped events, bot restarts, failed workflows, and manual board edits that leave the pipeline in an inconsistent state.
Trigger:
- Scheduled: Every 6 hours (`0 */6 * * *`)
- On bot startup: The issue-validator-bot dispatches a sweep on every deploy/restart to catch state lost from in-memory persistence
- Manual: `workflow_dispatch`
How it works:
- Queries all eval type projects and the Dataset Metadata project via GraphQL (with pagination)
- Runs each consistency check against the queried items
- For each inconsistency found, either repairs it directly (API call) or dispatches the appropriate workflow
- Produces a JSON summary artifact with all issues found and repairs made
Consistency checks:
| Check | Inconsistent State | How Detected | Repair |
|---|---|---|---|
| Missing verification | Source PR in "Review" with no "Datapoint Verification" check run on HEAD | Query check runs for the PR's head SHA | Dispatch verify-source_v2.yml |
| Closed PR in active status | Source PR is closed but project status is "In Progress", "Review", or "Verified" | Compare PR state from GraphQL against project Status field | Reopen the PR via REST API (skip if merged — can't reopen) |
| Verified without Verification=Passed | Source PR in "Verified" but Verification field is not "Passed" | Field value mismatch | Report only — needs manual investigation (possible guard bypass) |
| Verified with closed source PR | Source PR in "Verified" but PR is closed/merged | PR state check | Reopen if closed (not merged); report if merged |
| Verified without dataset PR | Source PR in "Verified" but no dataset PR exists in the Dataset Metadata project | Cross-reference eval items against dataset items by source PR URL | Dispatch generate-datapoint_v2.yml |
| Verified with all dataset PRs closed | Source PR in "Verified" but all linked dataset PRs are closed (not merged) | Dataset PR state check — indicates failed generation | Dispatch generate-datapoint_v2.yml (re-generation) |
| Merged dataset PR not Done | Dataset PR is merged but project status is not "Done" | Compare PR state against Status field | Dispatch on-datapoint-merged_v2.yml |
| Stale check runs | Pipeline check run ("Datapoint Verification", "Datapoint Generation", or "Datapoint Validation") stuck `in_progress` for over 1 hour | Query check runs on active PRs, compare `started_at` to current time | PATCH check to completed with conclusion `timed_out` |
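As an example of how one of these checks could be expressed, the sketch below flags stale check runs. It is not taken from `sweep_pipeline.py`; the function name `is_stale` is hypothetical, and the dictionary keys (`name`, `status`, `started_at`) are assumed to follow the GitHub check-run object.

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(hours=1)
PIPELINE_CHECKS = {
    "Datapoint Verification",
    "Datapoint Generation",
    "Datapoint Validation",
}


def is_stale(check_run: dict, now: datetime) -> bool:
    """True if a pipeline check run has been in_progress for over 1 hour."""
    if check_run.get("name") not in PIPELINE_CHECKS:
        return False
    if check_run.get("status") != "in_progress":
        return False
    # Timestamps are ISO 8601 with a trailing "Z"; normalize for fromisoformat.
    started = datetime.fromisoformat(check_run["started_at"].replace("Z", "+00:00"))
    return now - started > STALE_AFTER
```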
API considerations:
- Status filtering limits check-runs queries to PRs in active statuses only (~10-30 API calls instead of scanning all items)
- SHA deduplication avoids redundant queries when multiple project items reference the same commit
- Rate-limited API helper (`gh_rate_limited`) adds 100ms inter-call delay and retries on rate limit responses, reading the actual wait period from `Retry-After` or `X-RateLimit-Reset` headers
- Paginated check-runs queries via `gh api --paginate --slurp`
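The wait-period logic described for the rate-limited helper could be sketched like this. It is not the actual `gh_rate_limited` implementation: `wait_seconds` is a hypothetical name, header lookup is assumed to be case-sensitive, `Retry-After` is treated as a number of seconds, and `X-RateLimit-Reset` as a UTC epoch timestamp.

```python
import time
from typing import Mapping, Optional


def wait_seconds(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Compute how long to sleep before retrying a rate-limited call."""
    now = time.time() if now is None else now
    retry_after = headers.get("Retry-After")
    if retry_after is not None:
        return max(0.0, float(retry_after))   # explicit delay wins
    reset = headers.get("X-RateLimit-Reset")
    if reset is not None:
        return max(0.0, float(reset) - now)   # wait until the window resets
    return 0.0                                # no hint: retry immediately
```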
Inputs: dry_run (detect only, no repairs), eval_projects (JSON map of eval_type to project_number), dataset_project_number, organization
File: .github/workflows/sync-project-fields_v2.yml
Purpose: Performs project field mutations (clear verification, reopen PR, reset status) dispatched by the bot when project status changes or new commits arrive.
Trigger:
- Automatic: Bot dispatches on status regression from Verified/Done or on source PR synchronize
- Manual: `workflow_dispatch`
Inputs: organization, repository, pr_number, operation, eval_project_number, run_key
Operations:
- `clear-verification`: Clears the Verification field on the eval project item
- `reopen-pr`: Reopens the PR if closed (not merged) and clears Verification
- `reset-on-sync`: Sets Status="In progress", Verification="Pending", and posts an informational comment on the PR
| Action | Purpose |
|---|---|
| `parse-pr-url` | Extracts owner, repo, and number from a GitHub PR URL |
| `get-pr-node-id` | Gets the GraphQL node ID for a pull request |
| `get-issue-node-id` | Gets the GraphQL node ID for an issue |
| `get-project-id` | Fetches a GitHub Project V2 ID by organization and number |
| `add-issue-to-project` | Adds an issue/PR to a GitHub Project V2 |
| `find-pr-in-project` | Resolves PR node ID, project ID, and adds PR to project (combines get-pr-node-id + get-project-id + add-issue-to-project) |
| `set-project-status` | Sets the Status field on a project item |
| `update-project-field` | Updates any field on a project item (text, number, single-select) |
| `clear-project-field` | Clears a field value on a project item |
| `get-project-field-value` | Reads a field value from a project item |
| `query-project-items` | Queries all items in a project with their field values |
| `run-export-script` | Resolves and runs an export script to generate a datapoint |
| `run-validation` | Runs validate.sh and extracts structured results (status, test summary, failures) |
| `create-datapoint-pr` | Creates/updates a PR in the dataset repo with datapoint files and metadata |
| `setup-ee-import` | Installs the ee-dataset CLI from the bundled wheel |
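To illustrate what an action like `parse-pr-url` does, here is a minimal Python sketch of the same extraction. The composite action's actual implementation is not shown in this document, so the regular expression and the names `PullRequestRef` and `parse_pr_url` are assumptions.

```python
import re
from typing import NamedTuple


class PullRequestRef(NamedTuple):
    owner: str
    repo: str
    number: int


# Accepts an optional trailing path, query, or fragment (e.g. ".../pull/42/files").
_PR_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)(?:[/?#].*)?$")


def parse_pr_url(url: str) -> PullRequestRef:
    """Split a GitHub PR URL into owner, repo, and PR number."""
    match = _PR_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {url!r}")
    owner, repo, number = match.groups()
    return PullRequestRef(owner, repo, int(number))
```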
The following workflows predate the v2 pipeline and are kept for backward compatibility during migration. They should not be used for new work.
| Workflow | File | Purpose |
|---|---|---|
| Update Issue Dataset Data | `update-issue-data.yml` | Generate a single datapoint from an issue |
| Generate Dataset Data | `generate-dataset-data.yml` | Batch generate datapoints from issues matching a search query |
| Export Dataset | `export-dataset.yml` | Export dataset to dpaia/ee-dataset repository |
| Validate Issue | `validate-issue.yml` | Validate a single issue's datapoint |
| Validate External Repo Issue | `validate-external-repo-issue.yml` | Validate an issue from an external repository |
| Validate Dataset | `validate-dataset.yml` | Validate dataset items |
| Process Issue | `process-issue.yml` | Process a single issue through the pipeline |
| Workflow | File | Trigger | Purpose |
|---|---|---|---|
| Sync Labels | `sync-labels.yml` | Manual | Synchronize issue labels across repositories |
| Add Issues to Project | `add-issues-to-project.yml` | Manual / Daily | Add matching issues to a project board |
| Share Custom Workflows | `share-custom-workflows.yml` | Manual | Distribute workflow files to other repositories via PRs |
These reusable workflows are called by workflows in other dpaia/* repositories:
| Workflow | File | Purpose |
|---|---|---|
| Collect and Process Tests | `shared-collect-process-tests.yml` | Extract FAIL_TO_PASS / PASS_TO_PASS test lists from issues |
| Run Tests Maven | `shared-run-tests-maven.yml` | Run Maven tests and report results on issues |
| Maven (shared template) | `shared/.github/workflows/maven.yml` | Standalone Maven test workflow distributed to repositories |
| Component | Location | Purpose |
|---|---|---|
| issue-validator-bot | `dpaia/issue-validator-bot` | Thin orchestrator: receives webhooks, validates/guards, dispatches v2 workflows, manages check runs |
| ee-bench-import | `dpaia/ee-bench-import` | Export scripts, validation script, ee-dataset CLI source |
| Dataset repository | `dpaia/dataset` | Stores generated datapoints |