Verifiers Integration #305

alt-glitch · 2026-01-09T08:54:54Z

PR Type

RL Environment PR - Complete Environment Snapshot & Zero-Training sections
Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

Adding verifiers server to atropos to be able to use Prime Intellect's Environments with Atropos.

Completes #258

verifiers_server.py: Supports serve (RL training), process (SFT data generation with any API), and evaluate modes
verifiers_eval.py: Standalone evaluation environment with detailed metrics and retry logic
Automatically loads system prompts, rubrics, and datasets from Prime environments

Type of Change

New feature (non-breaking change which adds functionality)

🔖 Environment Snapshot

Field	Your Entry
Environment Name	Prime Intellect `verifiers`
Dataset Needed?	No
External Deps	`verifiers`
Environment/CLI Args required	`vf_env_name`; `env_args`. These specify the prime env slug + args those envs might need

Results

Eval run

W&B: https://wandb.ai/sidbin/atropos-environments/runs/1aowtj75/overview?nw=nwusersidbin

Command

python environments/eval_environments/verifiers_eval.py evaluate \
      --env.vf_env_name primeintellect/gsm8k \
      --env.max_eval_items 250 \
      --openai.model_name gpt-4.1-nano \
      --openai.api_key $OPENAI_API_KEY

SFT Datagen

W&B https://wandb.ai/sidbin/atropos-environments/runs/224hu4te?nw=nwusersidbin

Command

python environments/verifiers_server.py process \
    --env.vf_env_name primeintellect/gsm8k \
    --env.data_path_to_save_groups gpt-4.1-nano-gsm8k-sft-dataset \
    --openai.base_url https://api.openai.com/v1 \
    --openai.api_key $OPENAI_API_KEY \
    --env.use_wandb false \
    --env.total_steps 100 \
    --env.group_size 10 \
    --env.use_wandb true

Multi-turn Environment:

W&B
https://wandb.ai/sidbin/atropos-environments/runs/dotetook?nw=nwusersidbin

Command

python environments/verifiers_server.py process \
      --env.vf_env_name primeintellect/mini-swe-agent-plus \
      --env.data_path_to_save_groups gpt-5.2-swe-agent-sft-dataset \
      --openai.base_url https://api.openai.com/v1 \
      --openai.api_key $OPENAI_API_KEY \
      --openai.model_name gpt-5.2 \
      --env.total_steps 5 \
      --env.group_size 2 \
      --env.use_wandb true

✅ Developer & Reviewer Checklist

Code follows project style (black, isort, flake8 pass with pre-commit)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
New and existing unit tests pass locally with my changes
Docstrings added for all new public classes / functions
If .env vars required, did you add it to the .env.example in repo root?

- verifiers_server.py: consistent dataset column selection for train/test, remove redundant comments, preserve float precision for scores - verifiers_eval.py: add env_config_cls, fix constructor signature to match BaseEnv (slurm bool), make stub methods raise NotImplementedError

alt-glitch · 2026-01-10T02:22:25Z

paging @teknium1 for a quick review!

alt-glitch · 2026-01-10T07:48:07Z

Adding multi-turn runs using primeintellect/mini-swe-agent-plus.

Command

python environments/verifiers_server.py process \
    --env.vf_env_name primeintellect/mini-swe-agent-plus \
    --env.data_path_to_save_groups gpt-5.2-swe-agent-sft-dataset \
    --openai.base_url https://api.openai.com/v1 \
    --openai.api_key $OPENAI_API_KEY \
    --openai.model_name gpt-5.2 \
    --env.total_steps 5 \
    --env.group_size 1 \
    --env.use_wandb true

W&B Run: https://wandb.ai/sidbin/atropos-environments/runs/r2115ttr?nw=nwusersidbin
The run has passing and failing examples both, verifying multi-turn envs work.

alt-glitch · 2026-01-10T07:57:08Z

Marking as draft while I finalize the PR

alt-glitch · 2026-01-10T09:28:45Z

Added multi-turn rollout using process.

Ready for review @teknium1

for more information, see https://pre-commit.ci

verifiers

for more information, see https://pre-commit.ci

ManagedServer contexts for RL

for more information, see https://pre-commit.ci

environments/verifiers_server.py

environments/eval_environments/verifiers_eval.py

alt-glitch added 6 commits January 9, 2026 14:21

wip: verifiers integration

ed826de

make verifiers deps optional and update README

b62c416

fix docstrings

dda8543

add wandb to eval

636715b

update readme, add sft-datagen to verifiers_server

5b09ad8

alt-glitch changed the title ~~[WIP]: Verifiers Integration~~ Verifiers Integration Jan 9, 2026

alt-glitch marked this pull request as ready for review January 9, 2026 14:00

alt-glitch marked this pull request as draft January 10, 2026 07:56

alt-glitch marked this pull request as ready for review January 10, 2026 09:27

pre-commit-ci bot and others added 8 commits January 12, 2026 10:34

[pre-commit.ci] auto fixes from pre-commit.com hooks

3449a4c

for more information, see https://pre-commit.ci

rework server and eval for rl rollout. add in asyncmanagedserver for

cf63659

verifiers

add tests for AtroposManagedClient

294b980

[pre-commit.ci] auto fixes from pre-commit.com hooks

d98bc6d

for more information, see https://pre-commit.ci

clean up eval, pin verifiers version

24b4488

parallelize verifiers_server: use generate() for SFT, parallel

dceb1d8

ManagedServer contexts for RL

added better wandb logging

9db6c0d

fix verifiers conflict

4968730

alt-glitch force-pushed the sid/verifiers branch from 081ab8d to 4968730 Compare January 12, 2026 05:04

pre-commit-ci bot and others added 2 commits January 12, 2026 05:05

[pre-commit.ci] auto fixes from pre-commit.com hooks

7907ffd

for more information, see https://pre-commit.ci

fix env_args, dataset/prompt loading

a1d1e7d

teknium1 mentioned this pull request Jan 13, 2026

verifiers env #258

Draft

17 tasks

update verifiers_server to use tokenizer_for_trainer

3232051

dmahan93 requested changes Jan 13, 2026

View reviewed changes

environments/verifiers_server.py Outdated Show resolved Hide resolved

alt-glitch added 2 commits January 14, 2026 17:09

use managed server

6a27e88

remove unused managed_server wrapper + tese

57fa229

alt-glitch requested a review from dmahan93 January 14, 2026 11:41

dmahan93 requested changes Jan 15, 2026

View reviewed changes

environments/eval_environments/verifiers_eval.py Show resolved Hide resolved

alt-glitch and others added 2 commits January 15, 2026 11:34

switch to evalbase for verifiers_eval.py

c56af35

Merge branch 'main' into sid/verifiers

7f28c52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Verifiers Integration #305

Verifiers Integration #305

Uh oh!

alt-glitch commented Jan 9, 2026 •

edited

Loading

Uh oh!

alt-glitch commented Jan 10, 2026

Uh oh!

alt-glitch commented Jan 10, 2026 •

edited

Loading

Uh oh!

alt-glitch commented Jan 10, 2026

Uh oh!

alt-glitch commented Jan 10, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Verifiers Integration #305

Are you sure you want to change the base?

Verifiers Integration #305

Uh oh!

Conversation

alt-glitch commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

📝 General Information

Description

Type of Change

🔖 Environment Snapshot

Results

Eval run

SFT Datagen

Multi-turn Environment:

✅ Developer & Reviewer Checklist

Uh oh!

alt-glitch commented Jan 10, 2026

Uh oh!

alt-glitch commented Jan 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alt-glitch commented Jan 10, 2026

Uh oh!

alt-glitch commented Jan 10, 2026

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

alt-glitch commented Jan 9, 2026 •

edited

Loading

alt-glitch commented Jan 10, 2026 •

edited

Loading