
Conversation

@erikqu erikqu commented Jan 9, 2026

PR Type

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

📝 General Information

Description

This PR completes the verifiers environment implementation started in #258 by adding the missing training methods and fixing issues identified in review.

Related Issues

Supersedes/completes #258

Changes

environments/verifiers_server.py

  • Added collect_trajectories method for the training loop - generates multiple completions per question
  • Added score method for reward assignment using the Verifiers rubric system
  • Added wandb_log override for tracking train/percent_correct metric
  • Added name class attribute for proper CLI integration
  • Added percent_correct_buffer for training metrics tracking
  • Fixed imports - added Item, tokenize_for_trainer, random; removed unused imports
  • Replaced debug main() function with proper VerifiersEnv.cli() entry point
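
The `percent_correct_buffer` / `wandb_log` pattern listed above can be sketched roughly as follows. This is a hypothetical illustration of the tracking logic, not the PR's exact code; the class, method, and threshold names here are assumptions.

```python
# Sketch of the percent_correct tracking pattern described above.
# Names (record_score, wandb_metrics, threshold) are illustrative assumptions.
class MetricsTracker:
    def __init__(self) -> None:
        self.percent_correct_buffer: list = []

    def record_score(self, reward: float, threshold: float = 0.5) -> None:
        # Count a rollout as "correct" when its reward clears the threshold
        self.percent_correct_buffer.append(1.0 if reward >= threshold else 0.0)

    def wandb_metrics(self) -> dict:
        # Average the buffer into train/percent_correct, then reset it
        if not self.percent_correct_buffer:
            return {}
        metrics = {
            "train/percent_correct": sum(self.percent_correct_buffer)
            / len(self.percent_correct_buffer)
        }
        self.percent_correct_buffer.clear()
        return metrics
```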

pyproject.toml

  • Moved verifiers>=0.1.5.post0 from core dependencies to optional [verifiers] group (install with pip install atroposlib[verifiers])
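
The optional-dependency move would look roughly like this in `pyproject.toml` (a sketch of the standard PEP 621 shape, not necessarily the file's exact contents):

```toml
[project.optional-dependencies]
verifiers = ["verifiers>=0.1.5.post0"]
```

After this, `pip install atroposlib[verifiers]` pulls in the extra, while a plain `pip install atroposlib` does not.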

🔖 Environment Snapshot

| Field | Your Entry |
| --- | --- |
| Environment Name | verifiers |
| Short Description | Integration with the Verifiers/Prime framework for structured reward evaluation |
| Category | RL Environment |
| Dataset Needed? | Via Verifiers (e.g., `prime env install will/wordle`) |
| External Deps | `verifiers>=0.1.5.post0` (optional) |
| Environment Variables | `OPENAI_API_KEY` |
| Compute Footprint Estimate | Depends on the underlying Verifiers environment |

✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • Docstrings added for all new public classes / functions

🤖 Generated with Claude Code

PR Link: #304

cdreetz and others added 9 commits October 9, 2025 23:48
- Add collect_trajectories method for training loop
- Add score method for reward assignment using Verifiers rubric
- Add wandb_log override for tracking percent_correct
- Move verifiers from core to optional dependencies
- Add name class attribute for CLI integration
- Replace debug main() with proper CLI entry point
- Fix unused imports and add required imports (Item, tokenize_for_trainer)
- Add percent_correct_buffer for training metrics

Co-Authored-By: Claude <noreply@anthropic.com>
@erikqu erikqu marked this pull request as draft January 9, 2026 08:18

erikqu commented Jan 9, 2026

Tests cover:
- VfEnvConfig configuration and inheritance
- VerifiersEnv initialization and config_init
- setup() method dataset loading
- get_next_item() iteration and wrapping
- score() method with rubric integration
- collect_trajectories() API call generation
- wandb_log() percent_correct calculation
- evaluate() test set evaluation
- Full training loop integration test

All 15 tests pass with mocked verifiers library.

Co-Authored-By: Claude <noreply@anthropic.com>
@erikqu erikqu force-pushed the verifiers-complete branch from 5bc19c5 to c00f126 Compare January 9, 2026 08:21
@erikqu erikqu changed the title Verifiers complete Integrate Prime Intellect Env Hub Jan 9, 2026
erikqu and others added 4 commits January 9, 2026 00:40
- Update to use _get_reward_funcs() and _get_reward_weights() (private API)
- Add _call_reward_func helper to call reward functions with correct signatures
- Relax math-verify constraint from ==0.7.0 to >=0.7.0 for compatibility
- Update tests to mock reward functions correctly
- Tested with PrimeIntellect wordle environment
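
The `_call_reward_func` helper mentioned above can be sketched with `inspect.signature`: reward functions in Verifiers rubrics declare differing parameters, so the caller forwards only the keyword arguments each function actually accepts. This is an illustrative sketch of the technique, not the PR's exact helper.

```python
import inspect

# Illustrative _call_reward_func-style helper: inspect the reward function's
# signature and pass only the kwargs it declares.
def call_reward_func(func, **available):
    params = inspect.signature(func).parameters
    # If the function takes **kwargs, forward everything
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return func(**available)
    kwargs = {name: value for name, value in available.items() if name in params}
    return func(**kwargs)
```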

The verifiers library >= 0.1.9 changed their public API:
- get_reward_funcs() -> _get_reward_funcs()
- get_reward_weights() -> _get_reward_weights()
- call_reward_func() removed, reward functions called directly

Co-Authored-By: Claude <noreply@anthropic.com>
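
The compatibility layer for the rename above could be sketched as a `getattr` fallback. The attribute names come from the commit message; the helper functions themselves are illustrative, not the PR's exact code.

```python
# Hypothetical compatibility shim across the verifiers 0.1.9 API rename:
# prefer the old public name, fall back to the new private one.
def get_reward_funcs(rubric):
    getter = getattr(rubric, "get_reward_funcs", None)  # verifiers < 0.1.9
    if getter is None:
        getter = getattr(rubric, "_get_reward_funcs")   # verifiers >= 0.1.9
    return getter()

def get_reward_weights(rubric):
    getter = getattr(rubric, "get_reward_weights", None)
    if getter is None:
        getter = getattr(rubric, "_get_reward_weights")
    return getter()
```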
- Add import guard for verifiers optional dependency
- Add compatibility layer for public/private API (verifiers >= 0.1.9)
- Fix response_messages to use list instead of tuple
- Add logging for exception handlers in reward functions
- Make reward_threshold configurable in VfEnvConfig
- Normalize messages to list before tokenize_for_trainer
- Add defensive checks for completion response (empty choices)
- Update tests for new reward_threshold config

Co-Authored-By: Claude <noreply@anthropic.com>
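
The defensive check for empty completion choices mentioned above amounts to roughly this (an illustrative sketch, not the PR's exact code):

```python
from types import SimpleNamespace

# Defensive accessor: an API completion may come back with no choices,
# in which case the caller should skip or retry rather than index into it.
def first_choice_content(completion):
    choices = getattr(completion, "choices", None) or []
    if not choices:
        return None
    return choices[0].message.content
```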
When a verifiers environment is not found, automatically attempt to
install it using the prime CLI. This makes it easier to use environments
without manual installation steps.

Co-Authored-By: Claude <noreply@anthropic.com>
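
The auto-install fallback described above can be sketched as: try to load, and on failure shell out to the prime CLI and retry. The loader/installer are injected here so the sketch stays testable; the real code would call `vf.load_environment`, and the `prime env install` invocation shape is taken from this PR's discussion.

```python
import subprocess

# Illustrative auto-install fallback: if the environment isn't found,
# install it via the prime CLI and try loading again.
def load_with_auto_install(env_name, loader, installer=None):
    try:
        return loader(env_name)
    except Exception:
        if installer is None:
            subprocess.run(["prime", "env", "install", env_name], check=True)
        else:
            installer(env_name)
        return loader(env_name)
```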

erikqu commented Jan 9, 2026

  • assumes you did prime login
  • Prime Intellect hub envs are referenced as owner/env
  • e.g.

```
uv run environments/verifiers_server.py serve \
  --env.vf_env_name will/wordle \
  --env.use_wandb False \
  --env.wandb_name wordle \
  --openai.api_key $OPENAI_API_KEY \
  --openai.model_name gpt-4.1-mini
```

then

Screen.Recording.2026-01-09.at.1.14.26.AM.mov

- Parse env_name to extract module name for loading (e.g., "will/wordle" -> "wordle")
- Use full name for prime CLI install, module name for vf.load_environment
- Add helpful error message when short format is used without install

Co-Authored-By: Claude <noreply@anthropic.com>
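
The name parsing in that commit can be sketched as follows: the full `owner/env` form is kept for `prime env install`, while only the module name goes to `vf.load_environment`. The helper name is an illustrative assumption.

```python
# Illustrative parse of a hub env name: "will/wordle" -> ("will", "wordle"),
# bare "wordle" -> (None, "wordle").
def split_env_name(env_name: str):
    if "/" in env_name:
        owner, module = env_name.split("/", 1)
    else:
        owner, module = None, env_name
    return owner, module
```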
@erikqu erikqu marked this pull request as ready for review January 9, 2026 09:10

teknium1 commented Jan 9, 2026

pinging @cdreetz for visibility as well


teknium1 commented Jan 9, 2026

@erikqu can you look into sft-datagen to see if you can gen data with 4.1 that gets verified by the env you're using from the Prime hub in the meantime

erikqu and others added 2 commits January 9, 2026 09:47
- Detect multi-turn envs via hasattr(vf_env, "env_response")
- Add _collect_multi_turn_trajectories using vf_env.rollout()
- Add _rollout_and_score_eval_multi_turn for evaluation
- Use rubric.score_rollout(state) for proper multi-turn scoring
- Add _get_vf_client() for AsyncOpenAI client creation
- Use pre-computed reward from state for multi-turn in score()
- Add timeout handling (120s) for multi-turn rollouts
- Improve logging for debugging API issues

Multi-turn environments like wordle now use the verifiers rollout
mechanism which properly handles the interaction loop and state-aware
reward functions.

Co-Authored-By: Claude <noreply@anthropic.com>
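
The multi-turn handling above can be sketched with two small helpers; the `hasattr` check and the 120 s timeout come from the commit message, while the helper names are illustrative.

```python
import asyncio

# Multi-turn detection heuristic from the commit message: multi-turn
# Verifiers environments expose an env_response attribute.
def is_multi_turn(vf_env) -> bool:
    return hasattr(vf_env, "env_response")

# Timeout wrapper for a multi-turn rollout coroutine (default 120 s).
async def rollout_with_timeout(rollout_coro, timeout_s: float = 120.0):
    return await asyncio.wait_for(rollout_coro, timeout=timeout_s)
```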
The verifiers rollout() expects a RolloutInput TypedDict with:
- prompt (messages list)
- example_id (int)
- task (str)
- answer (str)
- info (dict)

Was incorrectly passing 'question' field instead of the correct structure.
Also added WARNING level logs for better debugging visibility.

Co-Authored-By: Claude <noreply@anthropic.com>
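
The field list above can be written as a TypedDict sketch of what `rollout()` expects. The fields are taken from the commit message; this is not copied from the verifiers source.

```python
from typing import TypedDict

# Sketch of the RolloutInput structure described in the commit message.
class RolloutInput(TypedDict):
    prompt: list      # messages list
    example_id: int
    task: str
    answer: str
    info: dict
```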
@erikqu erikqu marked this pull request as draft January 9, 2026 18:37

erikqu commented Jan 9, 2026

Put this back into a draft; bugs with SFT.

@erikqu erikqu marked this pull request as ready for review January 9, 2026 19:38

erikqu commented Jan 9, 2026

erikqu and others added 2 commits January 9, 2026 11:42
- Use real functions instead of MagicMock for reward functions to ensure
  proper signature inspection
- Add vf_env attribute with correct class name for single-turn detection
- Set ensure_scores_are_not_same config in tests

Co-Authored-By: Claude <noreply@anthropic.com>
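
Why real functions instead of `MagicMock` here: `inspect.signature` needs a genuine signature to decide which kwargs a reward function accepts, and a bare mock does not reliably provide one. A small factory like this (an illustrative sketch, not the PR's test code) keeps the stubs inspectable:

```python
import inspect

# Real (non-mock) stub reward function so signature inspection works in tests.
def make_stub_reward(value: float = 1.0):
    def reward(completion, answer):
        return value
    return reward
```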
@teknium1

Wait I meant doing an environment from Verifiers (not atropos), when you run sft-datagen set a wandb run name and login to wandb, and then share that run please


erikqu commented Jan 10, 2026

> Wait I meant doing an environment from Verifiers (not atropos), when you run sft-datagen set a wandb run name and login to wandb, and then share that run please

Oh sorry, I didn't specify that it was with the Prime Intellect gsm8k env; didn't think to since you guys obviously already have it. Just a test run, e.g. (ignore the wandb name):

Screenshot 2026-01-09 at 5 53 34 PM


erikqu commented Jan 10, 2026

> Wait I meant doing an environment from Verifiers (not atropos), when you run sft-datagen set a wandb run name and login to wandb, and then share that run please

but sure I can share one later!


erikqu commented Jan 10, 2026

@teknium1 can't seem to share the wandb project, maybe dm me your email or something
