Integrate Prime Intellect Env Hub #304
base: main
Conversation
for more information, see https://pre-commit.ci
- Add collect_trajectories method for training loop
- Add score method for reward assignment using Verifiers rubric
- Add wandb_log override for tracking percent_correct
- Move verifiers from core to optional dependencies
- Add name class attribute for CLI integration
- Replace debug main() with proper CLI entry point
- Fix unused imports and add required imports (Item, tokenize_for_trainer)
- Add percent_correct_buffer for training metrics

Co-Authored-By: Claude <noreply@anthropic.com>
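The percent_correct bookkeeping described in this commit can be sketched roughly as follows. This is an illustrative sketch, not the actual `VerifiersEnv` code: the class name `PercentCorrectTracker` and the `record` method are hypothetical, while `percent_correct_buffer` and the `train/percent_correct` metric key come from the commit message.

```python
class PercentCorrectTracker:
    """Hypothetical sketch of the percent_correct metric tracking."""

    def __init__(self, reward_threshold: float = 0.5):
        self.reward_threshold = reward_threshold
        self.percent_correct_buffer: list[float] = []

    def record(self, reward: float) -> None:
        # A rollout counts as "correct" when its reward clears the threshold.
        self.percent_correct_buffer.append(
            1.0 if reward >= self.reward_threshold else 0.0
        )

    def wandb_log(self) -> dict:
        # Emit train/percent_correct and reset the buffer for the next window.
        if not self.percent_correct_buffer:
            return {}
        metrics = {
            "train/percent_correct": sum(self.percent_correct_buffer)
            / len(self.percent_correct_buffer)
        }
        self.percent_correct_buffer.clear()
        return metrics
```

The buffer-then-clear pattern keeps the metric windowed per logging step rather than cumulative over the whole run.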
Tests cover:
- VfEnvConfig configuration and inheritance
- VerifiersEnv initialization and config_init
- setup() method dataset loading
- get_next_item() iteration and wrapping
- score() method with rubric integration
- collect_trajectories() API call generation
- wandb_log() percent_correct calculation
- evaluate() test set evaluation
- Full training loop integration test

All 15 tests pass with mocked verifiers library.

Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed 5bc19c5 to c00f126
- Update to use _get_reward_funcs() and _get_reward_weights() (private API)
- Add _call_reward_func helper to call reward functions with correct signatures
- Relax math-verify constraint from ==0.7.0 to >=0.7.0 for compatibility
- Update tests to mock reward functions correctly
- Tested with PrimeIntellect wordle environment

The verifiers library >= 0.1.9 changed their public API:
- get_reward_funcs() -> _get_reward_funcs()
- get_reward_weights() -> _get_reward_weights()
- call_reward_func() removed; reward functions are now called directly

Co-Authored-By: Claude <noreply@anthropic.com>
- Add import guard for verifiers optional dependency
- Add compatibility layer for public/private API (verifiers >= 0.1.9)
- Fix response_messages to use list instead of tuple
- Add logging for exception handlers in reward functions
- Make reward_threshold configurable in VfEnvConfig
- Normalize messages to list before tokenize_for_trainer
- Add defensive checks for completion response (empty choices)
- Update tests for new reward_threshold config

Co-Authored-By: Claude <noreply@anthropic.com>
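The import guard for the optional dependency presumably follows the standard try/except pattern. A minimal sketch, assuming the `require_verifiers` helper name (hypothetical) and the `atroposlib[verifiers]` extra from the PR description:

```python
try:
    import verifiers as vf  # optional dependency
except ImportError:
    vf = None

def require_verifiers():
    """Fail with an actionable message if the optional extra is missing."""
    if vf is None:
        raise ImportError(
            "verifiers is not installed; install with "
            "`pip install atroposlib[verifiers]`"
        )
    return vf
```

Deferring the error to first use (rather than at import time) lets the rest of the package work without the extra installed.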
When a verifiers environment is not found, automatically attempt to install it using the prime CLI. This makes it easier to use environments without manual installation steps.

Co-Authored-By: Claude <noreply@anthropic.com>
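The load-then-install-then-retry behavior can be sketched as below. This is not the PR's actual code: `loader` stands in for `vf.load_environment` and `installer` for the prime CLI call, injected as parameters purely so the pattern is testable; the exact `prime env install` invocation comes from the PR description, but its flags are assumed.

```python
import subprocess

def load_env_with_autoinstall(env_name, loader, installer=None):
    """Try to load an environment; on failure, install it and retry once."""
    if installer is None:
        def installer(name):
            # Assumed prime CLI invocation; exact flags may differ.
            subprocess.run(["prime", "env", "install", name], check=True)
    try:
        return loader(env_name)
    except Exception:
        installer(env_name)
        return loader(env_name)
```

Retrying exactly once keeps the failure mode simple: if the environment is still missing after install, the second `loader` call raises normally.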
then Screen.Recording.2026-01-09.at.1.14.26.AM.mov
- Parse env_name to extract the module name for loading (e.g., "will/wordle" -> "wordle")
- Use the full name for prime CLI install and the module name for vf.load_environment
- Add a helpful error message when the short format is used without install

Co-Authored-By: Claude <noreply@anthropic.com>
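The name-splitting rule above is small enough to pin down exactly; a sketch with a hypothetical helper name, where the full name feeds `prime env install` and the module name feeds `vf.load_environment`:

```python
def split_env_name(env_name: str) -> tuple[str, str]:
    """Split a hub-style env name into (install_name, module_name).

    "will/wordle" -> ("will/wordle", "wordle"); a bare "wordle" maps to
    itself for both uses. Hypothetical helper name.
    """
    module_name = env_name.rsplit("/", 1)[-1]
    return env_name, module_name
```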
pinging @cdreetz for visibility as well
|
- Detect multi-turn envs via hasattr(vf_env, "env_response")
- Add _collect_multi_turn_trajectories using vf_env.rollout()
- Add _rollout_and_score_eval_multi_turn for evaluation
- Use rubric.score_rollout(state) for proper multi-turn scoring
- Add _get_vf_client() for AsyncOpenAI client creation
- Use pre-computed reward from state for multi-turn in score()
- Add timeout handling (120s) for multi-turn rollouts
- Improve logging for debugging API issues

Multi-turn environments like wordle now use the verifiers rollout mechanism, which properly handles the interaction loop and state-aware reward functions.

Co-Authored-By: Claude <noreply@anthropic.com>
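The duck-typed detection mentioned above reduces to a one-liner; the helper name here is hypothetical, while the `env_response` attribute check is taken from the commit message:

```python
def is_multi_turn(vf_env) -> bool:
    """Multi-turn verifiers environments expose an env_response hook;
    single-turn ones do not. Hedged sketch of the detection described
    in the commit message."""
    return hasattr(vf_env, "env_response")
```

An attribute check like this works across library versions without importing a specific `MultiTurnEnv` base class, at the cost of matching any object that happens to define `env_response`.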
The verifiers rollout() expects a RolloutInput TypedDict with:
- prompt (messages list)
- example_id (int)
- task (str)
- answer (str)
- info (dict)

We were incorrectly passing a 'question' field instead of the correct structure. Also added WARNING-level logs for better debugging visibility.

Co-Authored-By: Claude <noreply@anthropic.com>
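Assembling that structure can be sketched as a small builder. The field names come straight from the commit message; the builder function itself is hypothetical, and a plain dict stands in for the library's RolloutInput TypedDict:

```python
def build_rollout_input(messages, example_id, task, answer, info=None) -> dict:
    """Build the RolloutInput-shaped dict that rollout() expects."""
    return {
        "prompt": list(messages),  # a messages list, not a bare question string
        "example_id": example_id,
        "task": task,
        "answer": answer,
        "info": info or {},
    }
```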
Put this back into a draft; there are bugs with SFT.
- Use real functions instead of MagicMock for reward functions to ensure proper signature inspection
- Add a vf_env attribute with the correct class name for single-turn detection
- Set ensure_scores_are_not_same config in tests

Co-Authored-By: Claude <noreply@anthropic.com>
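The reason real functions are needed: signature-based dispatch reads parameter names via `inspect.signature`, and a MagicMock only presents an opaque `(*args, **kwargs)` call signature, so no named parameters are visible. A sketch with a hypothetical reward function showing what the tests need to work:

```python
import inspect

def exact_match_reward(completion, answer, **kwargs):
    """A real function (hypothetical example): inspect.signature can see
    its named parameters, unlike a MagicMock stand-in."""
    return 1.0 if completion.strip() == answer.strip() else 0.0

# Signature inspection of a real function yields its actual parameter names,
# which is what signature-aware reward dispatch relies on.
params = set(inspect.signature(exact_match_reward).parameters)
```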
Wait, I meant an environment from Verifiers (not Atropos). When you run sft-datagen, set a wandb run name, log in to wandb, and then share that run please
but sure, I can share one later!
@teknium1 I can't seem to share the wandb project; maybe DM me your email or something

PR Type
📝 General Information
Description
This PR completes the verifiers environment implementation started in #258 by adding the missing training methods and fixing issues identified in review.
Related Issues
Supersedes/completes #258
Changes
environments/verifiers_server.py
- Added a collect_trajectories method for the training loop (generates multiple completions per question)
- Added a score method for reward assignment using the Verifiers rubric system
- Added a wandb_log override for tracking the train/percent_correct metric
- Added a name class attribute for proper CLI integration
- Added a percent_correct_buffer for training metrics tracking
- Added required imports (Item, tokenize_for_trainer, random); removed unused imports
- Replaced the debug main() with a proper VerifiersEnv.cli() entry point

pyproject.toml
- Moved verifiers>=0.1.5.post0 from core dependencies to the optional [verifiers] group (install with pip install atroposlib[verifiers])

🔖 Environment Snapshot
- Environment install: prime env install will/wordle
- Dependency: verifiers>=0.1.5.post0 (optional)
- Requires: OPENAI_API_KEY

✅ Developer & Reviewer Checklist
🤖 Generated with Claude Code
PR Link: #304