PR Type

  • RL Environment PR - Complete Environment Snapshot & Zero-Training sections

📝 General Information

Description

Complete implementation of the VerifiersEnv adapter for Prime Intellect's Environment Hub, enabling Atropos to run environments installed via the verifiers library and Prime CLI.

Key Features:

  • Full adapter implementing collect_trajectories(), score(), evaluate(), and wandb_log()
  • RubricGroup support for multi-rubric environments (such as wordle)
  • Optional dependency: a clean ImportError is raised if verifiers is not installed (see the sketch after this list)
  • 26 unit tests; CI passes without Prime credentials
  • Comprehensive documentation in environments/README.md
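
As a rough illustration of the import guard and adapter surface, here is a minimal sketch. It assumes BaseEnv is importable from atroposlib.envs.base (the same module as APIServerConfig used later in this PR); the actual verifiers_server.py may be structured differently.

try:
    import verifiers as vf  # optional dependency
except ImportError:
    vf = None

from atroposlib.envs.base import BaseEnv  # assumed import path


class VerifiersEnv(BaseEnv):
    """Adapter exposing a Prime Hub environment to Atropos (sketch)."""

    def __init__(self, config, server_configs, **kwargs):
        # Fail fast with an actionable message when verifiers is missing.
        if vf is None:
            raise ImportError(
                "VerifiersEnv requires the 'verifiers' package: pip install verifiers"
            )
        super().__init__(config, server_configs, **kwargs)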

This PR completes and extends the work started in #258.


🔖 Environment Snapshot

| Field | Your Entry |
| --- | --- |
| Environment Name | VerifiersEnv (Prime Intellect Hub Adapter) |
| Short Description | Adapter to run Prime Intellect Environment Hub environments in Atropos via the verifiers library. |
| Category | Verifiable-Reasoning |
| Dataset Needed? | No (datasets are provided by Prime Hub environments) |
| External Deps | verifiers (optional), Prime CLI (uv tool install prime) |
| Environment Variables | OPENAI_API_KEY (for LLM inference) |
| Compute Footprint Estimate | <1 GB RAM; depends on the loaded environment |

🧪 Zero-Training Test Results

W&B Link: N/A (tested locally)

Verification test with the will/wordle environment:

✓ rubric type: RubricGroup
✓ has rubrics attr: True
✓ num rubrics: 2
✓ reward_funcs: ['__score_rollout__']
✓ using score_rollout: True
✓ Training items: 2000
✓ Eval items: 20
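
An inspection snippet along the following lines reproduces these checks. The attribute names used (rubric, rubrics, score_rollout) follow the RubricGroup pattern described later in this PR, but they are assumptions here, not confirmed API:

import verifiers as vf

# Load the environment installed via `prime env install will/wordle`.
env = vf.load_environment("wordle")
rubric = env.rubric

print(f"rubric type: {type(rubric).__name__}")
print(f"has rubrics attr: {hasattr(rubric, 'rubrics')}")
if hasattr(rubric, "rubrics"):
    print(f"num rubrics: {len(rubric.rubrics)}")
print(f"using score_rollout: {hasattr(rubric, 'score_rollout')}")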

Unit Test Results:

======================== 26 passed, 3 skipped in 7.41s =========================
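
The 3 skipped tests are the Prime integration tests, gated behind the @pytest.mark.prime marker added in conftest.py (see the commit log below). A minimal sketch of that gating using the standard pytest hook pattern; the actual conftest.py may differ in details:

import pytest


def pytest_addoption(parser):
    # Opt-in flag for tests that need Prime credentials.
    parser.addoption(
        "--runprime", action="store_true", default=False,
        help="run Prime integration tests",
    )


def pytest_configure(config):
    config.addinivalue_line("markers", "prime: requires Prime CLI login")


def pytest_collection_modifyitems(config, items):
    if config.getoption("--runprime"):
        return  # Prime tests were explicitly requested
    skip_prime = pytest.mark.skip(reason="need --runprime option to run")
    for item in items:
        if "prime" in item.keywords:
            item.add_marker(skip_prime)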

Examples of the Environment scoring:

The adapter successfully:

  1. Loads environments from Prime Hub via vf.load_environment()
  2. Extracts rubrics (single or RubricGroup)
  3. Normalizes datasets to the Atropos format (see the sketch after this list)
  4. Provides scoring via score_rollout() or reward functions
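
For step 3, a sketch of what the field normalization might look like. The fallback field names (question, completion) are illustrative assumptions; the real adapter handles more formats.

def _normalize_item(raw: dict) -> dict:
    """Map one verifiers dataset row onto the fields Atropos expects (sketch)."""
    # Hypothetical fallbacks for common field-name variants.
    prompt = raw.get("prompt") or raw.get("question") or ""
    answer = raw.get("answer") or raw.get("completion") or ""
    return {"prompt": prompt, "answer": answer}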

✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • Docstrings added for all new public classes / functions
  • If new .env variables are required, they have been added to .env.example in the repo root

Verification Steps for Maintainers

# 1. Install dependencies
pip install verifiers
uv tool install prime

# 2. Login to Prime
prime login

# 3. Install test environment
prime env install will/wordle

# 4. Test the adapter
python -c "
import asyncio
from environments.verifiers_server import VerifiersEnv, VfEnvConfig
from atroposlib.envs.base import APIServerConfig

config = VfEnvConfig(vf_env_name='wordle')
env = VerifiersEnv(config, [APIServerConfig(model_name='test')])
asyncio.run(env.setup())
print(f'Train: {len(env.train)}, Eval: {len(env.test)}')
"
# Expected: Train: 2000, Eval: 20
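
Optionally, run the unit tests as a fifth step; the test path and the --runprime flag come from this PR's test changes (see the commit log below):

# 5. Run the unit tests (Prime integration tests are skipped by default)
pytest atroposlib/tests/test_verifiers_env.py
pytest atroposlib/tests/test_verifiers_env.py --runprime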

Commit log (cdreetz and others added 9 commits, January 9, 2026):
## Summary
Complete implementation of VerifiersEnv adapter for Prime Intellect's
Environment Hub, enabling Atropos to use environments installed via
the 'verifiers' library and Prime CLI.

## Changes
- environments/verifiers_server.py: Complete rewrite (~580 lines)
  - Implement collect_trajectories() for RL training
  - Implement score() for batch scoring with rubrics
  - Implement evaluate() for model evaluation
  - Add comprehensive error handling and validation
  - Add dataset normalization for various field formats
  - Add proper async method signatures
  - Add wandb_log() for metrics tracking

- atroposlib/tests/test_verifiers_env.py: New test file (26 tests)
  - Configuration tests
  - Import guard tests
  - Initialization validation tests
  - Dataset normalization tests
  - Reward calculation tests
  - Async signature verification
  - Class attribute verification
  - Prime integration tests (skipped in CI)

- atroposlib/tests/conftest.py: Add @pytest.mark.prime marker
  - Skip Prime tests unless --runprime flag is passed
  - CI passes without Prime credentials

- environments/README.md: Add documentation
  - Prerequisites and installation
  - Configuration options
  - Usage examples
  - Troubleshooting guide

## Testing
- 117 tests pass, 6 skipped (Prime integration tests)
- All existing tests continue to pass
- CI-compatible (no Prime credentials required)

- Enhanced _setup_rubric() to detect and handle RubricGroup (see the sketch below)
- Extract reward functions from each rubric in the group
- Use score_rollout() when available (the RubricGroup pattern)
- Added self.rubrics list for individual rubric access
- Improved logging for rubric detection
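
A rough sketch of that detection logic follows; self.vf_env and the use_score_rollout flag are illustrative names, not the confirmed implementation in verifiers_server.py.

def _setup_rubric(self):  # method on VerifiersEnv (sketch)
    # self.vf_env: the environment returned by vf.load_environment().
    rubric = self.vf_env.rubric
    if hasattr(rubric, "rubrics"):
        # RubricGroup: keep each member rubric individually accessible.
        self.rubrics = list(rubric.rubrics)
    else:
        self.rubrics = [rubric]
    # Prefer score_rollout() when the rubric provides it (RubricGroup pattern).
    self.use_score_rollout = hasattr(rubric, "score_rollout")
    self.rubric = rubric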

Tested with the will/wordle environment, which uses RubricGroup:
- rubric type: RubricGroup
- num rubrics: 2
- using score_rollout: True
- 2000 training items, 20 eval items loaded