@carabistouflette

PR Type

  • RL Environment PR - Complete Environment Snapshot & Zero-Training sections
  • Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

This PR continues the work started in PR #258 to create a Verifiers Environment (VerifiersEnv) that integrates the external verifiers library into the Atropos ecosystem.

Key Changes:

  • Refactored Architecture: Moved the environment implementation from environments/ to atroposlib/envs/ to align with the project's module structure.
  • Robust Error Handling: Added import guards for the optional verifiers dependency, defensive checks for empty API responses and None content, and division-by-zero protection for reward weight normalization.
  • W&B Tracking: Implemented percent_correct and cumulative accuracy metrics for Weights & Biases logging.
  • Configurable Thresholds: Added an optional reward_threshold config option for binary reward conversion.
  • Compatibility Layer: Created a _call_reward_func wrapper to handle API differences across verifiers versions.
  • Full Test Coverage: Added a comprehensive test suite (test_verifiers.py) covering initialization, trajectory collection, and item fetching.
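The division-by-zero protection and the optional reward_threshold described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names normalize_weights and combine_rewards are hypothetical, and falling back to uniform weights when the weights sum to zero is an assumed policy.

```python
from typing import List, Optional, Sequence


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Scale weights so they sum to 1, without dividing by zero."""
    if not weights:
        return []
    total = sum(weights)
    if total == 0:
        # Assumed fallback: treat every reward function equally.
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def combine_rewards(
    rewards: Sequence[float],
    weights: Sequence[float],
    reward_threshold: Optional[float] = None,
) -> float:
    """Weighted sum of rewards, optionally converted to a binary score."""
    norm = normalize_weights(weights)
    score = sum(r * w for r, w in zip(rewards, norm))
    if reward_threshold is None:
        return score
    # Binary conversion: 1.0 at or above the threshold, else 0.0.
    return 1.0 if score >= reward_threshold else 0.0
```

With reward_threshold left as None the weighted score passes through unchanged, so the binary conversion stays opt-in.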

Related Issues

Continues PR #258

Type of Change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

🔖 Environment Snapshot

| Field | Your Entry |
|---|---|
| Environment Name | VerifiersEnv |
| Short Description | Wrapper for loading and running RL tasks from the verifiers library ecosystem. |
| Category | Verifiable-Reasoning |
| Dataset Needed? | No (datasets provided by the verifiers library dynamically) |
| External Deps | verifiers (pip install verifiers) |
| Environmental Variables | None |
| Compute Footprint Estimate | <1 GB RAM, standard CPU/GPU for inference |
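Since verifiers is an optional external dependency, an import guard along these lines keeps the module importable when the package is missing. This is a sketch of the general pattern only; the VERIFIERS_AVAILABLE flag and the require_verifiers helper are hypothetical names, not necessarily those used in this PR.

```python
try:
    import verifiers as vf

    VERIFIERS_AVAILABLE = True
except ImportError:
    # Keep the module importable; fail later, only when the env is used.
    vf = None
    VERIFIERS_AVAILABLE = False


def require_verifiers():
    """Return the verifiers module, or raise a clear error if it is missing."""
    if not VERIFIERS_AVAILABLE:
        raise ImportError(
            "VerifiersEnv needs the optional 'verifiers' package. "
            "Install it with: pip install verifiers"
        )
    return vf
```

Calling require_verifiers() at environment construction time, rather than at import time, means users who never touch this environment are unaffected by the missing package.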

🧪 Zero-Training Test Results


W&B Link: N/A - Requires external verifiers package with specific environment configuration.

Examples of the Environment scoring a good example and a bad example:
Tested via unit tests in test_verifiers.py. Mocked environment correctly scores valid responses with 1.0 and handles empty/error responses gracefully.
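As an illustration of what handling empty/error responses gracefully can look like, here is a small sketch. It assumes an OpenAI-style response shape (response.choices[0].message.content), which this PR does not confirm, and extract_content is a hypothetical helper name.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_content(response: Any) -> Optional[str]:
    """Return the first choice's text, or None if the response is unusable."""
    choices = getattr(response, "choices", None)
    if not choices:
        # Missing response, or an empty list of choices.
        return None
    message = getattr(choices[0], "message", None)
    content = getattr(message, "content", None)
    if content is None or not content.strip():
        # None content or whitespace-only text counts as empty.
        return None
    return content


# Stand-ins for API responses (assumed shape, for illustration only).
ok = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content="42"))]
)
empty = SimpleNamespace(choices=[])
none_content = SimpleNamespace(
    choices=[SimpleNamespace(message=SimpleNamespace(content=None))]
)
```

A caller can then skip any rollout where extract_content returns None, or give it a zero score, instead of raising.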


✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • Docstrings added for all new public classes / functions
  • If .env vars required, did you add it to the .env.example in repo root?

@carabistouflette
Author

pre-commit.ci seems to be complaining about a comment in example_trainer/vllm_api_server.py.
I didn't touch that file (see the files changed), so the failure is unrelated to this PR and is safe to ignore.

Updated verifiers dependency version in pyproject.toml.