@shannonsands
Contributor

PR Type

  • RL Environment PR - Complete Environment Snapshot & Zero-Training sections
  • Non-Environment PR - Complete Description, Related Issues & Type of Change sections

📝 General Information

Description

RLVR on number captchas

🔖 Environment Snapshot

Field | Your Entry
Environment Name | CaptchaEnv
Short Description | RLVR on numeric captcha dataset
Category | Verifiable-Reasoning
Dataset Needed? | Yes: https://huggingface.co/datasets/project-sloth/captcha-images, WTFPL
External Deps | None (covered by other multimodal envs)
Environmental Variables |
Compute Footprint Estimate | 2 nodes works on small models
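
For readers unfamiliar with RLVR on numeric captchas, the verifiable reward can be pictured roughly as below. This is a minimal sketch, not the scoring code in this PR: the function names `extract_digits` and `captcha_reward`, the "take the last run of digits" rule, and the binary 1.0/0.0 reward are all assumptions made for illustration.

```python
import re
from typing import Optional


def extract_digits(completion: str) -> Optional[str]:
    """Return the last run of digits in the model's completion, or None.

    Hypothetical helper: the real environment may parse answers differently.
    """
    runs = re.findall(r"\d+", completion)
    return runs[-1] if runs else None


def captcha_reward(completion: str, target: str) -> float:
    """Binary reward: 1.0 on an exact string match with the label, else 0.0.

    Comparing strings (not ints) keeps leading zeros significant.
    """
    predicted = extract_digits(completion)
    return 1.0 if predicted == target else 0.0
```

Under these assumptions a completion such as "The captcha reads 48213" would earn the full reward against the label "48213", while a completion with one wrong digit, or with no digits at all, would earn none.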

🧪 Zero-Training Test Results

Details

W&B Link:

Examples of the Environment scoring a good example and a bad example:


✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes
  • Docstrings added for all new public classes / functions
  • If .env vars required, did you add it to the .env.example in repo root?

@shannonsands
Contributor Author

Needs testing on cluster

@shannonsands shannonsands marked this pull request as ready for review May 21, 2025 22:02
@shannonsands shannonsands requested a review from hjc-puro May 21, 2025 22:02
@teknium1 teknium1 closed this Dec 26, 2025
@teknium1 teknium1 reopened this Dec 26, 2025
@teknium1
Collaborator

@shannonsands is this ready to merge? xD


3 participants