

@PLippmann
Contributor

PR Type

  • RL Environment PR - Complete Environment Snapshot & Zero-Training sections

📝 General Information

Description

This PR adds an NL2Bash generation environment for training LLMs to convert natural language instructions into executable Bash commands.

Key Features:

  • Uses the NL2SH-ALFA dataset (40k+ examples)
  • String-matching verification: generated commands are compared against gold-standard commands (exact or normalized match)
  • Binary reward signal: 1.0 for a correct match, -1.0 for an incorrect answer
  • Safe for training: generated commands are never executed, so there is no execution risk, while string matching still serves as a close proxy for correctness
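The verification and reward rules above fit in a few lines of Python. This is a minimal sketch, not the PR's code. The helper names `normalize_command` and `score_command` are hypothetical. The PR does not define its normalization, so "normalized" is assumed here to mean trimming and collapsing whitespace.

```python
def normalize_command(cmd: str) -> str:
    """Hypothetical normalization: trim and collapse runs of whitespace."""
    return " ".join(cmd.split())


def score_command(generated: str, gold: str) -> float:
    """Binary reward: 1.0 for an exact or normalized match, -1.0 otherwise."""
    # Exact match is checked first as a fast path.
    if generated == gold:
        return 1.0
    # Assumed normalization: whitespace differences only.
    if normalize_command(generated) == normalize_command(gold):
        return 1.0
    return -1.0


if __name__ == "__main__":
    print(score_command("yazi", "yazi"))                        # 1.0
    print(score_command("ls   -la ", "ls -la"))                 # 1.0
    print(score_command("tctl disconnect", "tailscale down"))   # -1.0
```

Whitespace collapsing is conservative but not perfectly safe. It also collapses spaces inside quoted arguments, so `echo "a  b"` and `echo "a b"` score as equal even though they print different output.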

🔖 Environment Snapshot

| Field | Your Entry |
| --- | --- |
| Environment Name | bash_env |
| Short Description | Train LLMs to generate Bash commands from natural language instructions |
| Category | Verifiable-Reasoning |
| Dataset Needed? | Yes: westenfelder/NL2SH-ALFA (MIT License, auto-downloaded) |
| External Deps | None |
| Environment Variables | None required |
| Compute Footprint Estimate | <1 GB RAM; <1 s for string-matching verification |

🧪 Zero-Training Test Results


Unit Tests: 23/23 passing

$ python -m pytest test_bash_utils.py
============================= test session starts ==============================
collected 23 items
test_bash_utils.py .......................                               [100%]
======================== 23 passed in 0.07s ========================

LLM Integration Test: Qwen2.5-1.5B-Instruct on NVIDIA A100

============================================================
Bash Environment Integration Test
============================================================
Server: http://localhost:9001/v1 (vLLM)
Model: Qwen/Qwen2.5-1.5B-Instruct
WandB: https://wandb.ai/teateam/atropos-environments_community_bash_env?nw=g3bfzo5dcc4

Examples:

Good Example (Score: 1.0)

  • Instruction: "Launch yazi file manager"
  • Generated: yazi
  • Result: Matches the correct command exactly (+1.0)

Bad Example (Score: -1.0)

  • Instruction: "To disconnect from Tailscale..."
  • Generated: tctl disconnect
  • Correct: tailscale down
  • Result: Mismatch (-1.0)
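In both examples, the score depends on a single generated command. A raw chat completion can contain more than that, so it usually has to be reduced to one command before matching. The sketch below is a hypothetical illustration; the PR does not describe its parsing. It assumes the model may wrap its answer in a Markdown code fence or in inline backticks. `extract_command` is an illustrative name, not part of the PR.

```python
import re

# Hypothetical: matches a fenced block such as ```bash\n...\n``` (language tag optional).
_FENCE_RE = re.compile(r"```[A-Za-z0-9_-]*\n(.*?)```", re.DOTALL)


def extract_command(completion: str) -> str:
    """Hypothetical helper: reduce a model completion to a single Bash command."""
    text = completion.strip()
    # Prefer the contents of the first fenced code block, if there is one.
    match = _FENCE_RE.search(text)
    if match:
        text = match.group(1).strip()
    # Strip surrounding inline backticks, e.g. `yazi`.
    if len(text) >= 2 and text.startswith("`") and text.endswith("`"):
        text = text.strip("`").strip()
    return text


if __name__ == "__main__":
    print(extract_command("```bash\ntailscale down\n```"))  # tailscale down
    print(extract_command("`yazi`"))                        # yazi
```

The extracted string is what would then be compared against the gold command.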

✅ Developer & Reviewer Checklist

  • Code follows project style (black, isort, flake8 pass with pre-commit)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes (23/23)
  • Docstrings added for all new public classes / functions
  • If .env vars are required, did you add them to .env.example in the repo root?
