Natural language to bash environment #302

PLippmann · 2026-01-07T13:44:19Z

PR Type

RL Environment PR - Complete Environment Snapshot & Zero-Training sections

📝 General Information

Description

This PR adds a NL2Bash Generation Environment for training LLMs to convert natural language instructions into executable Bash commands.

Key Features:

Uses the NL2SH-ALFA dataset (40k+ examples)
String matching verification - Commands are verified against gold standard commands (exact or normalized match)
Binary reward signal: 1.0 for correct matches, -1.0 for incorrect answers
Safe for training: generated commands are never executed, so there is no execution risk, while string matching still correlates highly with correctness
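
The verification and reward described above could look roughly like the sketch below. This is not the PR's code: the names `normalize_command` and `score_command` are hypothetical, and "normalized match" is assumed here to mean only trimming and collapsing whitespace, since the actual normalization rules are not listed in this description.

```python
def normalize_command(cmd: str) -> str:
    """Hypothetical normalization: trim and collapse runs of whitespace.

    Assumption: the real environment may normalize differently. Note that
    this simple version also collapses whitespace inside quoted strings.
    """
    return " ".join(cmd.split())


def score_command(generated: str, gold: str) -> float:
    """Binary reward: 1.0 on an exact or normalized match, otherwise -1.0."""
    if generated == gold:
        return 1.0
    if normalize_command(generated) == normalize_command(gold):
        return 1.0
    return -1.0


if __name__ == "__main__":
    # The two examples quoted in the test results of this PR.
    print(score_command("yazi", "yazi"))
    print(score_command("tctl disconnect", "tailscale down"))
```

Because the check is pure string comparison, no generated command is ever run, which is the property the last feature relies on.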

🔖 Environment Snapshot

| Field | Your Entry |
| --- | --- |
| Environment Name | bash_env |
| Short Description | Train LLMs to generate Bash commands from natural language instructions |
| Category | Verifiable-Reasoning |
| Dataset Needed? | Yes - westenfelder/NL2SH-ALFA (MIT License, auto-downloaded) |
| External Deps | None |
| Environment Variables | None required |
| Compute Footprint Estimate | <1 GB RAM; string-matching verification takes <1 s |

🧪 Zero-Training Test Results

Unit Tests: 23/23 passing

$ python -m pytest test_bash_utils.py
============================= test session starts ==============================
collected 23 items
test_bash_utils.py .......................                               [100%]
======================== 23 passed in 0.07s ========================

LLM Integration Test: Qwen2.5-1.5B-Instruct on NVIDIA A100

============================================================
Bash Environment Integration Test
============================================================
Server: http://localhost:9001/v1 (vLLM)
Model: Qwen/Qwen2.5-1.5B-Instruct
WandB: https://wandb.ai/teateam/atropos-environments_community_bash_env?nw=g3bfzo5dcc4

Examples:

✓ Good Example (Score: 1.0)

Instruction: "Launch yazi file manager"
Generated: yazi
Result: Matches correct command exactly (+1.0)

✗ Bad Example (Score: -1.0)

Instruction: "To disconnect from Tailscale..."
Generated: tctl disconnect
Correct: tailscale down
Result: Mismatch (-1.0)

✅ Developer & Reviewer Checklist

Code follows project style (black, isort, flake8 pass with pre-commit)
I have performed a self-review of my own code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
New and existing unit tests pass locally with my changes (23/23)
Docstrings added for all new public classes / functions
If .env vars required, did you add it to the .env.example in repo root?

PLippmann added 3 commits January 7, 2026 14:25

Bash env

6f9070d

Unit tests

ca305ea

README

beb0ac6
