Consistent training and inference stack for building tool-using chat agents on OpenRLHF and vLLM.
OpenRLHF-Agent is a slim runtime for tool-using chat agents. It keeps environment orchestration, chat protocols, and model I/O identical across RL training and production inference, so the code you prototype with OpenRLHF can ship unchanged behind a vLLM/OpenAI-compatible endpoint.
- Training = inference: the same `AgentSession` flow drives resets, tool calls, and transcript rendering in both phases.
- Small surface area: minimal primitives (`AgentRuntime`, `Environment`, `ChatProtocol`, `LLMEngine`, shared types) that are easy to audit.
- Tool-first: built-ins like `commentary` show ReAct-style loops; finals stay plain-text assistant replies.
- Proven demos: Qwen-3 samples cover inference serving, RL data collection, and REINFORCE++ training.
- OpenRLHF-ready: drop `AgentRuntime` into `train_reinforce_agent.sh` or Ray jobs without extra glue.
Teams need agents that plan actions, call tools, and stay consistent between experiments and production. Use this stack to:
- Iterate on multi-step reasoning with reward shaping and safety hooks.
- Deploy the same prompt + tool logic to live inference endpoints.
- Extend agents with memory, search, or enterprise tools while keeping one runtime abstraction.
```
AgentRuntime
├─ AgentSession (shared rollouts for training + inference)
├─ ChatProtocol (prompt rendering, tool parsing)
├─ Environment  (state, rewards, injected tools)
└─ LLMEngine    (token streaming via OpenAI/vLLM/custom)
```
See `docs/ARCHITECTURE.md` for a deeper dive into how these modules interact.
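To make the interaction concrete, here is a minimal rollout sketch. Apart from the module names in the diagram and `step_from_text` (referenced later in this README), every method and attribute below is an assumption; treat `docs/ARCHITECTURE.md` and `examples/qwen3/runtime_demo.py` as the source of truth.

```python
# Illustrative rollout loop, not the actual API: only the module names from the
# diagram and step_from_text are taken from the project; every other method and
# field name here is an assumption.
async def rollout(session, engine, protocol, max_turns: int = 8):
    await session.reset()                             # Environment supplies the initial state (assumed method)
    result = None
    for _ in range(max_turns):
        prompt = protocol.render(session.transcript)  # ChatProtocol renders the transcript (assumed method/attr)
        text = await engine.generate(prompt)          # LLMEngine returns the model completion (assumed method)
        result = await session.step_from_text(text)   # parse tool calls, run tools, score rewards
        if result.done:                               # plain-text final reply ends the episode (assumed field)
            break
    return result
```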
```bash
git clone https://github.com/OpenRLHF/OpenRLHF-Agent.git
cd OpenRLHF-Agent
pip install -e .

# optional extras
pip install -e .[dev]       # linting & tests
pip install -e .[openrlhf]  # pulls OpenRLHF core packages
```

Runtime dependencies are also listed in `requirements.txt` if you prefer `pip install -r requirements.txt`.
Optionally launch vLLM for local serving (`pip install vllm`), then start a Qwen-3 endpoint such as:

```bash
vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --port 8009 \
  --served-model-name qwen \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.8
```

Then run the reference inference demo:
```bash
python examples/qwen3/runtime_demo.py
```

The script wires together:

- `OpenAIEngine` pointing at a vLLM/OpenAI-compatible endpoint.
- `FunctionCallEnvironment` with the `commentary` tool, feedback hooks, and plain-text finals.
- `Qwen3ThinkingProtocol()` for prompt rendering and `<tool_call>` parsing.
You will see tool traces and the final answer printed to the console.
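If you want to adapt the demo rather than run it verbatim, the wiring looks roughly like this sketch. The class names come from the list above, but the import paths and keyword arguments are assumptions, so mirror `examples/qwen3/runtime_demo.py` for the exact calls:

```python
# Sketch of the demo wiring; import paths and keyword arguments are assumptions,
# so copy the exact incantation from examples/qwen3/runtime_demo.py.
from openrlhf_agent.agentkit.engines import OpenAIEngine                   # assumed module path
from openrlhf_agent.agentkit.environments import FunctionCallEnvironment   # assumed module path
from openrlhf_agent.agentkit.protocols import Qwen3ThinkingProtocol        # assumed module path
from openrlhf_agent import AgentRuntime                                    # assumed module path

engine = OpenAIEngine(
    base_url="http://localhost:8009/v1",  # the vLLM endpoint started above (assumed kwarg)
    model="qwen",                         # matches --served-model-name (assumed kwarg)
)
env = FunctionCallEnvironment()           # ships the built-in commentary tool
protocol = Qwen3ThinkingProtocol()        # renders prompts, parses <tool_call> blocks

runtime = AgentRuntime(engine=engine, environment=env, protocol=protocol)  # assumed kwargs
```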
`examples/qwen3/agent_func.py` exposes the `AgentInstance` / `AgentExecutor` hooks required by OpenRLHF. Run `examples/qwen3/train_reinforce_agent.sh` (set `DATASET_PATH`) or integrate the functions into your own Ray jobs to collect trajectories and train policies.
Start from the built-in abstractions—tools, environments, protocols, rewards—and extend them as needed:
- Subclass `ToolBase` from `src/openrlhf_agent/agentkit/tools/base.py`.
- Implement `async def call(self, context, **kwargs)` to return visible output or structured JSON.
- Pass the tool into your environment (`FunctionCallEnvironment(tools=[...])`) or register it dynamically via `env.register_tool(...)`, as sketched below.
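A custom tool might then look like this minimal sketch; only `ToolBase` and the `call(self, context, **kwargs)` signature come from the steps above, while the module-path mapping and metadata attributes are assumptions:

```python
# Sketch of a custom tool: ToolBase and the call(self, context, **kwargs) signature
# come from the list above; the metadata attributes (name, description) are assumptions.
from openrlhf_agent.agentkit.tools.base import ToolBase  # assumed import for src/.../tools/base.py


class WordCountTool(ToolBase):
    name = "word_count"                     # assumed attribute
    description = "Count words in a text."  # assumed attribute

    async def call(self, context, text: str = "", **kwargs):
        # Return either a visible string or structured, JSON-serializable data.
        return {"words": len(text.split())}


# Inject at construction time:
#   env = FunctionCallEnvironment(tools=[WordCountTool()])
# ...or register dynamically:
#   env.register_tool(WordCountTool())
```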
- Compose a `RewardPipeline` with result/process strategies from `src/openrlhf_agent/agentkit/rewards/` (e.g., `MatchingReward`, or `MathMatchingReward` for symbolic math equivalence).
- Pass the pipeline into `AgentSession(..., reward_pipeline=...)` so each `step_from_text` call can emit scalar rewards during RL training; see the sketch below.
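A hedged sketch of that wiring, assuming `RewardPipeline` takes a list of strategies (the actual constructor may differ):

```python
# Sketch only: RewardPipeline, MatchingReward, and the reward_pipeline keyword come
# from the list above; the module path and constructor signature are assumptions.
from openrlhf_agent.agentkit.rewards import RewardPipeline, MatchingReward

pipeline = RewardPipeline(strategies=[MatchingReward()])  # assumed constructor signature

# session = AgentSession(..., reward_pipeline=pipeline)
# Each step_from_text(...) call can then emit a scalar reward during RL training.
```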
- Subclass `ChatProtocol` in `src/openrlhf_agent/agentkit/protocols/base.py`.
- Implement render + parse helpers for your provider format.
- Instantiate it and wire it into `AgentRuntime` (or `AgentSession`) directly, as sketched below.
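As a rough sketch, assuming the base class expects `render`/`parse` style hooks (the steps above only promise "render + parse helpers", so check the real abstract methods):

```python
# Sketch of a custom protocol: ChatProtocol and its location come from the list
# above; the render/parse hook names and their signatures are assumptions.
from openrlhf_agent.agentkit.protocols.base import ChatProtocol


class MyProviderProtocol(ChatProtocol):
    def render(self, messages):
        """Turn the transcript into a provider-specific prompt string."""
        return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

    def parse(self, completion: str):
        """Split a completion into tool calls vs. a plain-text final reply."""
        return {"tool_calls": [], "final": completion}


# runtime = AgentRuntime(..., protocol=MyProviderProtocol())
```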
Apache License 2.0.