See the documentation for a technical overview of the platform and a guide to training your first agent.
Install uv (used here to manage the Python environment):

# macOS/Linux:
$ curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows:
PS> powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Create a project and virtual environment:

uv init && uv venv --python 3.11
source .venv/bin/activate    # Windows: .\.venv\Scripts\Activate.ps1

Install the ReinforceNow CLI, log in, and scaffold a project from the SFT template:

uv pip install rnow
rnow login
rnow init --template sft

Start training:

rnow run

That's it! Your training run will start on ReinforceNow's infrastructure. Monitor progress in the dashboard.
Go from raw data to a reliable AI agent in production. ReinforceNow gives you the flexibility to define your own reward functions, tools, and training data.
Define how your model should be evaluated using the @reward decorator:
from rnow.core import reward, RewardArgs

@reward
async def accuracy(args: RewardArgs, messages: list) -> float:
    """Check if the model's answer matches ground truth."""
    response = messages[-1]["content"]
    expected = args.metadata["answer"]
    return 1.0 if expected in response else 0.0

→ Write your first reward function
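You can sanity-check a reward function locally before launching a run. The snippet below is a minimal sketch, not part of the ReinforceNow API: it assumes the @reward decorator leaves the coroutine directly callable, and it stubs RewardArgs with a simple stand-in object, since only the metadata attribute is used here. In real training these arguments are supplied by the runtime.

import asyncio
from types import SimpleNamespace

# Hypothetical stand-in for RewardArgs: only .metadata is accessed by accuracy().
fake_args = SimpleNamespace(metadata={"answer": "2H2 + O2 → 2H2O"})

sample_messages = [
    {"role": "user", "content": "Balance the equation: H2 + O2 → H2O"},
    {"role": "assistant", "content": "The balanced equation is 2H2 + O2 → 2H2O."},
]

# Expect 1.0, since the expected answer appears verbatim in the last message.
print(asyncio.run(accuracy(fake_args, sample_messages)))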
Give your model the ability to call functions during training:
from rnow.core import tool

@tool
def search(query: str, max_results: int = 5) -> dict:
    """Search the web for information."""
    # Your implementation here
    return {"results": [...]}

→ Train an agent with custom tools
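The body of a tool is ordinary Python, so it can return whatever structured result your task needs. As an illustration only (nothing beyond the @tool decorator shown above is part of the ReinforceNow API), here is a sketch of a small calculator tool that evaluates basic arithmetic safely:

from rnow.core import tool

@tool
def calculate(expression: str) -> dict:
    """Evaluate a basic arithmetic expression, e.g. 2 * (3 + 4)."""
    import ast
    import operator

    ops = {
        ast.Add: operator.add,
        ast.Sub: operator.sub,
        ast.Mult: operator.mul,
        ast.Div: operator.truediv,
    }

    def _eval(node):
        # Recursively evaluate +, -, *, / over numeric literals only.
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            return ops[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        raise ValueError("unsupported expression")

    try:
        return {"result": _eval(ast.parse(expression, mode="eval").body)}
    except (ValueError, SyntaxError, ZeroDivisionError) as exc:
        return {"error": str(exc)}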
Create a train.jsonl file with your prompts and reward assignments:
{"messages": [{"role": "user", "content": "Balance the equation: Fe + O2 → Fe2O3"}], "rewards": ["accuracy"], "metadata": {"answer": "4Fe + 3O2 → 2Fe2O3"}}
{"messages": [{"role": "user", "content": "Balance the equation: H2 + O2 → H2O"}], "rewards": ["accuracy"], "metadata": {"answer": "2H2 + O2 → 2H2O"}}
{"messages": [{"role": "user", "content": "Balance the equation: N2 + H2 → NH3"}], "rewards": ["accuracy"], "metadata": {"answer": "N2 + 3H2 → 2NH3"}}→ Learn about training data format
We welcome contributions! ❤️ Please open an issue to discuss your ideas before submitting a PR.


