Bourbon is a Python package for Reinforcement Learning (RL), focusing on RL-based training of Large Language Models (LLMs). It's an experimentation project built on top of PyTorch and the ideas from the following research papers:

- Reflexion: Language Agents with Verbal Reinforcement Learning
- ReAct: Synergizing Reasoning and Acting in Language Models

The focus is on using natural language feedback as a reward signal to train LLMs to (1) solve tasks through reasoning and acting, and (2) improve their performance on a given task through verbal self-reflection, aligning the model's behavior with human preferences.
```bash
pip install bourbon
```

Before using Bourbon to solve your problem via RL, you need to define:
- State space: How your problem states are represented
- Actions: What operations your agent can perform
- Reward function: How you assign rewards for actions
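As a rough sketch of how these three pieces fit together, they can be thought of as a single environment interface. The `Environment` protocol and method names below are illustrative assumptions, not part of Bourbon's API:

```python
# Illustrative only: a generic interface tying states, actions, and rewards
# together. Bourbon does not require this exact shape.
from typing import Protocol, Tuple

class Environment(Protocol):
    def reset(self) -> int:
        """Return the initial state (states here are plain integers)."""
        ...

    def step(self, action: int) -> Tuple[int, float, bool]:
        """Apply an action; return (next_state, reward, episode_done)."""
        ...
```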
The first step is mapping your problem to an RL environment. Environments can be:
- Deterministic: Same action in same state always produces same result
- Stochastic: Actions may have probabilistic outcomes
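As a small illustration of the difference, consider a toy "move right" action on integer states (the 0.8 success probability below is an arbitrary illustrative choice, not a Bourbon default):

```python
import random

# Deterministic: the same state and action always produce the same next state.
def move_right_deterministic(state: int) -> int:
    return state + 1

# Stochastic: the intended move succeeds only with some probability,
# otherwise the agent stays put (0.8 is an illustrative value).
def move_right_stochastic(state: int, success_prob: float = 0.8) -> int:
    return state + 1 if random.random() < success_prob else state
```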
States can be represented as vectors of natural numbers {1, 2, 3, ...}. Here's a classic grid world example:
In this 3x3 grid:
- 9 total states (indexed 1-9)
- Agent (orange) navigates to reach the goal (green)
- Goal state provides reward of +10
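One common way to number such a grid is row-major order. The helpers below are a sketch under that assumption; the actual layout used in the figure and notebooks may differ:

```python
# Row-major numbering for a 3x3 grid: states 1..9 (an assumed convention).
def state_index(row: int, col: int) -> int:
    """Map (row, col) with row, col in {0, 1, 2} to a state in {1, ..., 9}."""
    return row * 3 + col + 1

def state_coords(state: int) -> tuple[int, int]:
    """Inverse mapping from a state in {1, ..., 9} back to (row, col)."""
    return (state - 1) // 3, (state - 1) % 3

assert state_index(2, 2) == 9  # bottom-right cell under this convention
```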
Actions define what operations your RL agent can perform. In the grid example above, the agent has 4 possible actions:
- LEFT: Move one cell left
- RIGHT: Move one cell right
- UP: Move one cell up
- DOWN: Move one cell down
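As a sketch of how these actions could act on grid states, the integer codes and the choice to keep the agent in place on moves that would leave the grid are assumptions for illustration, not Bourbon conventions:

```python
# Illustrative action encoding for the 3x3 grid example.
LEFT, RIGHT, UP, DOWN = 0, 1, 2, 3

def grid_step(state: int, action: int, size: int = 3) -> int:
    """Move within a size x size grid of states 1..size*size;
    moves that would leave the grid keep the agent in place."""
    row, col = divmod(state - 1, size)
    if action == LEFT:
        col = max(col - 1, 0)
    elif action == RIGHT:
        col = min(col + 1, size - 1)
    elif action == UP:
        row = max(row - 1, 0)
    elif action == DOWN:
        row = min(row + 1, size - 1)
    return row * size + col + 1
```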
RL agents learn by maximizing future rewards. Bourbon supports:
- Immediate rewards: Agent receives feedback after each action
- Delayed rewards: Agent receives feedback only at episode end or after action sequences
Design your reward function to guide the agent toward desired behaviors.
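For instance, an immediate reward function for the grid example above might look like the sketch below; the goal index and the optional per-step penalty are assumptions for illustration, not Bourbon defaults:

```python
GOAL_STATE = 9  # assumed goal cell for this sketch

def reward(next_state: int, step_penalty: float = 0.0) -> float:
    """+10 on reaching the goal, otherwise an optional small penalty
    (e.g. step_penalty=-1.0 to encourage shorter paths)."""
    return 10.0 if next_state == GOAL_STATE else step_penalty
```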
Explore complete worked examples in the notebooks/ directory:
| Notebook | Description |
|---|---|
| `multiplication.ipynb` | Train an agent to learn multiplication tables |
| `capitals.ipynb` | Train an agent to predict country capitals |
| `wind.ipynb` | Solve the classic windy gridworld problem |
```python
import bourbon

# Define your environment, actions, and rewards

# Train your agent
agent = bourbon.QLearning(state_space_size=9, action_space_size=4)

# Your training loop here
for episode in range(1000):
    # ... training logic
    pass
```

- Python 3.10+
- PyTorch 2.0.0
- Additional dependencies listed in `pyproject.toml`
```
bourbon/
├── bourbon/              # Main package
│   ├── q_learning.py     # Q-learning implementation
│   ├── qtable.py         # Q-table utilities
│   └── steps.py          # Step management
├── notebooks/            # Example notebooks
├── docs/                 # Documentation
└── resources/            # Data files
```
- Research Papers: `docs/articles/`
- Figures: `docs/figs/`
- Examples: `notebooks/`
Contributions are welcome! Please feel free to submit issues and pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.
