# bandits

A Python library for exploring multi-armed bandit problems. It provides a platform to experiment with different bandit arms and policies (algorithms), and to run complete multi-armed bandit experiments.
The multi-armed bandit problem is a classic problem in probability theory and machine learning that models the trade-off between exploration and exploitation. Imagine a gambler at a casino with multiple slot machines (arms), each with different unknown reward probabilities. The goal is to maximize total reward by learning which arms are best while still exploring potentially better options.
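Before diving into the library, the trade-off can be sketched in a few lines of plain Python. This is a library-independent illustration (the arm probabilities and epsilon below are made up for the example), not code from this package:

```python
import random

# Three "slot machines" with hidden success probabilities, and an
# epsilon-greedy gambler who explores 10% of the time.
random.seed(0)

true_probs = [0.1, 0.5, 0.8]   # unknown to the gambler
counts = [0, 0, 0]             # pulls per arm
values = [0.0, 0.0, 0.0]       # running mean reward per arm
epsilon = 0.1
total_reward = 0

for _ in range(1000):
    if random.random() < epsilon:      # explore: try a random arm
        arm = random.randrange(len(true_probs))
    else:                              # exploit: play the best estimate so far
        arm = max(range(len(true_probs)), key=lambda a: values[a])
    reward = 1 if random.random() < true_probs[arm] else 0
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    total_reward += reward

# Typically well above the ~467 expected from always picking at random
print(total_reward)
```

The gambler mostly plays whichever arm currently looks best, but the occasional random pull keeps it from permanently missing a better arm it has undersampled.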
## Features

- Arms: Different types of reward distributions
  - `BernoulliArm`: Binary rewards with a configurable success probability
  - More arm types coming soon...
- Policies: Various algorithms for arm selection
  - `EpsilonGreedy`: Balances exploration and exploitation with an ε-greedy strategy
  - `ThompsonSampling`: Bayesian approach using Beta-Bernoulli conjugacy
  - More policies coming soon...
- Experiment Framework:
  - `Bandit` class to run complete experiments
## Installation

```bash
# Using uv (recommended)
uv add bandits

# Or clone and install locally
git clone https://github.com/yourusername/bandits.git
cd bandits
uv sync
```

## Quick Start

```python
from bandits.arms import BernoulliArm
from bandits.policies import EpsilonGreedy
from bandits.core import Bandit

# Create arms with different success probabilities
arms = [
    BernoulliArm(0.1),  # 10% success rate
    BernoulliArm(0.5),  # 50% success rate
    BernoulliArm(0.8),  # 80% success rate (best arm)
]

# Choose a policy
policy = EpsilonGreedy(n_arms=3, epsilon=0.1)

# Create and run the bandit experiment
bandit = Bandit(arms, policy)
bandit.run(n_steps=1000)

print(f"Total reward: {bandit.reward}")
print(f"Policy state: {policy.state}")
```

## Comparing Policies

```python
from bandits.arms import BernoulliArm
from bandits.policies import EpsilonGreedy, ThompsonSampling
from bandits.core import Bandit

# Same arms for a fair comparison
arms = [BernoulliArm(0.2), BernoulliArm(0.5), BernoulliArm(0.8)]

# Test Epsilon-Greedy
eg_policy = EpsilonGreedy(n_arms=3, epsilon=0.1)
eg_bandit = Bandit(arms.copy(), eg_policy)
eg_bandit.run(n_steps=1000)

# Test Thompson Sampling
ts_policy = ThompsonSampling(n_arms=3)
ts_bandit = Bandit(arms.copy(), ts_policy)
ts_bandit.run(n_steps=1000)

print(f"Epsilon-Greedy reward: {eg_bandit.reward}")
print(f"Thompson Sampling reward: {ts_bandit.reward}")
```

## API Reference

### `BernoulliArm(p)`

- `p`: Success probability (0.0 to 1.0)
- `pull()`: Returns 1 with probability `p`, 0 otherwise

### `EpsilonGreedy(n_arms, epsilon)`

- `n_arms`: Number of arms in the bandit
- `epsilon`: Exploration probability (0.0 to 1.0)
- `choose()`: Select an arm index
- `update(arm_idx: int, reward: float)`: Update the policy with the observed reward

### `ThompsonSampling(n_arms)`

- `n_arms`: Number of arms in the bandit
- `choose()`: Select an arm using posterior sampling
- `update(arm_idx: int, reward: float)`: Update the Beta posterior

### `Bandit(arms, policy)`

- `arms`: List of arm instances
- `policy`: Policy instance for arm selection
- `run(n_steps: int = 1000)`: Run the bandit experiment
- `reward`: Total accumulated reward
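The Beta-posterior bookkeeping behind Thompson sampling is compact enough to show in full. The class below is an independent sketch of the standard Beta-Bernoulli update, not this library's internal implementation:

```python
import random

# Standalone Beta-Bernoulli Thompson sampling, for illustration only --
# this is NOT the library's internal code, just the standard scheme
# the API reference describes.
class TinyThompson:
    def __init__(self, n_arms: int):
        # Beta(1, 1) prior (uniform) for every arm
        self.alpha = [1] * n_arms   # 1 + observed successes
        self.beta = [1] * n_arms    # 1 + observed failures

    def choose(self) -> int:
        # Draw one sample from each arm's posterior, play the argmax
        samples = [random.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return samples.index(max(samples))

    def update(self, arm_idx: int, reward: float) -> None:
        # A reward of 1 bumps alpha (success), 0 bumps beta (failure)
        if reward:
            self.alpha[arm_idx] += 1
        else:
            self.beta[arm_idx] += 1

random.seed(1)
policy = TinyThompson(n_arms=3)
probs = [0.2, 0.5, 0.8]  # hypothetical true success rates
for _ in range(500):
    arm = policy.choose()
    policy.update(arm, 1 if random.random() < probs[arm] else 0)

# After 500 rounds the best arm's posterior should dominate
print(policy.alpha)
```

Because arms with uncertain posteriors occasionally produce large samples, exploration happens automatically and fades as the posteriors sharpen.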
## Development

```bash
# Install with dev dependencies
uv sync --group dev

# Run tests
uv run pytest
```

## Project Structure

```
bandits/
├── bandits/
│   ├── __init__.py
│   ├── arms.py       # Arm implementations
│   ├── policies.py   # Policy/algorithm implementations
│   └── core.py       # Bandit experiment class
├── test/
│   ├── test_arms.py
│   ├── test_policies.py
│   └── test_core.py
└── README.md
```
## Contributing

Contributions are welcome! This library is designed to be a platform for exploring multi-armed bandit algorithms. Consider adding:
- New arm types (Gaussian, contextual, etc.)
- Additional policies (UCB, LinUCB, etc.)
- Visualization tools
- Performance metrics and analysis
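As an illustration of what a contributed policy might look like, here is a hypothetical UCB1 sketch that follows the same `choose()`/`update()` shape as the existing policies (the class and its internals are assumptions, not part of the library):

```python
import math

# Hypothetical UCB1 policy following the choose()/update() interface --
# a sketch of a possible contribution, not part of the library.
class UCB1:
    def __init__(self, n_arms: int):
        self.n_arms = n_arms
        self.counts = [0] * n_arms      # pulls per arm
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.total = 0                  # total pulls so far

    def choose(self) -> int:
        # Play each arm once before applying the UCB formula
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm
        # mean + sqrt(2 ln t / n_i): optimism in the face of uncertainty
        return max(
            range(self.n_arms),
            key=lambda a: self.values[a]
            + math.sqrt(2 * math.log(self.total) / self.counts[a]),
        )

    def update(self, arm_idx: int, reward: float) -> None:
        self.counts[arm_idx] += 1
        self.total += 1
        # Incremental mean update
        self.values[arm_idx] += (reward - self.values[arm_idx]) / self.counts[arm_idx]

policy = UCB1(n_arms=3)
for _ in range(50):
    arm = policy.choose()
    # Deterministic toy rewards: only arm 2 ever pays out
    policy.update(arm, 1.0 if arm == 2 else 0.0)
print(policy.counts)
```

Unlike ε-greedy, UCB1 needs no exploration hyperparameter: the confidence bonus shrinks for well-sampled arms, so exploration tapers off on its own.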
## License

MIT License - see the LICENSE file for details.