# Bandits

A Python library for exploring multi-armed bandit problems. It provides building blocks for bandit arms and selection policies (algorithms), along with a framework for running complete experiments.

## What are Multi-Armed Bandits?

The multi-armed bandit problem is a classic problem in probability theory and machine learning that models the trade-off between exploration and exploitation. Imagine a gambler at a casino with multiple slot machines (arms), each with different unknown reward probabilities. The goal is to maximize total reward by learning which arms are best while still exploring potentially better options.
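As a toy illustration of the trade-off (plain Python, not this library — the probabilities and strategies here are invented for the example):

```python
import random

# Two "slot machines" with hidden success probabilities the gambler cannot see
hidden_probs = [0.3, 0.7]

def pull(arm: int) -> int:
    """One play: pays 1 with the arm's hidden probability, else 0."""
    return 1 if random.random() < hidden_probs[arm] else 0

# Always playing arm 0 (pure exploitation of a bad first guess)
# never reveals that arm 1 pays out more than twice as often.
stuck_payout = sum(pull(0) for _ in range(100))

# Alternating arms explores both, but keeps wasting half the plays
# on the worse machine. Good policies balance the two over time.
split_payout = sum(pull(i % 2) for i in range(100))
```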

## Features

- **Arms**: Different types of reward distributions
  - `BernoulliArm`: Binary rewards with a configurable success probability
  - More arm types coming soon...
- **Policies**: Algorithms for arm selection
  - `EpsilonGreedy`: Balances exploration and exploitation with an ε-greedy strategy
  - `ThompsonSampling`: A Bayesian approach using Beta-Bernoulli conjugacy
  - More policies coming soon...
- **Experiment framework**: A `Bandit` class for running complete experiments

## Installation

```bash
# Using uv (recommended)
uv add bandits

# Or clone and install locally
git clone https://github.com/yourusername/bandits.git
cd bandits
uv sync
```

## Quick Start

### Basic Example

```python
from bandits.arms import BernoulliArm
from bandits.policies import EpsilonGreedy
from bandits.core import Bandit

# Create arms with different success probabilities
arms = [
    BernoulliArm(0.1),  # 10% success rate
    BernoulliArm(0.5),  # 50% success rate
    BernoulliArm(0.8),  # 80% success rate (best arm)
]

# Choose a policy
policy = EpsilonGreedy(n_arms=3, epsilon=0.1)

# Create and run the bandit experiment
bandit = Bandit(arms, policy)
bandit.run(n_steps=1000)

print(f"Total reward: {bandit.reward}")
print(f"Policy state: {policy.state}")
```

### Comparing Policies

```python
from bandits.arms import BernoulliArm
from bandits.policies import EpsilonGreedy, ThompsonSampling
from bandits.core import Bandit

# Same arms for a fair comparison
arms = [BernoulliArm(0.2), BernoulliArm(0.5), BernoulliArm(0.8)]

# Test Epsilon-Greedy
eg_policy = EpsilonGreedy(n_arms=3, epsilon=0.1)
eg_bandit = Bandit(arms.copy(), eg_policy)
eg_bandit.run(n_steps=1000)

# Test Thompson Sampling
ts_policy = ThompsonSampling(n_arms=3)
ts_bandit = Bandit(arms.copy(), ts_policy)
ts_bandit.run(n_steps=1000)

print(f"Epsilon-Greedy reward: {eg_bandit.reward}")
print(f"Thompson Sampling reward: {ts_bandit.reward}")
```

## API Reference

### Arms

`BernoulliArm(p: float)`

- `p`: Success probability (0.0 to 1.0)
- `pull()`: Returns 1 with probability `p`, 0 otherwise
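Under the hood, a Bernoulli pull amounts to a single uniform draw; a minimal standalone sketch (illustrative, not the library's actual source):

```python
import random

def bernoulli_pull(p: float) -> int:
    """Return 1 with probability p, otherwise 0."""
    return 1 if random.random() < p else 0
```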

### Policies

`EpsilonGreedy(n_arms: int, epsilon: float = 0.2)`

- `n_arms`: Number of arms in the bandit
- `epsilon`: Exploration probability (0.0 to 1.0)
- `choose()`: Select an arm index
- `update(arm_idx: int, reward: float)`: Update the policy with an observed reward
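The choose/update cycle this interface describes can be sketched as a standalone ε-greedy implementation (the class name `EpsilonGreedySketch` and its internals are assumptions for illustration, not the library's source):

```python
import random

class EpsilonGreedySketch:
    """Illustrative ε-greedy: explore with probability ε, else pick the best mean."""

    def __init__(self, n_arms: int, epsilon: float = 0.2):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm

    def choose(self) -> int:
        if random.random() < self.epsilon:
            return random.randrange(len(self.counts))  # explore: random arm
        # exploit: arm with the highest estimated mean reward
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm_idx: int, reward: float) -> None:
        self.counts[arm_idx] += 1
        n = self.counts[arm_idx]
        # incremental mean: old + (reward - old) / n
        self.values[arm_idx] += (reward - self.values[arm_idx]) / n
```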

`ThompsonSampling(n_arms: int)`

- `n_arms`: Number of arms in the bandit
- `choose()`: Select an arm via posterior sampling
- `update(arm_idx: int, reward: float)`: Update the arm's Beta posterior
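Beta-Bernoulli conjugacy makes the posterior update a pair of counters per arm; a standalone sketch of the idea (illustrative, not the library's source):

```python
import random

class ThompsonSamplingSketch:
    """Illustrative Beta-Bernoulli Thompson sampling."""

    def __init__(self, n_arms: int):
        # Beta(1, 1) uniform prior over each arm's success probability
        self.alpha = [1.0] * n_arms
        self.beta = [1.0] * n_arms

    def choose(self) -> int:
        # Sample one success probability per arm from its posterior,
        # then play the arm whose sample is highest
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm_idx: int, reward: float) -> None:
        # Conjugate update: a success bumps alpha, a failure bumps beta
        self.alpha[arm_idx] += reward
        self.beta[arm_idx] += 1.0 - reward
```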

### Experiment

`Bandit(arms: List[Arm], policy: Policy)`

- `arms`: List of arm instances
- `policy`: Policy instance for arm selection
- `run(n_steps: int = 1000)`: Run the bandit experiment
- `reward`: Total accumulated reward
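Conceptually, the experiment is a choose → pull → update loop; a minimal sketch of that cycle with a trivial uniform-random baseline policy (all names here are illustrative assumptions, not the library's internals):

```python
import random

class RandomPolicy:
    """Baseline policy: picks arms uniformly at random and learns nothing."""

    def __init__(self, n_arms: int):
        self.n_arms = n_arms

    def choose(self) -> int:
        return random.randrange(self.n_arms)

    def update(self, arm_idx: int, reward: float) -> None:
        pass  # a learning policy would use the observed reward here

def run_experiment(arm_probs, policy, n_steps=1000):
    """Each step: the policy picks an arm, the arm pays out, the policy is told."""
    total = 0.0
    for _ in range(n_steps):
        idx = policy.choose()
        reward = 1.0 if random.random() < arm_probs[idx] else 0.0  # Bernoulli pull
        policy.update(idx, reward)
        total += reward
    return total
```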

## Development

### Running Tests

```bash
# Install with dev dependencies
uv sync --group dev

# Run tests
uv run pytest
```

### Project Structure

```
bandits/
├── bandits/
│   ├── __init__.py
│   ├── arms.py          # Arm implementations
│   ├── policies.py      # Policy/algorithm implementations
│   └── core.py          # Bandit experiment class
├── test/
│   ├── test_arms.py
│   ├── test_policies.py
│   └── test_core.py
└── README.md
```

## Contributing

Contributions are welcome! This library is designed to be a platform for exploring multi-armed bandit algorithms. Consider adding:

- New arm types (Gaussian, contextual, etc.)
- Additional policies (UCB, LinUCB, etc.)
- Visualization tools
- Performance metrics and analysis

## License

MIT License - see LICENSE file for details.
