PredictionMarketBench

A benchmark for evaluating prediction market trading agents using real Kalshi market replay data.

Overview

PredictionMarketBench replays real Kalshi episodes and evaluates trading agent performance under realistic execution constraints. An "episode" is one Kalshi event with multiple tickers/markets.

Input: Recorded orderbook snapshots and trade prints from Kalshi
Agent: Implements a trading policy using provided market data tools
Output: Equity curve, PnL, Sharpe ratio, drawdown, and other metrics

Installation

git clone https://github.com/Oddpool/PredictionMarketBench.git
cd PredictionMarketBench

# Install in development mode
pip install -e .

# With plotting support (for charts and GIFs)
pip install -e ".[plotting]"

# With parquet support (faster data loading)
pip install -e ".[parquet]"

# All optional dependencies
pip install -e ".[all]"

Quick Start

from oddpool_bench import Agent, AgentContext, BenchmarkHarness

class MyAgent(Agent):
    def act(self, ctx: AgentContext) -> None:
        markets = ctx.get_markets()
        positions = ctx.get_positions()
        cash = ctx.get_cash()
        
        # Your trading logic here
        # Use ctx.place_order(order) to trade

# Run benchmark
harness = BenchmarkHarness(episodes_dir="./episodes")
result = harness.run(MyAgent())
result.print_summary()

# Save results
result.save("results/summary.json")
result.save_trades("results/trades.json")
result.save_equity_csv("results/equity.csv")
result.save_equity_curve("results/equity.png")
result.save_equity_gif("results/")  # Animated PnL GIFs

Agent Interface

Agents implement the Agent base class:

from oddpool_bench import Agent, AgentContext

class MyAgent(Agent):
    def act(self, ctx: AgentContext) -> None:
        """Called periodically to make trading decisions."""
        pass
    
    def on_episode_start(self, metadata: dict) -> None:
        """Called at the start of each episode."""
        pass
    
    def on_episode_end(self, result: dict) -> None:
        """Called at the end of each episode."""
        pass

Context Methods

The AgentContext provides these methods:

Method	Description
`get_markets()`	List of all markets with best bid/ask
`get_orderbook(ticker, depth)`	Full orderbook for a ticker
`get_positions()`	Current positions per ticker
`get_cash()`	Cash and equity balances
`place_order(order)`	Submit an order
`get_resting_orders()`	View resting limit orders
`cancel_order(order_id)`	Cancel a resting order

Order Types

from oddpool_bench import Order, Side, Action, OrderType, TimeInForce

# Market order (immediate fill or reject)
Order(ticker="TICKER", side=Side.YES, action=Action.BUY,
      order_type=OrderType.MARKET, count=10)

# Limit IOC (fill immediately or cancel)
Order(ticker="TICKER", side=Side.YES, action=Action.BUY,
      order_type=OrderType.LIMIT, count=10, limit_price_cents=50,
      time_in_force=TimeInForce.IOC)

# Limit GTC (fill what crosses, rest the remainder)
Order(ticker="TICKER", side=Side.YES, action=Action.BUY,
      order_type=OrderType.LIMIT, count=10, limit_price_cents=48,
      time_in_force=TimeInForce.GTC)

# Post-only (rejected if it would cross; otherwise rests as a maker order)
Order(ticker="TICKER", side=Side.YES, action=Action.BUY,
      order_type=OrderType.LIMIT, count=10, limit_price_cents=45,
      time_in_force=TimeInForce.POST_ONLY)
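The time-in-force comments above can be read as a small decision rule. The sketch below is a standalone illustration, not the oddpool_bench matching engine: `Tif`, `resolve_buy_limit`, and the single-level book (one best ask price and size) are hypothetical simplifications introduced here.

```python
from enum import Enum


class Tif(Enum):
    """Hypothetical stand-in for the library's TimeInForce enum."""
    IOC = "ioc"
    GTC = "gtc"
    POST_ONLY = "post_only"


def resolve_buy_limit(limit_cents, count, best_ask_cents, ask_size, tif):
    """Return (filled, rested, rejected) for a buy limit order.

    Assumes a single ask level; a real book would be walked level by level.
    """
    crosses = limit_cents >= best_ask_cents
    if tif is Tif.POST_ONLY:
        # Post-only never takes liquidity: reject if it would cross.
        return (0, 0, True) if crosses else (0, count, False)
    filled = min(count, ask_size) if crosses else 0
    if tif is Tif.IOC:
        # The unfilled remainder is cancelled, never rested.
        return (filled, 0, False)
    # GTC: the unfilled remainder rests in the book.
    return (filled, count - filled, False)
```

With a best ask of 50 cents for 4 contracts, a 10-contract buy at 50 fills 4 under both IOC and GTC; only GTC leaves the other 6 resting.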

Included Episodes

Episode	Event Type	Tickers	Description
`KXBTCD-26JAN2017`	Bitcoin Price	23	BTC price thresholds at Jan 20, 2026 5PM EST
`KXHIGHNY-26JAN20`	Weather	6	NYC high temperature on Jan 20, 2026
`KXNFLGAME-26JAN11BUFJAC`	Sports	2	NFL Playoffs: Buffalo vs Jacksonville
`KXNCAAF-26`	Sports	2	College Football: Indiana vs Miami

Execution Modes

Taker-Only

Orders execute immediately against the displayed orderbook. Orders that don't cross are rejected.
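Executing against the displayed book amounts to walking the ask levels from the best price outward. A minimal sketch, assuming the book is a list of (price_cents, size) ask levels sorted best-first; `take_from_asks` is a hypothetical helper, not part of oddpool_bench, and it ignores fees:

```python
def take_from_asks(asks, count, limit_cents=None):
    """Fill a buy of `count` contracts against displayed ask levels.

    asks: list of (price_cents, size) tuples, best (lowest) price first.
    limit_cents: optional price cap; None behaves like a market order.
    Returns (filled, cost_cents); filled == 0 means nothing crossed.
    """
    filled = 0
    cost_cents = 0
    for price, size in asks:
        if limit_cents is not None and price > limit_cents:
            break  # remaining levels are worse than the limit
        take = min(size, count - filled)
        filled += take
        cost_cents += take * price
        if filled == count:
            break
    return filled, cost_cents
```

A result with filled == 0 corresponds to the case described above where an order does not cross and is rejected.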

Maker-Taker

Limit orders can rest in the book and are filled when recorded historical trades match their price. Queue position is simulated from the displayed size, with fills allocated pro rata.
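One possible reading of pro-rata allocation: when a historical trade prints at the resting order's price, the order receives a share of the traded size proportional to its share of the total size at that level. The sketch below assumes exactly that reading; `pro_rata_fill` is a hypothetical name and the benchmark's actual queue model may differ.

```python
def pro_rata_fill(order_remaining, displayed_size, trade_size):
    """Contracts allocated to a resting order from one historical trade.

    order_remaining: unfilled contracts of the resting order.
    displayed_size: recorded size from other participants at the same price.
    trade_size: size of the historical trade print at that price.
    """
    total = displayed_size + order_remaining
    if total <= 0 or trade_size <= 0:
        return 0
    # Integer floor keeps the allocation conservative.
    share = trade_size * order_remaining // total
    return min(order_remaining, share)
```

For example, a 10-contract resting order alongside 90 displayed contracts would receive 5 contracts from a 50-contract trade under this assumption.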

Fee Model

Fees follow the Kalshi fee schedule (October 2025):

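Taker: 0.07 * contracts * P * (1 - P), rounded up to the next cent
Maker: 0.0175 * contracts * P * (1 - P), rounded up to the next cent

Where P is the contract price in dollars (for example, 0.50 for 50 cents). Because of the P * (1 - P) term, fees are highest at 50 cents and decrease as the price approaches 0 or 100 cents.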

Examples (1 contract):

At 50c taker: ceil(0.07 * 1 * 0.50 * 0.50 * 100) = ceil(1.75) = 2 cents
At 50c maker: ceil(0.0175 * 1 * 0.50 * 0.50 * 100) = ceil(0.4375) = 1 cent
At 10c taker: ceil(0.07 * 1 * 0.10 * 0.90 * 100) = ceil(0.63) = 1 cent
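The examples above can be reproduced with integer arithmetic, which avoids floating-point surprises at the rounding boundary. A minimal sketch; `fee_cents` and the basis-point constants are illustrative names, not the oddpool_bench API:

```python
TAKER_RATE_BPS = 700  # 7% expressed in basis points
MAKER_RATE_BPS = 175  # 1.75% expressed in basis points


def fee_cents(price_cents, contracts, rate_bps):
    """Fee in whole cents: ceil(rate * contracts * P * (1 - P) * 100).

    With P = price_cents / 100 and rate = rate_bps / 10_000, the expression
    equals rate_bps * contracts * price_cents * (100 - price_cents) / 1_000_000.
    """
    numerator = rate_bps * contracts * price_cents * (100 - price_cents)
    # Ceiling division on non-negative integers.
    return -(-numerator // 1_000_000)
```

Calling fee_cents(50, 1, TAKER_RATE_BPS) gives 2, matching the first example. Because rounding is applied to the whole order, 100 contracts at 50 cents cost 175 cents as a taker rather than 100 times the single-contract fee.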

Metrics

Per-episode metrics:

Total PnL (cents and percentage)
Max drawdown
Sharpe ratio
Contracts traded
Fees paid
Maker vs taker fills
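To sanity-check reported numbers, max drawdown and a Sharpe-style ratio can be recomputed from a sampled equity curve. This is a sketch under assumptions: it uses absolute drawdown in the units of the input and an unannualized ratio of the mean to the sample standard deviation of step-to-step equity changes, which may not match the benchmark's exact definitions. Both function names are hypothetical.

```python
from statistics import mean, stdev


def max_drawdown(equity):
    """Largest peak-to-trough drop in a non-empty equity series."""
    peak = equity[0]
    worst = 0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, peak - value)
    return worst


def sharpe_ratio(equity):
    """Mean over sample std of step-to-step equity changes (not annualized)."""
    changes = [b - a for a, b in zip(equity, equity[1:])]
    if len(changes) < 2:
        return 0.0
    spread = stdev(changes)
    return mean(changes) / spread if spread > 0 else 0.0
```

For the series [100, 120, 90, 110], the peak is 120 and the subsequent trough is 90, so max_drawdown returns 30.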

Output Methods

result.save("summary.json")            # Aggregate and per-episode metrics
result.save_trades("trades.json")      # Detailed trade log
result.save_equity_csv("equity.csv")   # Time-series equity data
result.save_equity_curve("equity.png") # Static PnL chart
result.save_equity_gif("output_dir/")  # Animated PnL GIF per episode

Animated GIF Output

Generate animated PnL visualizations:

result.save_equity_gif(
    path="./gifs/",           # Output directory
    duration_seconds=6.0,     # Animation length
    fps=15,                   # Frames per second
)

Creates one GIF per episode showing the equity curve over time.

Configuration

from oddpool_bench import SimulatorConfig, BenchmarkHarness

config = SimulatorConfig(
    agent_call_cadence_seconds=5.0,      # How often to call agent
    equity_sample_interval_seconds=60.0, # Equity curve sampling
    max_tool_calls_per_step=100,         # Tool call budget per step
    verbose=True,
)

harness = BenchmarkHarness(episodes_dir="./episodes", config=config)

Example Agents

See examples/example_agents.py for reference implementations:

PassiveAgent: Never trades (baseline)
RandomAgent: Random trades (demonstrates interface)
MomentumAgent: Follows recent price direction

Episode Data Format

Episodes are stored in episodes/{episode_id}/:

metadata.json      # Episode configuration
orderbook.parquet  # Market data (or orderbook.csv.gz)
trades.parquet     # Trade prints (or trades.csv.gz)
settlement.json    # Final outcomes
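Because each data file may be stored either as Parquet or as gzipped CSV, a loader has to pick whichever is present. A minimal sketch of that lookup using only the standard library; `resolve_data_file` is a hypothetical helper, and preferring Parquet when both exist is an assumption rather than documented behavior:

```python
from pathlib import Path


def resolve_data_file(episode_dir, stem):
    """Return the path to `<stem>.parquet` if present, else `<stem>.csv.gz`.

    Raises FileNotFoundError when neither file exists.
    """
    episode_dir = Path(episode_dir)
    for suffix in (".parquet", ".csv.gz"):
        candidate = episode_dir / f"{stem}{suffix}"
        if candidate.is_file():
            return candidate
    raise FileNotFoundError(f"no {stem} data in {episode_dir}")
```

For instance, resolve_data_file("episodes/KXHIGHNY-26JAN20", "orderbook") would return whichever of the two orderbook files exists in that episode directory.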

License

MIT

Links

Repository: https://github.com/Oddpool/PredictionMarketBench
Oddpool: https://oddpool.com
