A comprehensive research framework for evaluating reinforcement learning agents in market-making environments. This project implements various RL algorithms (PPO, SAC, TD3) and heuristic strategies (Avellaneda-Stoikov, fixed spread, inventory-based) across multiple synthetic market environments.
This project provides:
- 12 Synthetic Market Environments: ABM, GBM, and OU processes with vanilla, jump, regime-switching, and combined variants
- RL Agents: PPO, DeepPPO, LSTM-PPO, SAC, TD3, and LSTM-SAC implementations
- Heuristic Agents: Avellaneda-Stoikov closed-form, fixed spread, inventory-based strategies, and more
- Advanced Training Features: Reward-based early stopping with automatic best weight restoration
- Comprehensive Evaluation: Automated comparison pipelines with metrics and visualization

Requirements:
- Python 3.10+
- PyTorch 1.12+
- Stable-Baselines3 2.0+
- Gymnasium 0.28+

- Clone the repository:
    git clone <repository-url>
    cd market_making_rl
- Create a virtual environment (recommended):
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
    pip install -r requirements.txt

Run a single experiment:

    from configs.config_loader import load_config
    from envs.abm_vanilla import ABMVanillaEnv
    from agents.ppo_agent import PPOAgent
    from experiments.runner import run_experiment

    # Load configurations
    env_cfg = load_config("configs/env_configs.yaml")["abm_vanilla"]
    agent_cfg = load_config("configs/agent_configs.yaml")["ppo_basic"]

    # Run experiment
    run_experiment(
        env_class=ABMVanillaEnv,
        agent_class=PPOAgent,
        env_config=env_cfg,
        agent_config=agent_cfg,
        train=True,
        n_eval_episodes=100,
        save_model=True,
    )

Train all RL agents and compare with heuristics:
    # Train all RL agents (may take several hours)
    python scripts/train_all_rl_agents.py
    # Compare all agents across environments
    python scripts/compare_all_agents.py
    # Aggregate results into comparison tables
    python scripts/aggregate_results.py
    # Or run the full pipeline at once:
    python scripts/run_full_pipeline.py --full

After running experiments, you can generate reports and visualizations using the following scripts.
Generate a detailed evaluation report with statistics and rankings:
    python scripts/create_evaluation_report.py

Output: results/EVALUATION_REPORT.md
This report includes:
- Overall performance statistics
- Best agent per environment (by PnL and Sharpe ratio)
- Performance by agent category (RL, Analytic, Heuristic)
- Performance by environment type (ABM, GBM, OU)
- Risk analysis (VaR, ES)
- Inventory management statistics
Generate appendix tables in matrix format:
    python scripts/create_appendix.py

Output: results/APPENDIX.md
This creates comprehensive matrices showing:
- All metrics (Mean PnL, Sharpe Ratio, Standard Deviation, VaR, ES, Average Inventory)
- Environments as rows, agents as columns
- Agents grouped by category (RL, Analytic, Heuristic)
- Category averages for each metric
Generate comprehensive visualizations with confidence intervals:
    python scripts/create_visualization_report.py

Output:
- results/VISUALIZATION_REPORT.md - Markdown report with embedded figures
- results/VISUALIZATION_REPORT.html - Interactive HTML report
- results/figures/ - All visualization figures
This report includes:
- Heatmaps for all metrics across agent-environment combinations
- Category comparison charts with 95% confidence intervals
- Risk-return scatter plots
- Agent rankings with error bars
- PnL distribution plots (violin/box plots)
- Best agent visualizations
- Radar charts for multi-metric comparison
- Environment difficulty analysis
- Agent consistency analysis
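
The confidence intervals mentioned above can be illustrated with a percentile bootstrap over per-episode PnL. This is a minimal sketch, not the project's implementation: the function name `bootstrap_mean_ci`, its parameters, and the sample PnL values are hypothetical, and the choice of a percentile bootstrap is an assumption.

```python
import random
import statistics


def bootstrap_mean_ci(samples, confidence=0.95, n_bootstrap=1000, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `samples`."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(samples, k=n)) for _ in range(n_bootstrap)
    )
    alpha = 1.0 - confidence
    lo_idx = int((alpha / 2) * n_bootstrap)
    hi_idx = min(n_bootstrap - 1, int((1 - alpha / 2) * n_bootstrap))
    return means[lo_idx], means[hi_idx]


# Hypothetical per-episode PnL values for one agent-environment pair.
pnl = [1.2, 0.8, -0.3, 2.1, 0.5, 1.7, -1.0, 0.9, 1.4, 0.2]
low, high = bootstrap_mean_ci(pnl, confidence=0.95, n_bootstrap=2000)
print(f"mean={statistics.fmean(pnl):.3f}, 95% CI=({low:.3f}, {high:.3f})")
```

Raising the confidence level widens the interval, and more bootstrap iterations make its endpoints more stable at the cost of runtime.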
Options:
    # Customize confidence level and bootstrap iterations
    python scripts/create_visualization_report.py --confidence 0.99 --n-bootstrap 2000
    # Generate only markdown or HTML
    python scripts/create_visualization_report.py --format markdown
    python scripts/create_visualization_report.py --format html

Generate a research paper-style results and conclusions section:
    python scripts/create_results_summary.py

Output: results/RESULTS_AND_CONCLUSIONS.md
This comprehensive report provides:
- Executive summary with key findings
- Statistical comparisons by agent category
- Individual agent performance analysis
- Environment complexity analysis
- Risk-return trade-off analysis
- Statistical patterns and insights
- Critical discussion (RL vs traditional methods, LSTM effectiveness, etc.)
- Conclusions with research contributions and practical implications
Generate individual risk-return plots for each agent:
    python scripts/create_agent_risk_return_plots.py

Output:
- results/AGENT_RISK_RETURN_PROFILES.md - Markdown file with all plots
- results/figures/agent_risk_return_*.png - Individual agent plots
Each plot shows:
- Return standard deviation (risk) on x-axis
- Mean PnL (return) on y-axis
- Points color-coded by environment type (ABM=Blue, GBM=Purple, OU=Orange)
- Environment names labeled on each point
- Legend showing environment types
Generate PnL distribution plots for each agent-environment combination:
    python scripts/create_pnl_distributions.py

Output:
- results/PNL_DISTRIBUTIONS.md - Markdown file with all plots organized by environment
- results/figures/pnl_dist_*.png - Individual distribution plots
Each plot shows:
- Histogram with KDE overlay of PnL distribution
- Mean PnL highlighted with red dashed vertical line
- Median highlighted with green dashed vertical line
- Interquartile Range (IQR) highlighted with yellow shaded region
- Statistics box showing mean, median, IQR, standard deviation, and sample size
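
The values in the statistics box can be computed with the standard library alone. The sketch below is illustrative: `pnl_summary` is a hypothetical helper, and the quartile convention (`method="inclusive"`) is an assumption that may differ from what the plotting script uses.

```python
import statistics


def pnl_summary(pnl):
    """Summary statistics for a list of per-episode PnL values."""
    # quantiles(n=4) returns the three quartile cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(pnl, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(pnl),
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,                  # width of the shaded IQR region
        "std": statistics.stdev(pnl),    # sample standard deviation
        "n": len(pnl),
    }


summary = pnl_summary([1.0, 2.0, 3.0, 4.0, 5.0])
print(summary)
```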
Generate all reports at once:
    # Generate all reports
    python scripts/create_evaluation_report.py
    python scripts/create_appendix.py
    python scripts/create_visualization_report.py
    python scripts/create_results_summary.py
    python scripts/create_agent_risk_return_plots.py
    python scripts/create_pnl_distributions.py

All reports are saved to the results/ directory and can be included directly in research papers or theses.
market_making_rl/
├── agents/ # Agent implementations (RL + heuristics)
│ ├── ppo_agent.py # PPO agent
│ ├── lstm_agent.py # LSTM-PPO agent
│ ├── sac_agent.py # SAC agent
│ ├── td3_agent.py # TD3 agent
│ ├── as_agent.py # Avellaneda-Stoikov agent
│ └── ... # Other heuristic agents
│
├── envs/ # Market environments
│ ├── base_env.py # Base environment class
│ ├── abm_vanilla.py # Arithmetic Brownian Motion
│ ├── gbm_vanilla.py # Geometric Brownian Motion
│ ├── ou_vanilla.py # Ornstein-Uhlenbeck
│ └── ... # Jump and regime-switching variants
│
├── experiments/ # Experiment utilities
│ ├── runner.py # Main experiment runner
│ ├── callbacks.py # Early stopping callbacks
│ ├── metrics.py # Performance metrics
│ └── plotting.py # Visualization utilities
│
├── configs/ # Configuration files
│ ├── env_configs.yaml # Environment presets
│ ├── agent_configs.yaml # Agent hyperparameters
│ ├── training_configs.yaml # Training settings
│ └── README.md # Detailed config guide
│
├── scripts/ # Automation scripts
│ ├── train_all_rl_agents.py # Train all RL agents
│ ├── compare_all_agents.py # Compare all agents
│ ├── aggregate_results.py # Aggregate results
│ └── run_full_pipeline.py # Full pipeline runner
│
├── models/ # Saved trained models
├── results/ # Experiment results
└── data/ # Data directory
The framework includes advanced early stopping that monitors evaluation performance rather than training loss:
- Monitors: Mean reward, mean PnL, or custom metrics
- Automatic Best Weight Restoration: Saves and restores best model weights automatically
- Flexible Configuration: Choose between reward-based or loss-based monitoring
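
The patience logic behind reward-based early stopping with best-weight restoration can be sketched independently of any RL library. The class below is hypothetical (it is not the code in experiments/callbacks.py), and it assumes that higher scores are better and that weights can be deep-copied.

```python
import copy


class RewardEarlyStopper:
    """Track the best evaluation score and signal when to stop training."""

    def __init__(self, patience=6, min_delta=0.0):
        self.patience = patience          # evaluations allowed without improvement
        self.min_delta = min_delta        # minimum gain that counts as improvement
        self.best_score = float("-inf")
        self.best_weights = None
        self.evals_without_improvement = 0

    def update(self, score, weights):
        """Record one evaluation; return True when training should stop."""
        if score > self.best_score + self.min_delta:
            self.best_score = score
            # Snapshot the weights so they can be restored after training.
            self.best_weights = copy.deepcopy(weights)
            self.evals_without_improvement = 0
        else:
            self.evals_without_improvement += 1
        return self.evals_without_improvement >= self.patience


# Hypothetical evaluation history: the score peaks at the second evaluation.
stopper = RewardEarlyStopper(patience=3)
stopped_at = None
for step, score in enumerate([1.0, 2.0, 1.5, 1.8, 1.9, 2.5]):
    if stopper.update(score, weights={"eval_index": step}):
        stopped_at = step
        break
print(stopped_at, stopper.best_score, stopper.best_weights)
```

In this run, training stops after three evaluations without improvement, and the weights from the best evaluation remain available for restoration.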
Example configuration:
    early_stopping:
      enabled: true
      monitor_type: "reward"   # or "loss"
      monitor: "mean_reward"   # or "mean_pnl"
      patience: 6              # evaluations without improvement
      eval_freq: 10000         # evaluate every N steps
      n_eval_episodes: 10      # episodes per evaluation

Reinforcement Learning Agents:
- PPO (Proximal Policy Optimization)
- DeepPPO (PPO with deep networks)
- LSTM-PPO (Recurrent PPO for temporal dependencies)
- SAC (Soft Actor-Critic)
- TD3 (Twin Delayed DDPG)
- LSTM-SAC (Deep SAC for pattern recognition)
Heuristic Agents:
- Avellaneda-Stoikov closed-form optimal
- Fixed spread strategies
- Inventory-based quote shifting
- Trend-following strategies
- Noise traders (for comparison)
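
For reference, the closed-form Avellaneda-Stoikov quotes can be written in a few lines. This sketch uses the standard formulas from the original paper (reservation price `r = s - q*gamma*sigma^2*(T - t)` and total spread `gamma*sigma^2*(T - t) + (2/gamma)*ln(1 + gamma/k)`). The function name and the risk-aversion parameter `gamma` are assumptions and may not match the interface in agents/as_agent.py.

```python
import math


def avellaneda_stoikov_quotes(mid, inventory, gamma, sigma, k, time_left):
    """Return (bid, ask) from the Avellaneda-Stoikov closed-form solution."""
    # Reservation price: the mid price shifted against the current inventory.
    reservation = mid - inventory * gamma * sigma**2 * time_left
    # Total optimal spread, split symmetrically around the reservation price.
    spread = gamma * sigma**2 * time_left + (2.0 / gamma) * math.log(1.0 + gamma / k)
    return reservation - spread / 2.0, reservation + spread / 2.0


# Illustrative parameters only; gamma=0.1 is a hypothetical risk aversion.
flat_bid, flat_ask = avellaneda_stoikov_quotes(100.0, 0, 0.1, 2.0, 1.5, 1.0)
long_bid, long_ask = avellaneda_stoikov_quotes(100.0, 5, 0.1, 2.0, 1.5, 1.0)
print(flat_bid, flat_ask)
print(long_bid, long_ask)
```

With zero inventory the quotes are symmetric around the mid price. A long position shifts both quotes down, which makes selling more likely and buying less likely.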
Price Processes:
- ABM (Arithmetic Brownian Motion): dS = σ dW
- GBM (Geometric Brownian Motion): dS = μS dt + σS dW
- OU (Ornstein-Uhlenbeck): dS = κ(μ - S) dt + σ dW
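
The three processes above can be simulated with a simple Euler-Maruyama discretization. This is a standalone sketch under stated assumptions: `simulate_path` and its default parameters are hypothetical and unrelated to the environment classes in envs/.

```python
import math
import random


def simulate_path(process, s0=100.0, n_steps=1000, dt=1e-3,
                  mu=0.0, sigma=2.0, kappa=1.0, seed=0):
    """Euler-Maruyama path for 'abm', 'gbm', or 'ou' price dynamics."""
    rng = random.Random(seed)
    path = [s0]
    for _ in range(n_steps):
        s = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        if process == "abm":
            ds = sigma * dw
        elif process == "gbm":
            ds = mu * s * dt + sigma * s * dw
        elif process == "ou":
            ds = kappa * (mu - s) * dt + sigma * dw
        else:
            raise ValueError(f"unknown process: {process}")
        path.append(s + ds)
    return path


abm_path = simulate_path("abm", sigma=2.0)
gbm_path = simulate_path("gbm", mu=0.05, sigma=0.2)  # sigma is relative volatility here
ou_path = simulate_path("ou", mu=100.0, kappa=5.0, sigma=2.0)
print(abm_path[-1], gbm_path[-1], ou_path[-1])
```

Note that `sigma` is an absolute volatility for ABM and OU but a relative one for GBM, so the same numeric value means different things across processes.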
Variants:
- Vanilla (basic diffusion)
- Jump diffusion (with Poisson jumps)
- Regime-switching (Markov volatility regimes)
- Jump + Regime (combined)
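
To make the combined variant concrete, the sketch below layers a two-state Markov volatility regime and Poisson jumps on top of ABM. All names and parameters (`switch_prob`, `jump_intensity`, `jump_std`, Gaussian jump sizes) are assumptions for illustration. The project's environments may parameterize jumps and regimes differently.

```python
import math
import random


def simulate_jump_regime_abm(s0=100.0, n_steps=1000, dt=1e-3,
                             sigmas=(1.0, 3.0), switch_prob=0.01,
                             jump_intensity=5.0, jump_std=0.5, seed=0):
    """ABM with two volatility regimes and Gaussian-sized Poisson jumps."""
    rng = random.Random(seed)
    regime = 0
    prices, regimes = [s0], [regime]
    for _ in range(n_steps):
        # Two-state Markov chain: flip the regime with a fixed per-step probability.
        if rng.random() < switch_prob:
            regime = 1 - regime
        # Diffusion increment using the volatility of the current regime.
        ds = sigmas[regime] * rng.gauss(0.0, math.sqrt(dt))
        # For small dt, a Poisson jump occurs with probability ~ intensity * dt.
        if rng.random() < jump_intensity * dt:
            ds += rng.gauss(0.0, jump_std)
        prices.append(prices[-1] + ds)
        regimes.append(regime)
    return prices, regimes


prices, regimes = simulate_jump_regime_abm()
print(prices[-1], sum(regimes) / len(regimes))
```

Setting `jump_intensity=0` recovers the pure regime-switching variant, and `switch_prob=0` recovers pure jump diffusion.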
The framework provides scripts for:
- Training all RL agents across environments
- Comparing RL agents with heuristics
- Aggregating results into comparison tables
- Generating visualizations and reports
All experiments are configured via YAML files for reproducibility. See configs/README.md for detailed documentation.
Quick Configuration Guide:
- Environment configs: configs/env_configs.yaml
- Agent configs: configs/agent_configs.yaml
- Training configs: configs/training_configs.yaml

    # Example: compare PPO against the Avellaneda-Stoikov agent on one environment
    from configs.config_loader import load_config
    from envs.abm_vanilla import ABMVanillaEnv
    from agents.ppo_agent import PPOAgent
    from agents.as_agent import ASClosedFormAgent
    from experiments.runner import run_experiment

    env_cfg = load_config("configs/env_configs.yaml")["abm_vanilla"]
    agent_cfgs = load_config("configs/agent_configs.yaml")

    # Compare PPO vs AS
    for agent_class, agent_key in [(PPOAgent, "ppo_basic"),
                                   (ASClosedFormAgent, "as_closed_form")]:
        run_experiment(
            env_class=ABMVanillaEnv,
            agent_class=agent_class,
            env_config=env_cfg,
            agent_config=agent_cfgs[agent_key],
            train=(agent_class is PPOAgent),  # only the RL agent is trained
            n_eval_episodes=100,
        )

    # Example: evaluate the Avellaneda-Stoikov agent across environments
    from configs.config_loader import load_config
    from envs import ABMVanillaEnv, OUVanillaEnv, GBMVanillaEnv
    from agents.as_agent import ASClosedFormAgent
    from experiments.runner import run_experiment

    envs = [
        (ABMVanillaEnv, "abm_vanilla"),
        (OUVanillaEnv, "ou_vanilla"),
        (GBMVanillaEnv, "gbm_vanilla"),
    ]
    env_cfgs = load_config("configs/env_configs.yaml")
    agent_cfgs = load_config("configs/agent_configs.yaml")

    for env_class, env_key in envs:
        # GBM uses its own AS preset; select it per environment so the
        # override cannot carry over into later iterations.
        agent_key = "as_closed_form_gbm" if env_key == "gbm_vanilla" else "as_closed_form"
        run_experiment(
            env_class=env_class,
            agent_class=ASClosedFormAgent,
            env_config=env_cfgs[env_key],
            agent_config=agent_cfgs[agent_key],
            train=False,
            n_eval_episodes=100,
        )

    # Example: custom environment and agent configs defined inline
    from envs.abm_vanilla import ABMVanillaEnv
    from agents.ppo_agent import PPOAgent
    from experiments.runner import run_experiment

    # Custom environment config
    env_cfg = {
        "S0": 100.0,
        "T": 1.0,
        "dt": 0.0001,
        "sigma": 2.0,
        "A": 5.0,
        "k": 1.5,
        "base_delta": 1.0,
        "max_inventory": 20,
        "inv_penalty": 0.01,
        "seed": 42,
    }

    # Custom agent config with early stopping
    agent_cfg = {
        "total_timesteps": 200000,
        "learning_rate": 0.0003,
        "n_steps": 1024,
        "batch_size": 256,
        "early_stopping": {
            "enabled": True,
            "monitor_type": "reward",
            "monitor": "mean_reward",
            "patience": 10,
            "eval_freq": 10000,
            "n_eval_episodes": 10,
        },
    }

    run_experiment(
        env_class=ABMVanillaEnv,
        agent_class=PPOAgent,
        env_config=env_cfg,
        agent_config=agent_cfg,
        train=True,
        n_eval_episodes=100,
    )

See scripts/README.md for detailed documentation on automation scripts.
Main Scripts:
- scripts/train_all_rl_agents.py - Train all RL agents
- scripts/compare_all_agents.py - Compare all agents
- scripts/aggregate_results.py - Aggregate results
- scripts/run_full_pipeline.py - Run full pipeline
The framework computes comprehensive performance metrics:
- Return Metrics: Mean PnL, Sharpe ratio
- Risk Metrics: Standard deviation, VaR (95%, 99%), Expected Shortfall
- Inventory Metrics: Average inventory level
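
As an illustration of how the return and risk metrics relate, the sketch below computes them from per-episode PnL. It is not the code in experiments/metrics.py: `risk_metrics` is a hypothetical helper, the Sharpe ratio is a plain per-episode mean over standard deviation (not annualized), and VaR and ES follow one simple historical convention, reported as positive losses.

```python
import statistics


def risk_metrics(pnl, level=0.95):
    """Mean, std, Sharpe, historical VaR, and Expected Shortfall of PnL samples."""
    mean = statistics.fmean(pnl)
    std = statistics.stdev(pnl)
    sharpe = mean / std if std > 0 else float("nan")
    ordered = sorted(pnl)
    # Number of episodes in the worst (1 - level) share of outcomes, at least one.
    n_tail = max(1, round((1.0 - level) * len(ordered)))
    tail = ordered[:n_tail]
    return {
        "mean_pnl": mean,
        "std": std,
        "sharpe": sharpe,
        "var": -tail[-1],                # loss at the edge of the tail
        "es": -statistics.fmean(tail),   # average loss inside the tail
    }


# Hypothetical sample: 100 episodes with PnL -10, -9, ..., 89.
metrics = risk_metrics(list(range(-10, 90)), level=0.95)
print(metrics)
```

Expected Shortfall is never smaller than VaR at the same level, because it averages the losses beyond the VaR threshold.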
Results are saved as JSON files and can be aggregated for comparison.
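
Aggregating JSON results into an environment-by-agent table could look like the sketch below. The file layout and the keys `env`, `agent`, and `mean_pnl` are hypothetical. The actual schema written by the experiment runner is not specified here, so adapt the key names accordingly.

```python
import json
import tempfile
from pathlib import Path


def aggregate_results(results_dir, metric="mean_pnl"):
    """Build {env: {agent: metric}} from every *.json file in results_dir."""
    table = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open() as fh:
            record = json.load(fh)
        # Environments become rows, agents become columns.
        table.setdefault(record["env"], {})[record["agent"]] = record[metric]
    return table


# Demo with two made-up result files written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for record in (
        {"env": "abm_vanilla", "agent": "ppo_basic", "mean_pnl": 1.5},
        {"env": "abm_vanilla", "agent": "as_closed_form", "mean_pnl": 1.2},
    ):
        name = f"{record['env']}__{record['agent']}.json"
        (Path(tmp) / name).write_text(json.dumps(record))
    table = aggregate_results(tmp)
print(table)
```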
Implement custom training callbacks by extending BaseCallback:
    from stable_baselines3.common.callbacks import BaseCallback

    class CustomCallback(BaseCallback):
        def _on_step(self) -> bool:
            # Your custom logic
            return True  # Continue training (returning False stops it)

Add custom evaluation metrics by modifying experiments/metrics.py.
Create new market environments by extending MarketMakingBaseEnv:
    from envs.base_env import MarketMakingBaseEnv

    class CustomEnv(MarketMakingBaseEnv):
        def _update_price(self):
            # Implement your price dynamics
            pass

- Import Errors: Make sure you're running from the project root directory
- Model Not Found: Ensure training completed successfully before comparison
- Memory Issues: Reduce total_timesteps or batch sizes in configs
- Monitor Warning: Handled automatically; evaluation envs are wrapped with Monitor

- Check configs/README.md for configuration details
- Check scripts/README.md for script usage
- Review example code in this README
- Check ENVIRONMENT_EVALUATION.md for environment details

- Training Time: Each RL agent training takes 10-30 minutes (depending on total_timesteps)
- Evaluation Time: Much faster (seconds per agent)
- Model Size: Saved models are typically 1-10 MB each
- Results Size: Each experiment result is ~100 KB
If you use this code in your research, please cite:
    @misc{market_making_rl,
      title={Reinforcement Learning vs. Stochastic Control Approaches to Optimal Market Making: A Controlled Simulation Study},
      author={Cem Vural},
      year={2026},
      url={https://github.com/cemvural00/market_making_rl}
    }

[All rights reserved] Feel free to contact via e-mail: cemvural2000@icloud.com

- Stable-Baselines3 team for the RL framework
- Avellaneda & Stoikov for the optimal market-making formulation