A comprehensive, modular reinforcement learning trading system built with stable-baselines3 and FinRL. This project provides a complete pipeline for training, evaluating, and deploying RL agents for algorithmic trading.
- Modular Architecture: Clean separation of concerns with dedicated modules for environments, agents, training, inference, and utilities
- Multiple RL Algorithms: Support for PPO and other stable-baselines3 algorithms
- Flexible Reward Functions: Extensible reward system with profit-based, Sharpe ratio, and risk-adjusted rewards
- Comprehensive Evaluation: Built-in backtesting, performance metrics, and comparison tools
- Web Interface: Streamlit-based UI for easy interaction and visualization
- RESTful API: FastAPI-based service for programmatic access
- Feature Engineering: Advanced technical indicators and market regime detection
- Multiple Environments: Single-stock, multi-asset, and continuous trading environments
```
RLTradingAgent/
├── src/                  # Main source code
│   ├── agents/           # RL agent implementations
│   ├── data/             # Data loading and feature engineering
│   ├── envs/             # Trading environments
│   ├── inference/        # Model inference and API
│   ├── rewards/          # Reward function implementations
│   ├── train/            # Training and evaluation
│   ├── ui/               # Streamlit web interface
│   ├── utils/            # Utilities and metrics
│   └── main.py           # CLI entry point
├── scripts/              # Converted notebooks and utilities
├── notebooks/            # Original Jupyter notebooks
├── models/               # Trained model storage
├── data/                 # Data storage
├── logs/                 # Training and application logs
├── results/              # Evaluation results
└── requirements.txt      # Python dependencies
```
- Clone the repository

```bash
git clone <repository-url>
cd RLTradingAgent
```

- Create and activate a virtual environment

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Set up the project structure

```bash
python scripts/setup.py
```

```bash
# Train a PPO agent for Apple stock
python src/main.py train --ticker AAPL --start-date 2020-01-01 --end-date 2023-01-01

# Train with custom parameters
python src/main.py train --ticker AAPL --total-timesteps 100000 --learning-rate 0.0003
```

```bash
# Evaluate a trained model
python src/main.py evaluate --ticker AAPL --start-date 2023-01-01 --end-date 2024-01-01

# Compare multiple models
python src/main.py compare --tickers AAPL GOOGL MSFT --start-date 2023-01-01
```

```bash
# Launch the Streamlit web interface
streamlit run src/ui/app.py
```

```bash
# Start the FastAPI inference service
python src/inference/api.py
```

Actions represent trading decisions in a continuous space [-1, 1]:
- -1: Sell maximum allowed
- 0: Hold position
- 1: Buy maximum allowed
Actions are scaled by `hmax` (maximum holdings):

`hmax = initial_amount / max_price`
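As an illustration of the scaling above, a raw policy action could be mapped to a signed share count roughly as follows (a sketch; the helper name is hypothetical, only the `hmax` definition comes from this README):

```python
def scale_action(action: float, initial_amount: float, max_price: float) -> int:
    """Map a raw policy action in [-1, 1] to a signed share count.

    Positive -> buy, negative -> sell, scaled by hmax (maximum holdings).
    """
    hmax = initial_amount / max_price
    return int(action * hmax)

# With $10,000 capital and a $200 max price, hmax = 50 shares:
print(scale_action(1.0, 10_000, 200))    # buy 50 shares
print(scale_action(-0.5, 10_000, 200))   # sell 25 shares
print(scale_action(0.0, 10_000, 200))    # hold
```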
The state includes:
- Portfolio value (1 dimension)
- Stock holdings and prices (2 × stock_dim)
- Technical indicators (len(indicators) × stock_dim)

State Space Size: 1 + 2 × stock_dim + len(indicators) × stock_dim
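For example, the observation size follows directly from that formula (the indicator names here match the indicator list later in this README):

```python
def state_space_size(stock_dim: int, indicators: list[str]) -> int:
    # 1 (portfolio value) + 2 * stock_dim (holdings and prices)
    # + one value per indicator per stock
    return 1 + 2 * stock_dim + len(indicators) * stock_dim

# Single stock with eight indicators -> 1 + 2 + 8 = 11 dimensions
indicators = ["macd", "boll_ub", "boll_lb", "rsi_30",
              "cci_30", "dx_30", "close_30_sma", "close_60_sma"]
print(state_space_size(1, indicators))  # 11
```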
- MACD: Moving Average Convergence Divergence
- RSI: Relative Strength Index
- Bollinger Bands: Price volatility bands
- Volume: Trading volume indicators
- SMA/EMA: Simple and Exponential Moving Averages
- ATR: Average True Range
- Stochastic Oscillator: Momentum indicator
- Profit Reward: Direct portfolio value changes
- Sharpe Ratio Reward: Risk-adjusted returns
- Drawdown Penalty: Penalizes large losses
- Transaction Cost Aware: Considers trading costs
- Risk Adjusted: Balances returns with risk metrics
```python
from src.train import TradingTrainer, TrainingConfig
from src.rewards import SharpeRatioReward

config = TrainingConfig(
    ticker="AAPL",
    start_date="2020-01-01",
    end_date="2023-01-01",
    total_timesteps=100000,
    learning_rate=0.0003,
    reward_function=SharpeRatioReward()
)

trainer = TradingTrainer(config)
model = trainer.train()
```

```python
from src.inference import TradingInferenceEngine
from stable_baselines3 import PPO

# Load model and create inference engine
model = PPO.load("models/AAPL.zip")
engine = TradingInferenceEngine()

# Get prediction
action = engine.predict_action(model, env, portfolio_value=10000, num_shares=10)
```

```python
from src.rewards import BaseRewardFunction

class CustomReward(BaseRewardFunction):
    def calculate_reward(self, data: dict) -> float:
        portfolio_value = data['portfolio_value']
        benchmark_return = data.get('benchmark_return', 0)
        # Custom reward logic
        return portfolio_value * 0.1 - abs(benchmark_return) * 0.05
```

```bash
# Start the API server
python src/inference/api.py
```

```bash
curl -X POST "http://localhost:8000/predict" \
  -H "Content-Type: application/json" \
  -d '{
    "ticker": "AAPL",
    "portfolio_value": 10000,
    "num_shares": 10
  }'
```

```bash
curl -X POST "http://localhost:8000/predict/batch" \
  -H "Content-Type: application/json" \
  -d '{
    "ticker": "AAPL",
    "start_date": "2023-01-01",
    "end_date": "2023-12-31",
    "initial_amount": 10000
  }'
```

```bash
# Single model backtest
python src/main.py backtest --ticker AAPL --model-path models/AAPL.zip

# Cross-validation
python src/main.py cv --ticker AAPL --folds 5

# Comprehensive evaluation report
python src/main.py report --ticker AAPL --output-dir results/
```

The system tracks comprehensive performance metrics:
- Returns: Total, annualized, and risk-adjusted returns
- Risk Metrics: Sharpe ratio, Sortino ratio, maximum drawdown
- Trading Metrics: Win rate, profit factor, trading frequency
- Benchmark Comparison: Alpha, beta, tracking error
- Portfolio Metrics: Volatility, VaR, Calmar ratio
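A few of these metrics can be sketched from a series of per-period returns as follows (a minimal illustration assuming a zero risk-free rate; the function name and annualization convention are assumptions, not the project's actual implementation):

```python
import numpy as np

def performance_metrics(returns: np.ndarray, periods: int = 252) -> dict:
    """Compute Sharpe, Sortino, and max drawdown from per-period returns."""
    sharpe = np.sqrt(periods) * returns.mean() / returns.std()
    downside = returns[returns < 0]                 # only losing periods
    sortino = np.sqrt(periods) * returns.mean() / downside.std()
    equity = np.cumprod(1 + returns)                # growth of $1
    drawdown = 1 - equity / np.maximum.accumulate(equity)
    return {"sharpe": float(sharpe), "sortino": float(sortino),
            "max_drawdown": float(drawdown.max())}

rng = np.random.default_rng(0)
metrics = performance_metrics(rng.normal(0.0005, 0.01, 252))
print(metrics)
```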
- Transaction Costs: 0.1% per transaction (configurable)
- Position Limits: Maximum position sizing controls
- Drawdown Limits: Automatic position reduction on large losses
- Volatility Filters: Reduced trading during high volatility periods
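Conceptually, these controls act as an overlay on the agent's raw order. A hedged sketch (the helper and its thresholds are hypothetical, not the project's actual defaults):

```python
def apply_risk_controls(target_shares: int, drawdown: float, volatility: float,
                        max_drawdown: float = 0.20, vol_threshold: float = 0.03) -> int:
    """Illustrative risk overlay: skip trading during high-volatility periods
    and halve the position after a large drawdown."""
    if volatility > vol_threshold:
        return 0                      # volatility filter: no new trades
    if drawdown > max_drawdown:
        return target_shares // 2     # drawdown limit: reduce position
    return target_shares

print(apply_risk_controls(100, drawdown=0.25, volatility=0.01))  # 50
print(apply_risk_controls(100, drawdown=0.05, volatility=0.05))  # 0
```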
```bash
export MODELS_DIR="./models"
export DATA_DIR="./data"
export LOG_LEVEL="INFO"
```

```yaml
# config.yaml
training:
  total_timesteps: 100000
  learning_rate: 0.0003
  batch_size: 64
  n_steps: 2048

environment:
  initial_amount: 10000
  transaction_cost_pct: 0.001
  turbulence_threshold: 140

features:
  technical_indicators: true
  time_features: true
  market_regime: true
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- FinRL for the trading environment foundation
- Stable-Baselines3 for RL algorithms
- yfinance for market data
- ta for technical indicators
For questions and support:
- Create an issue in the GitHub repository
- Check the documentation
- Review existing discussions
Disclaimer: This software is for educational and research purposes only. Past performance does not guarantee future results. Always conduct your own research before making investment decisions.
• macd (Moving Average Convergence Divergence)
A trend-following momentum indicator that shows the relationship between two moving averages. It helps identify potential buy or sell signals based on crossovers and divergence.
• boll_ub (Bollinger Upper Band)
The upper boundary of Bollinger Bands, which represents a price level where an asset may be overbought. It is calculated using a moving average and standard deviation:
Bollinger Upper Band = SMA + k × σ
• boll_lb (Bollinger Lower Band)
The lower boundary of Bollinger Bands, indicating a price level where an asset may be oversold. It helps traders identify potential buying opportunities.
• rsi_30 (Relative Strength Index - 30 period)
A momentum oscillator that measures the speed and change of price movements. It ranges from 0 to 100, with values below 30 indicating oversold conditions and potential reversals.
• cci_30 (Commodity Channel Index - 30 period)
A momentum-based indicator that identifies price trends and overbought/oversold conditions. Positive values suggest bullish momentum, while negative values indicate bearish trends.
• dx_30 (Directional Movement Index - 30 period)
Measures trend strength by comparing upward and downward movement. Higher values indicate a stronger trend, while lower values suggest weak or no trend.
• close_30_sma (30-period Simple Moving Average of Close Price)
The average closing price over 30 periods. It smooths price fluctuations to help identify trends and potential support/resistance levels.
• close_60_sma (60-period Simple Moving Average of Close Price)
Similar to the 30-period SMA but over a longer period, providing a broader view of price trends and reducing short-term noise.
• turbulence (Turbulence Index)
A measure of market volatility and instability. Higher turbulence values indicate unpredictable price movements, which can signal potential risk or upcoming market shifts.
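The project computes these indicators with the `ta` library; a rough plain-pandas equivalent of a few of them (a sketch, not the library's exact formulas) looks like:

```python
import pandas as pd

def sma(close: pd.Series, window: int) -> pd.Series:
    """Simple moving average, e.g. close_30_sma with window=30."""
    return close.rolling(window).mean()

def rsi(close: pd.Series, window: int = 30) -> pd.Series:
    """RSI via rolling average gains vs. losses (0-100 scale)."""
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    return 100 - 100 / (1 + gain / loss)

def bollinger(close: pd.Series, window: int = 20, k: float = 2.0):
    """Upper/lower Bollinger Bands: SMA ± k standard deviations."""
    mid = sma(close, window)
    sigma = close.rolling(window).std()
    return mid + k * sigma, mid - k * sigma  # boll_ub, boll_lb

prices = pd.Series(range(1, 101), dtype=float)
ub, lb = bollinger(prices)
print(sma(prices, 30).iloc[-1], rsi(prices).iloc[-1])
```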
The Sharpe Ratio measures the risk-adjusted return of an investment by comparing its excess return over the risk-free rate to the standard deviation of its returns. A higher Sharpe Ratio indicates better risk-adjusted performance.
$$\text{Sharpe Ratio} = \frac{\text{Average Return} - \text{Risk-Free Rate}}{\text{Standard Deviation of Returns}}$$
The Sortino Ratio is similar to the Sharpe Ratio but only considers downside risk (volatility of negative returns) rather than total volatility. It is a more refined measure of risk-adjusted return as it focuses on the harmful part of risk.
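A tiny worked example of the two ratios on hypothetical daily returns (zero risk-free rate assumed), showing why Sortino comes out higher when upside volatility dominates:

```python
import statistics

# Five hypothetical daily excess returns (risk-free rate already subtracted)
returns = [0.02, -0.01, 0.03, -0.02, 0.01]
mean = statistics.mean(returns)
total_sd = statistics.pstdev(returns)                           # all returns
downside_sd = statistics.pstdev([r for r in returns if r < 0])  # losses only

sharpe = mean / total_sd
sortino = mean / downside_sd   # larger: only downside volatility is penalized
print(round(sharpe, 3), round(sortino, 3))
```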
A simple reward function that measures the profit/loss (return) obtained each day. It can lead to issues such as early convergence of the model.

Reward = Current Portfolio Value − Previous Portfolio Value
| Metric | Value |
|---|---|
| day | 2515 |
| episode | 0 |
| begin_total_asset | 10000.00 |
| end_total_asset | 67467.96 |
| total_reward | 57467.96 |
| total_cost | 907.57 |
| total_trades | 2515 |
| Sharpe | 0.811 |
Smooths the return over a period to reduce volatility.
$$\text{Reward} = \frac{1}{N} \sum_{i=1}^{N} \text{Return}_i$$
Balances return with risk using hyperparameters α and β:

Reward = α × return_moving_average − β × downside_return

where α and β are hyperparameters that control the weight of return and risk.
https://www.researchgate.net/publication/356127405_Portfolio_Performance_and_Risk_Penalty_Measurement_with_Differential_Return
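A minimal sketch of the α/β reward above (the helper name and the particular moving-average/downside definitions are assumptions, not the project's exact implementation):

```python
def risk_adjusted_reward(returns, alpha: float = 1.0, beta: float = 0.5) -> float:
    """Reward = alpha * return_moving_average - beta * downside_return."""
    moving_average = sum(returns) / len(returns)
    downside = [abs(r) for r in returns if r < 0]          # losing periods
    downside_return = sum(downside) / len(downside) if downside else 0.0
    return alpha * moving_average - beta * downside_return

# Hypothetical window of daily returns: small average gain, two losses
print(risk_adjusted_reward([0.02, -0.01, 0.03, -0.02, 0.01]))
```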
Algorithm Used: Proximal Policy Optimization (PPO)
