
RL Trader

A reinforcement learning framework for cryptocurrency trading, supporting BTC spot and perpetual swap markets. The project implements three RL algorithms (DQN, DRL, PPO) with a modular architecture built on Gymnasium.

Features

  • Multiple RL Algorithms: DQN (discrete actions), DRL (continuous policy gradient), PPO (proximal policy optimization)
  • Gymnasium-Compatible Environment: Standard RL interface with customizable observation and action spaces
  • Perpetual Swap Support: Long/short positions with funding rate handling
  • Flexible Reward Functions: 7 reward calculation methods including Sharpe ratio, Sortino ratio, and Differential Sharpe Ratio (DSR)
  • Modular Architecture: Clean separation between environment, agent, and training components
  • Configuration-Driven: YAML configs with CLI override support

Installation

Using uv (recommended)

uv venv
uv pip install -r requirements.txt

# Development dependencies
uv pip install -r requirements-dev.txt

Using pip

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install -r requirements-dev.txt

Quick Start

Training an Agent

# Train with DQN (discrete actions)
python -m cli.train_agent --trainer DQNTrainer --episodes 100

# Train with DRL (continuous actions)
python -m cli.train_agent --trainer DRLTrainer --episodes 50 --batch-size 32

# Train with PPO (modern policy gradient)
python -m cli.train_agent --trainer PPOTrainer --episodes 100 --num-steps 256

Using Configuration Files

# Train from config
python -m cli.train_agent --config configs/training_config.yaml

# Override config with CLI args
python -m cli.train_agent --config configs/training_config.yaml \
    --trainer DRLTrainer --learning-rate 1e-3

Architecture

src/
├── cli/              # Command-line training entry point
├── configs/          # YAML configuration files
├── data/             # Data loading and preprocessing
│   ├── loader.py           # CSV data loading
│   ├── preprocessing.py    # Feature engineering
│   └── downloaders/        # Binance API integration
├── envs/             # Trading environment
│   ├── trading_env.py      # BTCMarketEnv (Gymnasium)
│   └── rewards.py          # Reward functions
├── ml/               # Neural network models
│   ├── models.py           # TensorFlow MLP/LSTM
│   └── models_torch.py     # PyTorch ActorCritic
├── rl/               # Reinforcement learning
│   ├── agent.py            # TensorFlow TraderAgent
│   ├── agent_torch.py      # PyTorch PPOAgent
│   ├── trainer.py          # Base trainer class
│   ├── algorithms/         # DQN, DRL, PPO implementations
│   ├── buffers/            # Experience/rollout buffers
│   └── exploration/        # Epsilon-greedy strategy
├── trading/          # Trade execution
└── utils/            # Logging, visualization, config loading

Algorithms

DQN (Deep Q-Network)

Discrete action space with 4 actions: Hold, Buy 50%, Buy 100%, Sell.

  • Experience replay buffer
  • Epsilon-greedy exploration with decay
  • Batch training with MSE loss
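
A minimal sketch of epsilon-greedy selection over these four actions (q_network is assumed to be the TensorFlow Q-model; the names and the decay step are illustrative, not taken from the repository):

import random

import numpy as np

NUM_ACTIONS = 4  # Hold, Buy 50%, Buy 100%, Sell

def select_action(q_network, state, epsilon):
    """Explore with probability epsilon, otherwise act greedily on Q-values."""
    if random.random() < epsilon:
        return random.randrange(NUM_ACTIONS)                     # exploratory action
    q_values = q_network.predict(state[np.newaxis], verbose=0)   # batch of one
    return int(np.argmax(q_values[0]))                           # greedy action

# after each episode: epsilon = max(epsilon_min, epsilon * epsilon_decay)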

DRL (Deep Reinforcement Learning)

Continuous action space for position allocation in [-1, 1] or [0, 1].

  • Policy gradient with Differential Sharpe Ratio
  • Adaptive risk aversion coefficient
  • Step-by-step buffer updates
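
The Differential Sharpe Ratio admits an online form due to Moody and Saffell; the sketch below implements that standard recursion and is only an approximation of what rewards.py actually does:

class DifferentialSharpe:
    """Online DSR via EMAs of the first and second moments of returns."""

    def __init__(self, eta=0.01):
        self.eta = eta  # EMA adaptation rate
        self.A = 0.0    # EMA of returns
        self.B = 0.0    # EMA of squared returns

    def update(self, r):
        dA = r - self.A
        dB = r * r - self.B
        denom = (self.B - self.A ** 2) ** 1.5
        dsr = (self.B * dA - 0.5 * self.A * dB) / denom if denom > 1e-12 else 0.0
        self.A += self.eta * dA  # update the EMAs after computing the reward
        self.B += self.eta * dB
        return dsr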

PPO (Proximal Policy Optimization)

Actor-critic architecture implemented in PyTorch.

  • Clipped surrogate objective
  • Generalized Advantage Estimation (GAE)
  • Entropy bonus for exploration
  • Gaussian policy for continuous actions
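
The clipped surrogate objective, sketched in PyTorch (tensor names are illustrative and the repository's PPO code may organize this differently):

import torch

def ppo_policy_loss(new_logprob, old_logprob, advantage, clip_coef=0.2):
    """Clipped surrogate objective; old_logprob is stored during the rollout."""
    ratio = (new_logprob - old_logprob).exp()  # pi_new(a|s) / pi_old(a|s)
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - clip_coef, 1.0 + clip_coef) * advantage
    return -torch.min(unclipped, clipped).mean()  # negate to minimize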

Environment

The BTCMarketEnv is a Gymnasium-compatible trading environment.
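
Because the environment follows the Gymnasium API, interaction uses the standard reset/step loop; the constructor call below omits arguments and is only a sketch (see BTCMarket_Env_spec.md for the real interface):

from envs.trading_env import BTCMarketEnv  # module path per the layout above

env = BTCMarketEnv()  # constructor arguments omitted in this sketch
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # replace with a trained agent's policy
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated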

Observation Space

State tensor shape: (features, window_size), e.g. (8, 20).

Feature          Description                  Normalization
Close change     Price movement               Sigmoid
MACD Histogram   Momentum indicator           Sigmoid
EMA 50 change    Trend indicator              Sigmoid
Wallet change    Portfolio performance        Sigmoid
Volume           Trading volume               MinMax
Open price       Opening price                MinMax
RSI 14           Relative strength            MinMax
MACD             Moving average convergence   MinMax

Action Space

  • DQN: Discrete(4) - Hold, Buy 50%, Buy 100%, Sell
  • DRL/PPO: Box(-1, 1) - Position allocation from full short to full long
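
As a rough illustration of how a Box(-1, 1) action can translate into an order (hypothetical helper; the actual execution logic lives under src/trading/):

def action_to_order(action, current_position, equity):
    """Treat the action as a target position fraction in [-1, 1]."""
    target_fraction = max(-1.0, min(1.0, float(action)))  # clip to the action domain
    target_position = target_fraction * equity            # signed notional target
    return target_position - current_position             # signed order size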

Reward Functions

  • reward_differential_sharpe_ratio: EMA-tracked Sharpe with risk aversion (default for DRL)
  • reward_sharpe_ratio: Risk-adjusted returns
  • reward_sortino_ratio: Downside-adjusted returns
  • reward_profit: Simple profit-based
  • reward_sterling_ratio: Max drawdown adjusted
  • compute_reward_from_tutor: Basic yield reward
  • reward_price_rate_log: Logarithmic price changes
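
As an example of the simpler end of this list, a per-step Sharpe-style reward over a rolling window of returns might look like the following (the windowing is illustrative, not the exact reward_sharpe_ratio implementation):

import numpy as np

def sharpe_reward(step_returns, window=20, eps=1e-9):
    """Mean over std of the most recent `window` step returns."""
    r = np.asarray(step_returns[-window:], dtype=float)
    if r.size < 2:
        return 0.0  # not enough history for a meaningful ratio
    return float(r.mean() / (r.std() + eps))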

Configuration

Configuration is managed via YAML files in configs/:

environment:
  observation_space: [8, 20]
  start_money: 10000
  trading_fee: 0.001

agent:
  action_domain: [0.0, 1.0]
  epsilon: 0.5
  epsilon_decay: 0.75

training:
  trainer: "DQNTrainer"
  episodes: 50
  batch_size: 16
  learning_rate: 1.0e-7

reward:
  function: "reward_differential_sharpe_ratio"
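
Loading such a file and letting CLI flags take precedence can be sketched as follows (the repository's actual loader lives under src/utils/; the argument handling here is illustrative):

import argparse

import yaml

parser = argparse.ArgumentParser()
parser.add_argument("--config", default="configs/training_config.yaml")
parser.add_argument("--learning-rate", type=float, default=None)
args = parser.parse_args()

with open(args.config) as f:
    config = yaml.safe_load(f)

if args.learning_rate is not None:  # CLI overrides the YAML value
    config["training"]["learning_rate"] = args.learning_rate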

CLI Parameters

Parameter         Description                                     Default
--trainer         Algorithm: DQNTrainer, DRLTrainer, PPOTrainer   DQNTrainer
--episodes        Number of training episodes                     50
--batch-size      Batch size for training                         16
--learning-rate   Learning rate                                   1e-7
--num-steps       Steps per rollout (PPO)                         128
--clip-coef       PPO clipping coefficient                        0.2
--gae-lambda      GAE lambda parameter                            0.95
--gpu-memory      GPU memory limit (MB)                           None

Data

Place CSV market data files in the datasets/ directory. Expected columns:

date, open, high, low, close, Volume, histogram, 50ema, rsi14, macd

For perpetual swaps, additional columns:

Close_BTC, Funding_Rate
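
A quick sanity check of a dataset against this schema (the file name is hypothetical; the column names are taken from the list above):

import pandas as pd

EXPECTED = ["date", "open", "high", "low", "close", "Volume",
            "histogram", "50ema", "rsi14", "macd"]

df = pd.read_csv("datasets/btc_1h.csv")  # hypothetical file name
missing = [c for c in EXPECTED if c not in df.columns]
if missing:
    raise ValueError(f"dataset is missing columns: {missing}")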

Evaluation Data

Training data for evaluation: Download Link

Extract all training folders to a single directory and set the base_folder variable in the evaluation notebook.

Testing

# Run all tests
python -m pytest

# Run specific test file
python -m pytest tests/unit/envs/test_trader_env.py -v

# Run specific test
python -m pytest tests/unit/envs/test_trader_env.py::TestEnv::test_handle_long_position -v

Code Quality

# Format code
black src/ tests/

# Lint
ruff check src/ tests/

# Type checking
mypy src/

Output Structure

Training outputs are saved to logs/:

logs/
└── {algorithm}_trial_0/
    ├── model_checkpoint.h5    # TensorFlow model
    ├── model_checkpoint.pt    # PyTorch model (PPO)
    ├── params.json            # Training parameters
    ├── training_log.csv       # Episode metrics
    └── episode_{N}.csv        # Per-episode logs
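
Episode metrics can then be inspected with pandas; the trial directory and column name below are hypothetical, so check training_log.csv for the actual header:

import pandas as pd

log = pd.read_csv("logs/dqn_trial_0/training_log.csv")  # hypothetical trial dir
print(log.tail())                 # last few episode rows
# log.plot(y="episode_reward")    # hypothetical column name; uncomment to plot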

Documentation

Detailed documentation is available in the docs/ directory:

  • TRAINING_ARCHITECTURE.md: Comprehensive training system specification
  • BTCMarket_Env_spec.md: Environment specification
  • PPO_TRAINING.md: PPO algorithm details

Requirements

  • Python 3.10+
  • PyTorch 2.0+ (for PPO)
  • TensorFlow 2.10+ (for DQN/DRL)
  • Gymnasium 0.26+
  • NumPy, Pandas, scikit-learn

Best Trainings (Historical)

DRL Trainings

Reward Function                    Training ID
reward_sharpe_ratio                20230429_200721
reward_differential_sharpe_ratio   20230424_070731
compute_reward_from_tutor          20230429_200559
reward_profit (v0)                 20230429_110440
reward_profit (v1)                 20230423_174023
reward_profit (v2)                 20230420_083508
reward_profit (v3)                 20230420_195053

DQN Trainings

Reward Function                         Training ID
reward_sharpe_ratio                     20230427_165557
reward_differential_sharpe_ratio (v0)   20230427_083418
reward_differential_sharpe_ratio (v1)   20230423_114422
compute_reward_from_tutor (v0)          20230427_083632
compute_reward_from_tutor (v1)          20230421_181519
reward_profit (v0)                      20230425_145617
reward_profit (v1)                      20230423_231505
reward_profit (v2)                      20230422_122001
reward_profit (v3)                      20230421_001115

License

MIT License
