
Deep and Ensemble Methods for Custom RL Trading

Data

Source code for robust ensemble methods and a deep convolutional and LSTM-based Q-learning network for an RL trading agent acting in a custom OpenAI Gym environment, utilizing more than 20 economic indicators alongside price data to make daily trading decisions. In addition to the standard input features for stock analysis (open, high, low, and close prices, plus volume), the model incorporates each firm's GICS sector and sub-industry, news events such as sales, store openings, product recalls, and layoffs, consumer sentiment scores, and more.

Other indicators include the MACD (Moving Average Convergence Divergence) line with its signal line and accompanying histogram, the upper and lower Bollinger Bands, ATR (Average True Range, which measures volatility), RSI (Relative Strength Index, which indicates the momentum and direction of price shifts over a given time interval), and On-Balance Volume (which uses trading-volume data to anticipate price movement).
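
As a rough sketch (not the repository's exact code), these indicators can be computed from raw OHLCV data with the finta library installed below; the input file name and the added column names are illustrative:

import pandas as pd
from finta import TA

# finta expects lowercase OHLCV column names: open, high, low, close, volume
df = pd.read_csv("prices.csv")  # hypothetical input file

macd = TA.MACD(df)                        # DataFrame with MACD and SIGNAL columns
df["macd"] = macd["MACD"]
df["signal_line"] = macd["SIGNAL"]
df["macd_histogram"] = macd["MACD"] - macd["SIGNAL"]

bbands = TA.BBANDS(df)                    # upper/middle/lower Bollinger Bands
df["bb_upper"] = bbands["BB_UPPER"]
df["bb_lower"] = bbands["BB_LOWER"]

df["atr"] = TA.ATR(df, period=14)         # Average True Range (volatility)
df["rsi"] = TA.RSI(df, period=14)         # Relative Strength Index (momentum)
df["obv"] = TA.OBV(df)                    # On-Balance Volume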

Model Functionality

The custom trading environment retains the Buy/Sell actions and Long/Short positions of the StocksEnv environment from gym-anytrading and still accounts for trading fees. However, it modifies the step() and _process_data() methods and adds a host of helper and environment methods needed to compute and process the various metrics. The trading agent draws on more than 20 economic indicators in addition to adjusted closing price data for any given stock in the S&P 500. The Advantage Actor Critic (A2C), Proximal Policy Optimization (PPO), and Deep Q-Network (DQN) models are combined in a majority-voting ensemble: the individual predictions at each timestep are pooled and the most popular action is chosen. These models are also analyzed independently and compared through custom callbacks that calculate the Sharpe Ratio (measuring average return in excess of the risk-free rate per unit of volatility) and the Calmar Ratio (measuring average annual return relative to the maximum drawdown).
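
As a minimal sketch of the environment customization, assuming a DataFrame whose indicator columns have already been computed (the class name and column names below are illustrative, not the repository's exact code):

from gym_anytrading.envs import StocksEnv

class IndicatorStocksEnv(StocksEnv):  # hypothetical class name
    def _process_data(self):
        # StocksEnv expects this method to return (prices, signal_features).
        start = self.frame_bound[0] - self.window_size
        end = self.frame_bound[1]
        prices = self.df.loc[:, "Close"].to_numpy()[start:end]
        # Stack price data with the engineered indicator columns so they
        # become part of each observation.
        feature_cols = ["Close", "macd", "signal_line", "rsi", "atr", "obv"]
        signal_features = self.df.loc[:, feature_cols].to_numpy()[start:end]
        return prices, signal_features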

Results

The custom deep network achieved a peak training profit of almost 60% (on training data comprising approximately 200 timesteps/days) and slightly over 7% in testing (over a 3-week period), which rivals the state-of-the-art A2C, PPO, and DQN models. The custom deep network also appears to hold up better over longer horizons than the standard models and the ensemble method: a peak test profit of almost 9% was observed over the 50 days following the training period.

As the peak testing profit shows, this model substantially outperforms short-term historical market averages of approximately 0.6% per month over similar time frames.

Overview of File Structure

See the library_and_data_info.txt file for the project's dependencies (and how each is used) and for details on the Kaggle stock-sentiment dataset. The required libraries and Kaggle data can be installed as follows:

pip install stable-baselines3 gym-anytrading gymnasium finta

mkdir -p ~/.kaggle
cp kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
kaggle datasets download -d parsabg/stocknewseventssentiment-snes-10
unzip stocknewseventssentiment-snes-10.zip

To run the project, execute the script in main_program.py.

The files calmar_ratio_callback.py and sharpe_ratio_callback.py implement logged progress metrics during model training, continually evaluating the Calmar and Sharpe ratios, respectively. Similarly, profit_based_early_stopping.py implements early stopping during training, continually saving the model checkpoint with the highest profit.
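
As a hedged sketch of what such a callback can look like (built on stable-baselines3's BaseCallback; the class name, the treatment of per-step rewards as period returns, and the zero risk-free rate are all assumptions):

import numpy as np
from stable_baselines3.common.callbacks import BaseCallback

class SharpeRatioCallback(BaseCallback):  # hypothetical name
    def __init__(self, risk_free_rate=0.0, verbose=0):
        super().__init__(verbose)
        self.risk_free_rate = risk_free_rate
        self.step_returns = []

    def _on_step(self) -> bool:
        # Collect the most recent per-step rewards from the rollout.
        self.step_returns.extend(np.atleast_1d(self.locals["rewards"]))
        if len(self.step_returns) > 1:
            excess = np.array(self.step_returns) - self.risk_free_rate
            sharpe = excess.mean() / (excess.std() + 1e-9)
            self.logger.record("custom/sharpe_ratio", sharpe)
        return True  # returning False would stop training early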

See data_processing.py for how the relevant model features are selected and computed, along with the other modifications made to the input data CSV.
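
Purely as an illustration of that kind of preprocessing (every file and column name here is hypothetical, not taken from the repository), the flow might resemble:

import pandas as pd

prices = pd.read_csv("prices.csv", parse_dates=["Date"])
sentiment = pd.read_csv("sentiment.csv", parse_dates=["Date"])

# Join daily sentiment scores onto the price rows, then keep only the
# columns the model consumes.
merged = prices.merge(sentiment, on=["Date", "Ticker"], how="left")
feature_cols = ["Close", "Volume", "macd", "rsi", "sentiment_score"]
merged = merged.dropna(subset=feature_cols)
merged[["Date", "Ticker"] + feature_cols].to_csv("model_input.csv", index=False)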

The file ensemble_helpers.py contains helper functions that implement the ensemble strategy, pooling the models' predicted actions into a single trading decision by majority vote.
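
A minimal sketch of the voting step, assuming three trained stable-baselines3 models (the function name is hypothetical):

from collections import Counter

def majority_vote_action(models, obs):
    # Each model proposes an action for the current observation.
    actions = [int(model.predict(obs, deterministic=True)[0]) for model in models]
    # most_common(1) picks the most frequent action; on a three-way tie,
    # the first model's action wins.
    return Counter(actions).most_common(1)[0][0]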

replay_memory.py defines a replay buffer class that lets the model revisit prior experiences, so training is not restricted to the temporally correlated experiences collected sequentially.
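
A generic sketch of such a buffer (uniform random sampling is what breaks the temporal correlation):

import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ["state", "action", "reward", "next_state", "done"])

class ReplayMemory:
    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates the batch.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)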

test_models_and_metrics.py provides a unified interface to train various models (ensemble, custom DQN, etc.) and presents detailed diagnostic information on model performance during and after training.
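
The exact interface is specific to the repository, but a unified training loop over the stable-baselines3 baselines might look roughly like this (the function name and timestep budget are assumptions; env is the custom trading environment):

from stable_baselines3 import A2C, PPO, DQN

def train_all(env, total_timesteps=100_000):  # hypothetical helper
    models = {}
    for name, algo in [("A2C", A2C), ("PPO", PPO), ("DQN", DQN)]:
        model = algo("MlpPolicy", env, verbose=0)
        model.learn(total_timesteps=total_timesteps)
        models[name] = model
    return models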
