This project explores the use of Long Short-Term Memory (LSTM) networks for predicting the hourly closing price of Bitcoin (the BTC/USDT pair), using historical data from the Binance exchange. A key focus of this work was investigating whether incorporating novel features derived from the Boids algorithm—a simulation of collective flocking behavior—could enhance prediction accuracy compared to using standard price-based features alone.
The project involved several stages:
- Extensive dataset generation (642 datasets) by varying Boids simulation parameters and historical data lengths (1k to 20k hourly records). The 20000-record set remained incomplete (126 out of 192 combinations missing) due to time constraints.
- Initial experiments to identify promising Boids parameter configurations using fixed LSTM settings.
- Evaluation of LSTM models trained with and without Boids features on a selected dataset (`LIMIT=10000`; Boids config: `400, 100, 10, 150`) using refined hyperparameters (`LEARNING_RATE=5e-3`, `EPOCHS=3000`).
The best result obtained in this study was for the model including Boids features, achieving a Mean Absolute Error (MAE) of $527.47 on the test set. A direct comparison with the model without Boids features under identical hyperparameters was planned but requires a separate execution.
This project was developed by Oleg Shchendrigin and Dmitrii Ryazanov at Innopolis University. (Initial team member Georgii Iakovlev did not contribute significantly to the final project).
The repository contains the core scripts, experiment utilities, and results directories.
- Base Features:
  - `close`: Closing price of the asset.
  - `ma_close`: Simple moving average of the closing price.
- Boids Features (used in `model.py`, generated by `boids.py`):
  - `boids_mean_x`, `boids_mean_y`: Mean position of the flock.
  - `boids_mean_vx`, `boids_mean_vy`: Mean velocity of the flock.
  - `boids_std_x`, `boids_std_y`: Standard deviation of flock positions.
  - `boids_std_vx`, `boids_std_vy`: Standard deviation of flock velocities.
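As an illustrative sketch (not the project's exact `boids.py`), the eight summary features can be computed from per-boid position and velocity arrays like this:

```python
import numpy as np

def boids_summary_features(positions, velocities):
    """Summarize one flock state into the eight features listed above.

    positions, velocities: arrays of shape (num_boids, 2) holding
    (x, y) coordinates and (vx, vy) velocity components.
    """
    mean_pos = positions.mean(axis=0)   # -> boids_mean_x, boids_mean_y
    mean_vel = velocities.mean(axis=0)  # -> boids_mean_vx, boids_mean_vy
    std_pos = positions.std(axis=0)     # -> boids_std_x, boids_std_y
    std_vel = velocities.std(axis=0)    # -> boids_std_vx, boids_std_vy
    return {
        "boids_mean_x": mean_pos[0], "boids_mean_y": mean_pos[1],
        "boids_mean_vx": mean_vel[0], "boids_mean_vy": mean_vel[1],
        "boids_std_x": std_pos[0], "boids_std_y": std_pos[1],
        "boids_std_vx": std_vel[0], "boids_std_vy": std_vel[1],
    }
```

One such feature row is produced per simulation step and aligned with the hourly price records.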
- Data Loading/Generation: Fetch hourly BTC/USDT K-line data from the Binance API via `load_data.py`. If pre-processed data (CSV + JSON) for the specified parameters isn't found in `datasets/`, generate it: calculate the MA and `future_close` columns, run the Boids simulation (`boids.py`), combine features, handle NaNs, scale features/target with `MinMaxScaler`, and save the files (`load_or_generate_data` function).
- Sequence Creation: Transform the scaled time series into sequences for the LSTM (`create_sequences`).
- Data Splitting: Split chronologically into train/test sets (`train_test_split`).
- Model Training: Train the LSTM (`LSTMModel`) with the Adam optimizer and MSE loss. Save the best model checkpoint based on validation loss (`train_model`). Hyperparameters are read from `.env`.
- Evaluation: Load the best model checkpoint, make predictions on the test set, denormalize them, and calculate MAE/RMSE (`evaluate_model`).
- Comparison: The framework supports comparison by running `model.py` (with Boids) and `noboids.py` (without Boids) with identical hyperparameters from `.env`.
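The sequence-creation and chronological-split steps above can be sketched as follows (an illustrative version, not the project's exact code):

```python
import numpy as np

def create_sequences(features, target, seq_length):
    """Turn a scaled time series into (X, y) pairs for the LSTM.

    features: array of shape (T, num_features); target: array of shape (T,).
    Each X[i] is a window of seq_length consecutive rows, and y[i] is the
    target value immediately after that window.
    """
    xs, ys = [], []
    for i in range(len(features) - seq_length):
        xs.append(features[i : i + seq_length])
        ys.append(target[i + seq_length])
    return np.array(xs), np.array(ys)

def train_test_split(X, y, train_ratio=0.8):
    """Split chronologically (no shuffling) so the test set is strictly later."""
    split = int(len(X) * train_ratio)
    return X[:split], X[split:], y[:split], y[split:]
```

With `SEQ_LENGTH=20` and `TRAIN_RATIO=0.8` from `.env`, each LSTM input is a 20-hour window, and the last 20% of windows form the held-out test set.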
1. Prerequisites:
- Python 3.x
- pip
2. Clone Repository:
```bash
git clone https://github.com/Quartz-Admirer/NIC_final.git
cd NIC_final
```
3. Create and Populate requirements.txt:
Create a file named requirements.txt with the following content:
```text
# requirements.txt
numpy
pandas
torch
matplotlib
scikit-learn
python-dotenv
requests
```
(Note: `itertools` ships with the Python standard library and does not need to be listed in `requirements.txt`.)
4. Install Dependencies:
```bash
pip install -r requirements.txt
```
5. Create .env File:
Create a file named .env in the project directory. Use the following template, adjusting parameters as needed (these reflect the final reported run):
```ini
# .env file
# Data Parameters (MUST match the dataset you want to load/generate)
BASE_DATA_PATH=datasets
DATA_LIMIT=10000
NUM_BOIDS=400
DIMENSION=100
MAX_SPEED=10
PERCEPTION_RADIUS=150
TARGET_COL=future_close
MA_WINDOW=50
# Model & Training Parameters (Final Tuned Parameters)
SEQ_LENGTH=20
TRAIN_RATIO=0.8
HIDDEN_DIM=64
NUM_LAYERS=2
LEARNING_RATE=5e-3
EPOCHS=3000
DEVICE=auto  # Options: 'auto', 'cuda', 'cpu'
# Output & Logging Parameters
RESULTS_BASE_DIR=experiment_results_final
PLOT_LOSSES=True
PLOT_EVALUATION=True
```
Ensure the `.env` file is configured correctly before running.
1. Run Training & Evaluation with Boids:
```bash
python model.py
```
- Loads/generates data matching `.env`. Trains using all features. Saves results (best model, plots, summary JSON) to a subdirectory in `RESULTS_BASE_DIR` (e.g., `..._with_boids/`).
2. Run Training & Evaluation without Boids:
```bash
python noboids.py
```
- Loads/generates the same base data file. Trains using only `close` and `ma_close`. Saves results to a different subdirectory (e.g., `..._no_boids/`).
3. Generate Multiple Datasets (Optional):
- Modify and run `experiments/datamore.py` to populate `datasets/`.
4. Parse Experiment Results (Optional):
- Use `experiments/top_results_generate.py` to analyze summary files from initial experiments.
Pre-generated datasets from the initial parameter sweep (642 combinations) are available for download:
- Google Drive Link: https://drive.google.com/drive/folders/1Xgijksr7bPRmxFRowSb3J_VO0HQSZLT3?usp=sharing
The final evaluation focused on the DATA_LIMIT=10000 dataset with Boids parameters (num_boids=400, dimension=100, max_speed=10, perception_radius=150) and optimized LSTM hyperparameters (HIDDEN_DIM=64, NUM_LAYERS=2, LEARNING_RATE=5e-3, EPOCHS=3000).
- Model with Boids Features:
  - Test MAE: 527.47
  - Test RMSE: 809.32
- Model without Boids Features:
  - Test MAE: pending (requires a `noboids.py` run with identical final hyperparameters)
  - Test RMSE: pending (requires a `noboids.py` run with identical final hyperparameters)
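For reference, the reported metrics are defined as follows (a sketch; the project computes them in `evaluate_model` after denormalizing predictions back to price units):

```python
import numpy as np

def mae_rmse(y_true, y_pred):
    """Mean Absolute Error and Root Mean Squared Error in price units ($)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mae = np.abs(errors).mean()
    rmse = np.sqrt((errors ** 2).mean())
    return mae, rmse

def denormalize(scaled, data_min, data_max):
    """Invert MinMaxScaler's default [0, 1] scaling for a single column."""
    return scaled * (data_max - data_min) + data_min
```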
The best performance observed was achieved by the model including Boids features. A definitive conclusion on the benefit of Boids features requires the comparative results from the model trained without them under identical conditions. The achieved MAE of ~$527, while an improvement over initial non-tuned runs, suggests further refinement may be necessary for practical application.
- The comparative analysis between models with/without Boids under final tuned hyperparameters is incomplete.
- Hyperparameter tuning was performed manually; automated optimization could yield further improvements.
- The Boids simulation rules were simple and independent of market state.
- Evaluation relied primarily on MAE/RMSE. Directional accuracy and backtesting metrics were not explored.
- Comparison against strong statistical or machine learning baselines was not performed.
Future work should prioritize completing the comparative analysis, performing joint hyperparameter optimization, exploring advanced architectures (e.g., Attention, Transformers), enhancing Boids features, adding diverse feature sets, evaluating directional accuracy, and comparing against robust baselines.
- Oleg Shchendrigin
- Dmitrii Ryazanov
(Initial team member Georgii Iakovlev did not contribute significantly to the final project)
- Main Project Repository: https://github.com/Quartz-Admirer/NIC_final/tree/main
- Datasets on Google Drive: https://drive.google.com/drive/folders/1Xgijksr7bPRmxFRowSb3J_VO0HQSZLT3?usp=sharing