This project explores the use of Long Short-Term Memory (LSTM) networks for predicting the hourly closing price of Bitcoin (the BTC/USDT pair), using historical data from the Binance exchange. A key focus of this work was investigating whether incorporating novel features derived from the Boids algorithm—a simulation of collective flocking behavior—could enhance prediction accuracy compared to using standard price-based features alone.
The project involved several stages:
- Extensive dataset generation (642 datasets) by varying Boids simulation parameters and historical data lengths (1k to 20k hourly records). The 20000-record set remained incomplete (126 out of 192 combinations missing) due to time constraints.
- Initial experiments to identify promising Boids parameter configurations using fixed LSTM settings.
- Evaluation of LSTM models trained with and without Boids features on a selected dataset (`LIMIT=10000`; Boids config: `400, 100, 10, 150`) using refined hyperparameters (`LEARNING_RATE=5e-3`, `EPOCHS=3000`).
The best result obtained in this study was for the model including Boids features, achieving a Mean Absolute Error (MAE) of $527.47 on the test set. A direct comparison with the model without Boids features under identical hyperparameters was planned but requires a separate execution.
This project was developed by Oleg Shchendrigin and Dmitrii Ryazanov at Innopolis University. (Initial team member Georgii Iakovlev did not contribute significantly to the final project).
The repository contains the core scripts, experiment utilities, and results directories.
- Base Features:
  - `close`: Closing price of the asset.
  - `ma_close`: Simple moving average of the closing price.
- Boids Features (used in `model.py`, generated by `boids.py`):
  - `boids_mean_x`, `boids_mean_y`: Mean position of the flock.
  - `boids_mean_vx`, `boids_mean_vy`: Mean velocity of the flock.
  - `boids_std_x`, `boids_std_y`: Standard deviation of flock positions.
  - `boids_std_vx`, `boids_std_vy`: Standard deviation of flock velocities.
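As an illustrative sketch (not the project's exact `boids.py`), the eight summary features can be computed from per-boid position and velocity arrays like this:

```python
import numpy as np

def boids_summary_features(positions, velocities):
    """Summarize one flock state into the eight features listed above.

    positions, velocities: arrays of shape (num_boids, 2) holding
    (x, y) coordinates and (vx, vy) velocity components.
    """
    mean_pos = positions.mean(axis=0)   # -> boids_mean_x, boids_mean_y
    mean_vel = velocities.mean(axis=0)  # -> boids_mean_vx, boids_mean_vy
    std_pos = positions.std(axis=0)     # -> boids_std_x, boids_std_y
    std_vel = velocities.std(axis=0)    # -> boids_std_vx, boids_std_vy
    return {
        "boids_mean_x": mean_pos[0], "boids_mean_y": mean_pos[1],
        "boids_mean_vx": mean_vel[0], "boids_mean_vy": mean_vel[1],
        "boids_std_x": std_pos[0], "boids_std_y": std_pos[1],
        "boids_std_vx": std_vel[0], "boids_std_vy": std_vel[1],
    }
```

One such feature row is produced per simulation step and aligned with the hourly price records.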
- Data Loading/Generation: Fetch hourly BTC/USDT K-line data from the Binance API via `load_data.py`. If pre-processed data (CSV + JSON) for the specified parameters isn't found in `datasets/`, generate it: calculate the MA and `future_close` columns, run the Boids simulation (`boids.py`), combine features, handle NaNs, scale features/target with `MinMaxScaler`, and save the files (`load_or_generate_data` function).
- Sequence Creation: Transform the scaled time series into sequences for the LSTM (`create_sequences`).
- Data Splitting: Split chronologically into train/test sets (`train_test_split`).
- Model Training: Train the LSTM (`LSTMModel`) with the Adam optimizer and MSE loss. Save the best model checkpoint based on validation loss (`train_model`). Hyperparameters are read from `.env`.
- Evaluation: Load the best model checkpoint, make predictions on the test set, denormalize them, and calculate MAE/RMSE (`evaluate_model`).
- Comparison: The framework supports comparison by running `model.py` (with Boids) and `noboids.py` (without Boids) with identical hyperparameters from `.env`.
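The sequence-creation and chronological-split steps above can be sketched as follows (an illustrative version, not the project's exact code):

```python
import numpy as np

def create_sequences(features, target, seq_length):
    """Turn a scaled time series into (X, y) pairs for the LSTM.

    features: array of shape (T, num_features); target: array of shape (T,).
    Each X[i] is a window of seq_length consecutive rows, and y[i] is the
    target value immediately after that window.
    """
    xs, ys = [], []
    for i in range(len(features) - seq_length):
        xs.append(features[i : i + seq_length])
        ys.append(target[i + seq_length])
    return np.array(xs), np.array(ys)

def train_test_split(X, y, train_ratio=0.8):
    """Split chronologically (no shuffling) so the test set is strictly later."""
    split = int(len(X) * train_ratio)
    return X[:split], X[split:], y[:split], y[split:]
```

With `SEQ_LENGTH=20` and `TRAIN_RATIO=0.8` from `.env`, each LSTM input is a 20-hour window, and the last 20% of windows form the held-out test set.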
1. Prerequisites:
- Python 3.x
- pip
2. Clone Repository:
```bash
git clone https://github.com/Quartz-Admirer/NIC_final.git
cd NIC_final
```
3. Create and Populate requirements.txt:
Create a file named requirements.txt with the following content:
```text
# requirements.txt
numpy
pandas
torch
matplotlib
scikit-learn
python-dotenv
requests
```
(Note: `itertools` ships with the Python standard library and does not need to be listed in `requirements.txt`.)
4. Install Dependencies:
```bash
pip install -r requirements.txt
```
5. Create .env File:
Create a file named .env in the project directory. Use the following template, adjusting parameters as needed (these reflect the final reported run):
```ini
# .env file
# Data Parameters (MUST match the dataset you want to load/generate)
BASE_DATA_PATH=datasets
DATA_LIMIT=10000
NUM_BOIDS=400
DIMENSION=100
MAX_SPEED=10
PERCEPTION_RADIUS=150
TARGET_COL=future_close
MA_WINDOW=50
# Model & Training Parameters (Final Tuned Parameters)
SEQ_LENGTH=20
TRAIN_RATIO=0.8
HIDDEN_DIM=64
NUM_LAYERS=2
LEARNING_RATE=5e-3
EPOCHS=3000
DEVICE=auto  # Options: 'auto', 'cuda', 'cpu'
# Output & Logging Parameters
RESULTS_BASE_DIR=experiment_results_final
PLOT_LOSSES=True
PLOT_EVALUATION=True
```
Ensure the `.env` file is configured correctly before running.
1. Run Training & Evaluation with Boids:
```bash
python model.py
```
- Loads/generates data matching `.env`. Trains using all features. Saves results (best model, plots, summary JSON) to a subdirectory in `RESULTS_BASE_DIR` (e.g., `..._with_boids/`).
2. Run Training & Evaluation without Boids:
```bash
python noboids.py
```
- Loads/generates the same base data file. Trains using only `close` and `ma_close`. Saves results to a different subdirectory (e.g., `..._no_boids/`).
3. Generate Multiple Datasets (Optional):
- Modify and run `experiments/datamore.py` to populate `datasets/`.
4. Parse Experiment Results (Optional):
- Use `experiments/top_results_generate.py` to analyze summary files from initial experiments.
Pre-generated datasets from the initial parameter sweep (642 combinations) are available for download:
- Google Drive Link: https://drive.google.com/drive/folders/1Xgijksr7bPRmxFRowSb3J_VO0HQSZLT3?usp=sharing
The final evaluation focused on the DATA_LIMIT=10000 dataset with Boids parameters (num_boids=400, dimension=100, max_speed=10, perception_radius=150) and optimized LSTM hyperparameters (HIDDEN_DIM=64, NUM_LAYERS=2, LEARNING_RATE=5e-3, EPOCHS=3000).
- Model with Boids Features:
  - Test MAE: 527.47
  - Test RMSE: 809.32
- Model without Boids Features:
  - Test MAE: pending (requires a `noboids.py` run with identical final hyperparameters)
  - Test RMSE: pending (requires a `noboids.py` run with identical final hyperparameters)
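For reference, the reported metrics are defined as follows (a sketch; the project computes them in `evaluate_model` after denormalizing predictions back to price units):

```python
import numpy as np

def mae_rmse(y_true, y_pred):
    """Mean Absolute Error and Root Mean Squared Error in price units ($)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_pred - y_true
    mae = np.abs(errors).mean()
    rmse = np.sqrt((errors ** 2).mean())
    return mae, rmse

def denormalize(scaled, data_min, data_max):
    """Invert MinMaxScaler's default [0, 1] scaling for a single column."""
    return scaled * (data_max - data_min) + data_min
```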
The best performance observed was achieved by the model including Boids features. A definitive conclusion on the benefit of Boids features requires the comparative results from the model trained without them under identical conditions. The achieved MAE of ~$527, while an improvement over initial non-tuned runs, suggests further refinement may be necessary for practical application.
- The comparative analysis between models with/without Boids under final tuned hyperparameters is incomplete.
- Hyperparameter tuning was performed manually; automated optimization could yield further improvements.
- The Boids simulation rules were simple and independent of market state.
- Evaluation relied primarily on MAE/RMSE. Directional accuracy and backtesting metrics were not explored.
- Comparison against strong statistical or machine learning baselines was not performed.
Future work should prioritize completing the comparative analysis, performing joint hyperparameter optimization, exploring advanced architectures (e.g., Attention, Transformers), enhancing Boids features, adding diverse feature sets, evaluating directional accuracy, and comparing against robust baselines.
- Oleg Shchendrigin
- Dmitrii Ryazanov
(Initial team member Georgii Iakovlev did not contribute significantly to the final project)
- Main Project Repository: https://github.com/Quartz-Admirer/NIC_final/tree/main
- Datasets on Google Drive: https://drive.google.com/drive/folders/1Xgijksr7bPRmxFRowSb3J_VO0HQSZLT3?usp=sharing