Experiments and data preparation for limit order book (LOB) sequence classification using pre-split
walk-forward folds and a reference training notebook. Data are stored as
LZMA-compressed whitespace matrices (.txt.xz) and loaded into NumPy for
sliding-window construction.
- `data/` - compressed training/testing folds (see `data/README.md`).
- `notebooks/` - training and evaluation notebook (see `notebooks/README.md`).
- `scripts/` - dataset utilities (see `scripts/README.md`).
- `url.txt` - dataset and reference links.
- Decompress the dataset files:
  `bash scripts/unxz_data.sh`
- Open the notebook:
  `jupyter lab notebooks/train.ipynb`
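If `unxz` is not available on your system, the decompression step can be approximated in pure Python. The sketch below is not a copy of `scripts/unxz_data.sh` (its contents are not shown here); the function name `decompress_folds` and the `keep_compressed` flag are hypothetical, and it assumes the folds live under `data/` with a `.txt.xz` suffix.

```python
import lzma
import shutil
from pathlib import Path


def decompress_folds(data_dir="data", keep_compressed=True):
    """Decompress every *.txt.xz under data_dir, writing *.txt next to it.

    Hypothetical helper: the real shell script may behave differently,
    for example by deleting the compressed files afterwards.
    """
    written = []
    for src in sorted(Path(data_dir).rglob("*.txt.xz")):
        dst = src.with_suffix("")  # strips only the trailing ".xz"
        if not dst.exists():
            # Stream the data so large folds are never fully held in memory.
            with lzma.open(src, "rb") as fin, open(dst, "wb") as fout:
                shutil.copyfileobj(fin, fout)
        if not keep_compressed:
            src.unlink()
        written.append(dst)
    return written
```

Calling `decompress_folds("data")` returns the list of decompressed paths and skips files that already exist, so it is safe to run more than once.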
- Each fold is a 2D array `data` with shape `(R, T)` loaded via `np.loadtxt`.
- Columns are timesteps; rows are variables.
- Features: `X = data[:num_features, :].T` (default `num_features=144`).
- Labels: `y = data[-horizon, :][seq_size - 1:] - 1`.
- Sliding window sample `i` spans timesteps `[i, i+seq_size-1]` and uses `y[i]`.
- Horizon mapping: `1 -> 100`, `2 -> 50`, `3 -> 30`, `4 -> 20`, `5 -> 10` ticks.
See data/README.md for details on file names and folders.
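The layout described above can be sketched as follows. This is a minimal illustration and not the notebook's code: `load_fold`, `make_windows`, and `HORIZON_TICKS` are hypothetical names, `seq_size=100` is an assumed default (the source does not state one), and `sliding_window_view` requires NumPy 1.20 or newer.

```python
import lzma

import numpy as np

# Horizon index -> prediction horizon in ticks, as listed in the mapping above.
HORIZON_TICKS = {1: 100, 2: 50, 3: 30, 4: 20, 5: 10}


def load_fold(path):
    """Load one fold as an (R, T) array from a .txt or .txt.xz file."""
    if str(path).endswith(".xz"):
        # Read the compressed file directly, without decompressing to disk.
        with lzma.open(path, "rt") as fh:
            return np.loadtxt(fh)
    return np.loadtxt(path)


def make_windows(data, horizon=1, seq_size=100, num_features=144):
    """Build sliding-window samples; seq_size=100 is an assumed default."""
    X = data[:num_features, :].T              # (T, num_features)
    y = data[-horizon, :][seq_size - 1:] - 1  # (T - seq_size + 1,)
    # windows[i] covers timesteps i .. i + seq_size - 1 and pairs with y[i].
    windows = np.lib.stride_tricks.sliding_window_view(X, seq_size, axis=0)
    # The window axis comes last: (N, F, seq_size) -> (N, seq_size, F).
    windows = windows.transpose(0, 2, 1)
    return windows, y.astype(np.int64)
```

The returned `windows` array is a read-only view into `X`, so it costs no extra memory until it is copied, for example when converting a batch to a tensor.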
The notebook uses standard scientific Python libraries. Install what you need for your environment, for example:
- Python 3.9+
- numpy, pandas, matplotlib, scikit-learn, tqdm
- torch
- jupyter (lab or notebook)
- `unxz` (from xz-utils) for decompression
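To see which of the items above are already present, a small check like the one below may help. It is only a sketch: `check_environment` and `REQUIRED_MODULES` are hypothetical names, the list uses import names rather than package names, and `jupyter_core` is used as a stand-in for detecting a Jupyter installation.

```python
import importlib.util
import shutil
import sys

# Import names for the packages listed above
# (scikit-learn imports as "sklearn"; jupyter_core ships with Jupyter).
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "sklearn", "tqdm", "torch", "jupyter_core",
]


def check_environment(modules=REQUIRED_MODULES):
    """Return a list of human-readable problems; empty if nothing is missing."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append("Python 3.9+ expected")
    for name in modules:
        # find_spec locates a module without importing it.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing Python package: {name}")
    if shutil.which("unxz") is None:
        problems.append("unxz not found on PATH (install xz-utils)")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["environment looks complete"]:
        print(line)
```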
url.txt lists the dataset source and related papers or data format references.