This document explains the regression modeling techniques used in the Trade Simulator application for slippage estimation and maker/taker proportion prediction.
The Trade Simulator uses linear regression as the primary method for estimating slippage based on orderbook depth and trade size.
The model extracts the following features from the orderbook:
- Bid-ask spread
- Order book imbalance ratio
- Market depth at various price levels
- Volume-weighted price levels
- Trade size relative to available liquidity
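The sketch below shows one way to compute each of the listed features from a single orderbook snapshot. It is a minimal illustration, not the simulator's extractor. Several details are hypothetical: the name `extract_slippage_features`, the `[price, quantity]` array layout, the top-10-level window, and the assumption that the trade is a market buy, so it consumes the ask side.

```python
import numpy as np


def extract_slippage_features(bids, asks, trade_qty, depth_levels=(1, 5, 10), top_n=10):
    """Hypothetical feature extractor for a market BUY of size trade_qty.

    bids, asks: sequences of [price, quantity] rows, best level first.
    """
    bids = np.asarray(bids, dtype=float)
    asks = np.asarray(asks, dtype=float)
    best_bid, best_ask = bids[0, 0], asks[0, 0]
    mid = (best_bid + best_ask) / 2.0

    # Bid-ask spread as a fraction of the mid price
    spread = (best_ask - best_bid) / mid

    # Imbalance over the top_n levels: +1 means all volume on the bid side
    bid_vol = bids[:top_n, 1].sum()
    ask_vol = asks[:top_n, 1].sum()
    imbalance = (bid_vol - ask_vol) / (bid_vol + ask_vol)

    # Cumulative ask depth at several levels (a buy consumes the ask side)
    depth = [asks[:k, 1].sum() for k in depth_levels]

    # Volume-weighted ask price over the top_n levels, relative to mid
    vwap_ask = (asks[:top_n, 0] * asks[:top_n, 1]).sum() / ask_vol
    vwap_offset = (vwap_ask - mid) / mid

    # Trade size relative to visible ask liquidity
    size_ratio = trade_qty / ask_vol

    return np.array([spread, imbalance, *depth, vwap_offset, size_ratio])


bids = [[99.0, 1.0], [98.0, 2.0]]
asks = [[101.0, 1.0], [102.0, 3.0]]
features = extract_slippage_features(bids, asks, trade_qty=2.0, depth_levels=(1, 2))
print(features)
```

The function returns a flat vector, so a batch of snapshots can be stacked into a design matrix for any scikit-learn regressor.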
The slippage model is trained through:
- Initial training with synthetic or historical data
- Online learning to adapt to changing market conditions
- Periodic retraining with recent market data
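The three training stages can be sketched with scikit-learn's `SGDRegressor`, which supports incremental updates through `partial_fit`. The closed-form `LinearRegression` does not offer `partial_fit`. This is a minimal sketch under stated assumptions: the two-feature synthetic generator, its coefficients, the learning rate, and the use of SGD itself are illustrative choices, not details taken from the simulator.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)


def synthetic_batch(n):
    """Hypothetical data: slippage rises with size ratio and spread, plus noise."""
    X = rng.uniform(0.0, 1.0, size=(n, 2))  # columns: [size_ratio, spread]
    y = 0.8 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(0.0, 0.01, size=n)
    return X, y


def make_model():
    # Squared-error linear model (small default L2 penalty) trained by SGD;
    # tol=None means fit() always runs exactly max_iter epochs
    return SGDRegressor(learning_rate="constant", eta0=0.01,
                        max_iter=20, tol=None, random_state=0)


# 1. Initial training on synthetic (or historical) data
X0, y0 = synthetic_batch(2000)
model = make_model().fit(X0, y0)

# 2. Online learning: one incremental SGD pass per batch of newly observed fills
X_new, y_new = synthetic_batch(50)
model.partial_fit(X_new, y_new)

# 3. Periodic retraining: refit from scratch on a window of recent data
X_recent, y_recent = synthetic_batch(1000)
model = make_model().fit(X_recent, y_recent)

print(model.predict([[0.5, 0.5]]))  # should be close to 0.8*0.5 + 0.3*0.5 = 0.55
```

`partial_fit` continues from the current coefficients. `fit` on a fresh estimator discards them, which is what distinguishes the online and retraining paths here.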
The linear regression approach provides:
- Low computational overhead
- Real-time predictions (<1 ms)
- Accuracy within 5-10% of actual slippage under normal market conditions
To handle extreme market conditions and tail risk, a quantile regression model is also implemented. Compared with the linear regression model, it:
- Provides prediction intervals rather than point estimates
- More robust to outliers and extreme market movements
- Better captures asymmetric slippage distributions
The maker/taker proportion is predicted using a logistic regression model, which estimates the probability of orders executing as maker vs. taker.
Features used for classification include:
- Bid-ask spread
- Order imbalance
- Recent trade volume
- Volatility
- Time of day
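A minimal version of this classifier is a standardized logistic regression whose predicted probability serves as the expected maker share of an order. The sketch below is illustrative only. The feature units are hypothetical, the labelling rule for the synthetic training data is made up, and the hour is encoded as sine/cosine so that 23:00 and 00:00 end up close together. None of these choices come from the simulator itself.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def encode_features(spread, imbalance, recent_volume, volatility, hour_utc):
    """Hypothetical encoding; the hour is mapped onto a circle."""
    angle = 2 * np.pi * np.asarray(hour_utc) / 24.0
    return np.column_stack([spread, imbalance, recent_volume, volatility,
                            np.sin(angle), np.cos(angle)])


rng = np.random.default_rng(2)
n = 1000
spread = rng.uniform(0.0, 5.0, n)        # hypothetical units (ticks)
imbalance = rng.uniform(-1.0, 1.0, n)
volume = rng.lognormal(0.0, 1.0, n)
volatility = rng.uniform(0.0, 0.05, n)
hour = rng.integers(0, 24, n)
# Hypothetical labelling rule: wider spreads favour passive (maker) fills
is_maker = (spread + rng.normal(0.0, 1.0, n) > 2.5).astype(int)

X = encode_features(spread, imbalance, volume, volatility, hour)
clf = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, is_maker)

x_new = encode_features([4.0], [0.2], [1.5], [0.01], [14])
p_maker = clf.predict_proba(x_new)[0, 1]  # column 1 = class 1 (maker)
print(f"maker {p_maker:.2f}, taker {1 - p_maker:.2f}")
```

Putting `StandardScaler` inside the pipeline means the same normalization learned at training time is applied automatically at prediction time.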
The model addresses several challenges:
- Class imbalance, handled by generating synthetic data for underrepresented classes
- Online adaptation to market regime changes
- Feature normalization for robust predictions
The maker/taker model achieves:
- 80% accuracy in typical market conditions
- <5 ms prediction time
- Correct prediction of dominant execution type in >90% of cases
All regression models are continuously evaluated using:
- Mean squared error (MSE) for slippage prediction
- Classification accuracy for maker/taker prediction
- Computation time performance
- Memory usage efficiency
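A possible harness for these four criteria uses `mean_squared_error` and `accuracy_score` from `sklearn.metrics`, `time.perf_counter` for latency, and `tracemalloc` for peak memory traced during prediction. The `evaluate` helper, the synthetic data, and the 400/100 train/test split are hypothetical. The latency figure is for one batched `predict` call over the test set, not per order.

```python
import time
import tracemalloc

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import accuracy_score, mean_squared_error


def evaluate(model, X_test, y_test, metric):
    """Score a fitted model; also time one batched predict call and record
    the peak memory traced by tracemalloc during that call."""
    tracemalloc.start()
    start = time.perf_counter()
    y_pred = model.predict(X_test)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {"score": metric(y_test, y_pred),
            "latency_ms": elapsed_ms,
            "peak_kib": peak_bytes / 1024.0}


rng = np.random.default_rng(4)
X = rng.normal(size=(500, 4))
y_slip = X @ np.array([0.5, -0.2, 0.1, 0.0]) + rng.normal(0.0, 0.05, 500)
y_maker = (X[:, 0] > 0).astype(int)

slip_model = LinearRegression().fit(X[:400], y_slip[:400])
mt_model = LogisticRegression().fit(X[:400], y_maker[:400])

slip_report = evaluate(slip_model, X[400:], y_slip[400:], mean_squared_error)
mt_report = evaluate(mt_model, X[400:], y_maker[400:], accuracy_score)
print(slip_report)
print(mt_report)
```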
The models are implemented using:
- Scikit-learn for core regression algorithms
- NumPy for efficient numerical operations
- Custom feature extraction pipelines for orderbook data
- Online learning mechanisms for continuous improvement