Predict short-term price direction from multi-ticker OHLCV data, generate trading signals, and backtest a simple equal-weight strategy.
- Python 3.9+ with pandas, numpy, scikit-learn, matplotlib, seaborn.
- Data/config: market_data_ml.csv, tickers.csv, features_config.json, model_params.json.
- Train and evaluate models (uses configs automatically):
python train_model.py
Outputs land in outputs/ (metrics, predictions, confusion matrices, feature importance).
- Turn predictions into signals (example uses the best model from the default run):
python - <<"PY"
from signal_generator import generate_signals
generate_signals('outputs/predictions_RandomForestClassifier.csv', threshold=0.55)
PY
- Backtest the generated signals:
python - <<"PY"
from backtest import run_backtest
stats = run_backtest('outputs/signals_RandomForestClassifier.csv')
print(stats)
PY
Backtest saves equity curves (.csv and .png) plus metrics JSON under outputs/.
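For orientation, the core arithmetic of an equal-weight daily backtest can be sketched as below. This is a minimal illustration, not the code in backtest.py: the function name `equal_weight_backtest` and the column names `date`, `ticker`, `signal`, and `next_return` are assumptions, and flat positions are assumed to count as zero return inside the daily average.

```python
import pandas as pd


def equal_weight_backtest(signals: pd.DataFrame) -> dict:
    """Sketch of an equal-weight daily strategy vs. a buy-and-hold baseline.

    Assumed (hypothetical) columns: date, ticker, signal, next_return.
    """
    # Per-row strategy return: position (1, 0, or -1) times the next-day return.
    rows = signals.assign(strategy_return=signals["signal"] * signals["next_return"])

    # Equal weight: average across all tickers on each date.
    daily = rows.groupby("date")[["strategy_return", "next_return"]].mean()
    daily = daily.rename(
        columns={"strategy_return": "strategy", "next_return": "buy_and_hold"}
    )

    # Compound daily returns into equity curves that start from 1.0.
    equity = (1.0 + daily).cumprod()

    return {
        "equity": equity,
        "total_return": float(equity["strategy"].iloc[-1] - 1.0),
        "baseline_return": float(equity["buy_and_hold"].iloc[-1] - 1.0),
    }


if __name__ == "__main__":
    demo = pd.DataFrame(
        {
            "date": ["2024-01-02", "2024-01-02", "2024-01-03", "2024-01-03"],
            "ticker": ["A", "B", "A", "B"],
            "signal": [1, 0, 1, 1],
            "next_return": [0.10, -0.10, 0.00, 0.20],
        }
    )
    print(equal_weight_backtest(demo)["equity"])
```

The baseline here simply holds every ticker at equal weight each day, which is one common reading of "buy-and-hold" for a multi-ticker universe.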
- feature_engineering.py – builds lagged returns, SMA, RSI, MACD, and labels (direction from next-day return).
- train_model.py – trains Logistic Regression and Random Forest (optionally XGBoost if installed), cross-validates with TimeSeriesSplit, scores on a holdout set, and saves predictions/plots.
- signal_generator.py – converts predictions to long/flat (or long/short) signals and attaches strategy returns.
- backtest.py – simple equal-weight daily strategy vs. buy-and-hold baseline with equity curve plotting.
- tests/test_pipeline.py – unit coverage for feature creation, training output shapes, and signal/backtest plumbing.
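As a rough illustration of the feature step described above, the sketch below computes lagged returns, an SMA, RSI, MACD, and a next-day direction label for a single ticker. It is not the code in feature_engineering.py: the function name `build_features`, the column names, and the window lengths (14 for SMA/RSI, 12/26/9 for MACD) are assumptions, and RSI uses simple moving averages of gains and losses rather than Wilder smoothing.

```python
import numpy as np
import pandas as pd


def build_features(df: pd.DataFrame, n_lags: int = 3, window: int = 14) -> pd.DataFrame:
    """Sketch: indicators and label for ONE ticker, rows sorted by date.

    Assumes a 'close' column; all output column names are hypothetical.
    """
    out = df.copy()
    out["return"] = out["close"].pct_change()

    # Lagged returns: yesterday's return, the day before, and so on.
    for lag in range(1, n_lags + 1):
        out[f"return_lag_{lag}"] = out["return"].shift(lag)

    # Simple moving average of the close.
    out["sma"] = out["close"].rolling(window).mean()

    # RSI from simple rolling averages of gains and losses.
    delta = out["close"].diff()
    gain = delta.clip(lower=0).rolling(window).mean()
    loss = (-delta.clip(upper=0)).rolling(window).mean()
    out["rsi"] = 100 - 100 / (1 + gain / loss)

    # MACD: fast EMA minus slow EMA, plus its signal line.
    ema_fast = out["close"].ewm(span=12, adjust=False).mean()
    ema_slow = out["close"].ewm(span=26, adjust=False).mean()
    out["macd"] = ema_fast - ema_slow
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()

    # Label: 1 if the NEXT day's return is positive, else 0.
    out["direction"] = (out["return"].shift(-1) > 0).astype(int)

    # Drop the last row (no next-day return) and the indicator warm-up rows.
    return out.iloc[:-1].dropna()


if __name__ == "__main__":
    prices = pd.DataFrame({"close": 100.0 + np.arange(40)})
    print(build_features(prices).tail())
```

With multi-ticker data, a function like this would typically be applied per ticker (for example inside a `groupby("ticker")`), so that lags and rolling windows never mix prices from different symbols.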
- Feature selection and label name come from features_config.json.
- Model hyperparameters come from model_params.json.
- Change thresholds or allow shorts via generate_signals(..., allow_short=True).
- No transaction costs are included; signals use next-day returns derived during labeling.
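To make the threshold and allow_short options concrete, here is a minimal sketch of how predicted probabilities might map to positions. It is an illustration under stated assumptions, not the body of generate_signals: the helper name `probabilities_to_signals`, the columns `prob_up` and `next_return`, and the symmetric short rule (short when the probability is at or below 1 - threshold) are all hypothetical.

```python
import numpy as np
import pandas as pd


def probabilities_to_signals(
    preds: pd.DataFrame, threshold: float = 0.55, allow_short: bool = False
) -> pd.DataFrame:
    """Sketch: map P(up) to long/flat or long/short positions.

    Assumed (hypothetical) columns: prob_up, next_return.
    """
    out = preds.copy()

    # Long when the model is confident enough in an up move, otherwise flat.
    out["signal"] = np.where(out["prob_up"] >= threshold, 1, 0)

    # Optional shorts: assumed symmetric rule around 0.5.
    if allow_short:
        out.loc[out["prob_up"] <= 1 - threshold, "signal"] = -1

    # Strategy return uses the next-day return, with no transaction costs.
    out["strategy_return"] = out["signal"] * out["next_return"]
    return out


if __name__ == "__main__":
    preds = pd.DataFrame(
        {"prob_up": [0.7, 0.5, 0.3], "next_return": [0.02, 0.01, -0.03]}
    )
    print(probabilities_to_signals(preds, allow_short=True))
```

Raising the threshold trades fewer positions for higher model confidence per position. Because costs are not modeled, a low threshold that trades frequently may look better in the backtest than it would in practice.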