S&P 500 Directional Prediction Model
This project investigates short-horizon directional prediction of the S&P 500 using historical market data and supervised machine learning. The goal is not to construct a profitable trading system, but to examine what information from the recent past can meaningfully inform near-future market direction under proper time-series constraints.
Overview
The model predicts whether the S&P 500 will move up or down over a fixed forward horizon using a rolling window of historical features (returns, momentum, volatility indicators, etc.).
Key constraints of the design:
- The model never uses future information relative to a prediction point
- All features are computed strictly from past data
- Evaluation is performed using chronological (rolling) train/test splits to reflect real-world deployment
Problem Setup
- Task: Binary classification (Up / Down)
- Target: Direction of the S&P 500 over a fixed forward window
- Input: Fixed-length rolling window of historical market features
- Evaluation: Out-of-sample test data occurring strictly after the training period
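The target definition above can be sketched as follows. This is a minimal illustration, assuming a pandas Series of closing prices; the function name `make_direction_target` and the `horizon` value are hypothetical, since the forward window the project actually uses is not stated here.

```python
import pandas as pd

def make_direction_target(close: pd.Series, horizon: int = 5) -> pd.Series:
    """Label each row 1 (Up) if the close `horizon` rows ahead is higher, else 0 (Down).

    The last `horizon` rows have no future price yet, so they are dropped.
    """
    future_close = close.shift(-horizon)         # price `horizon` rows later
    forward_return = future_close / close - 1.0  # NaN for the last `horizon` rows
    target = (forward_return > 0).astype(int)
    return target[forward_return.notna()]

# Toy prices for illustration only (not real S&P 500 data).
close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.0])
target = make_direction_target(close, horizon=2)
```

The label is the only place where future prices are allowed to appear; features for the same row must be built from prices up to that row only.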
Data & Features
- Historical S&P 500 price data from YFinance
Feature set includes, but is not limited to:
- Lagged returns
- Moving averages
- Rolling volatility measures
- Momentum indicators
Feature windows are fixed in length and computed independently for each prediction timestamp, ensuring temporal causality is preserved.
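As a rough sketch of past-only feature construction, the example below builds one feature of each kind listed above with pandas. The column names, the `window` length, and the synthetic price series are assumptions made for illustration, not the project's actual feature set.

```python
import numpy as np
import pandas as pd

def build_features(close: pd.Series, window: int = 5) -> pd.DataFrame:
    """Compute features for row t from prices up to and including row t only."""
    returns = close / close.shift(1) - 1.0
    features = pd.DataFrame(index=close.index)
    features["ret_lag1"] = returns.shift(1)                      # previous period's return
    features["ma_ratio"] = close / close.rolling(window).mean() - 1.0  # distance from moving average
    features["volatility"] = returns.rolling(window).std()       # rolling volatility
    features["momentum"] = close / close.shift(window) - 1.0     # return over `window` rows
    return features.dropna()  # drop warm-up rows that lack a full window

# Synthetic random-walk prices for illustration only (not real market data).
rng = np.random.default_rng(0)
close = pd.Series(100.0 + rng.normal(0.0, 1.0, 30).cumsum())
features = build_features(close, window=5)
```

A simple causality check is to recompute the features on a truncated price history and confirm that the overlapping rows are unchanged; any difference would mean a feature depends on later data.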
Model
- Supervised classification model (RandomForest)
- Trained once on historical data
- Applied to future periods without retraining during evaluation
- The balance between predicted classes can be tuned for different strategies by adjusting the decision threshold
This design emphasizes interpretability, reproducibility, and methodological correctness over model complexity.
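The train-once workflow and threshold adjustment described above might look like the sketch below. It uses scikit-learn's `RandomForestClassifier` on synthetic stand-in data; the hyperparameters, the split sizes, the helper `predict_with_threshold`, and the threshold values are all assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption): 300 chronologically ordered rows, 4 features.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + rng.normal(scale=1.0, size=300) > 0).astype(int)

# Chronological split: the first 200 rows train, the last 100 rows test.
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)  # trained once; not refitted during evaluation

# Estimated probability of the "Up" class (label 1) for each test row.
up_col = list(model.classes_).index(1)
proba_up = model.predict_proba(X_test)[:, up_col]

def predict_with_threshold(proba_up: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Predict Up (1) only when the estimated probability reaches `threshold`."""
    return (proba_up >= threshold).astype(int)

default_calls = predict_with_threshold(proba_up, 0.5)
cautious_calls = predict_with_threshold(proba_up, 0.65)  # fewer, more selective Up calls
```

Raising the threshold can only reduce the number of Up calls, which typically trades recall on the Up class for precision.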
Evaluation Methodology
To avoid look-ahead bias and data leakage, training and test sets are split chronologically. The model is evaluated only on future data it has never seen. Performance is reported on a held-out test period.
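A chronological split can be sketched as below. The function name, the `train_fraction` default, and the optional `gap` parameter are assumptions; the gap is included because labels built from a forward window near the end of the training period would otherwise be computed from prices that fall inside the test period.

```python
import pandas as pd

def chronological_split(data: pd.DataFrame, train_fraction: float = 0.8, gap: int = 0):
    """Split a time-indexed frame so every test row comes after every training row.

    `gap` drops the last `gap` training rows, e.g. the length of the forward
    target window, so training labels do not overlap the test period.
    """
    data = data.sort_index()
    cutoff = int(len(data) * train_fraction)
    train = data.iloc[: max(cutoff - gap, 0)]
    test = data.iloc[cutoff:]
    return train, test

# Toy frame for illustration only.
dates = pd.date_range("2020-01-01", periods=10, freq="D")
frame = pd.DataFrame({"feature": range(10), "target": [0, 1] * 5}, index=dates)
train, test = chronological_split(frame, train_fraction=0.8)
train_gapped, _ = chronological_split(frame, train_fraction=0.8, gap=2)
```

Unlike a shuffled split, the rows are never reordered, so the test set always lies strictly in the future of the training set.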
Representative Test Set Performance
- Accuracy: ~0.74
- Weighted F1: ~0.73
Performance varies across market regimes, which is expected in non-stationary financial time series.
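For readers unfamiliar with the two metrics, here is a small pure-Python sketch of how accuracy and weighted F1 are computed. Weighted F1 averages the per-class F1 scores weighted by each class's number of true samples, which is the usual definition (for example, scikit-learn's `f1_score` with `average="weighted"`). The labels below are toy values, not the project's results.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's true count."""
    support = Counter(y_true)
    total = 0.0
    for label, count in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * count
    return total / len(y_true)

# Toy labels for illustration only: 6 Up days (1) and 4 Down days (0).
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 1, 0, 0, 0]

acc = accuracy(y_true, y_pred)
wf1 = weighted_f1(y_true, y_pred)
majority_baseline = max(Counter(y_true).values()) / len(y_true)  # always predict the larger class
```

With imbalanced classes, the majority-class baseline is a more informative reference point than a 50% coin flip.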
Interpretation of Results
The model performs meaningfully better than random guessing, suggesting that recent market structure contains limited but non-zero predictive signal. Performance is asymmetric across classes, reflecting class imbalance and regime-dependent behavior. Results should be interpreted as statistical signal detection, not as evidence of a consistently tradable edge.
Limitations
- Financial markets are non-stationary; learned patterns may decay over time.
- No transaction costs, slippage, or risk management are modeled.
- Directional accuracy alone is insufficient for profitability.
- Results are sensitive to feature window length and market regime.
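A small worked example shows why directional accuracy and profitability can diverge. All numbers below are hypothetical and chosen only to illustrate the arithmetic: three correct calls on small moves, one wrong call on a large move, and an assumed cost per trade.

```python
# Hypothetical per-trade outcomes (illustrative numbers, not project results):
# each entry is (call_was_correct, absolute_market_move_as_a_fraction).
trades = [(True, 0.002), (True, 0.003), (True, 0.001), (False, 0.015)]
cost_per_trade = 0.0005  # assumed round-trip cost of 5 basis points

hit_rate = sum(ok for ok, _ in trades) / len(trades)

# A correct call earns the move; a wrong call loses it (simple returns, summed).
gross = sum(move if ok else -move for ok, move in trades)
net = gross - cost_per_trade * len(trades)
```

Here the hit rate is 75%, yet the summed return is negative before costs and more negative after them, because the single wrong call coincides with the largest move.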
This project was built to:
- Practice proper time-series ML methodology
- Avoid common pitfalls such as shuffled validation and data leakage
- Explore the practical limits of short-term market predictability
- Serve as a clean, reproducible reference for financial ML experiments
Future Work:
- Expanding-window or walk-forward retraining
- Regime-aware or adaptive modeling
- Probabilistic calibration and uncertainty estimation
- Incorporation of macroeconomic or cross-asset signals
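As a starting point for the first item, expanding-window retraining could be organized around an index generator like the sketch below. The function name and parameters are hypothetical; scikit-learn's `TimeSeriesSplit` offers a comparable ready-made splitter.

```python
from typing import Iterator, Tuple

def expanding_window_splits(
    n_samples: int, initial_train: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs with a growing training window.

    Each fold trains on all rows before the test block, then the test block
    is absorbed into the training window for the next fold.
    """
    start = initial_train
    while start + test_size <= n_samples:
        yield range(0, start), range(start, start + test_size)
        start += test_size

splits = list(expanding_window_splits(n_samples=10, initial_train=4, test_size=2))
```

In each fold the model would be refitted on the training indices and scored on the test indices, giving one out-of-sample score per period instead of a single held-out result.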
Disclaimer
This project is for educational and research purposes only. It does not constitute financial advice and should not be used for live trading.