An end-to-end agentic data engineering and ML project that ingests market data, detects macro regimes using a Hidden Markov Model, and generates daily market intelligence digests using an LLM agent.
- Ingest — Pulls daily OHLCV data for SPY, QQQ, GLD, TLT, and VIX from Yahoo Finance into a local DuckDB warehouse
- Feature Engineering — SQL-based transforms: rolling returns, realized volatility, RSI, and momentum
- ML Regime Detection — A Gaussian Hidden Markov Model (HMM) classifies the market into 4 unsupervised regimes
- Agentic AI Layer — A Groq-powered LLM agent reasons over the current regime and recent price action to generate a daily market digest
- Dashboard — A live Streamlit dashboard visualizes regimes, features, and the AI digest
- Automation — GitHub Actions runs the full pipeline every weekday at 9am EST
market-regime/
├── .github/
│   └── workflows/
│       └── daily_pipeline.yml # Automated daily pipeline
├── db/
│   ├── market.duckdb # DuckDB warehouse
│   ├── hmm_model.pkl # Trained HMM model
│   ├── regimes_plot.png # Regime visualization
│   └── latest_digest.txt # Latest AI digest
├── ingest/
│   ├── fetch_prices.py # Data ingestion from Yahoo Finance
│   ├── train_regimes.py # HMM model training
│   ├── save_regimes.py # Persist regime labels to DuckDB
│   ├── plot_regimes.py # Regime visualization
│   └── agent.py # LLM agent digest generator
├── sql/
│   ├── schema.sql # DuckDB schema
│   └── features.sql # SQL feature engineering views
├── dashboard.py # Streamlit dashboard
├── requirements.txt
└── README.md
The HMM identifies 4 distinct market regimes through unsupervised learning:
| Regime | Volatility (annualized) | Momentum | Description |
|---|---|---|---|
| 🟢 Bull | ~10% | Strong positive | Low vol, clear uptrend |
| 🟠 Choppy | ~16% | Flat | Moderate vol, no clear direction |
| 🔴 Bear/Recovery | ~22% | Negative | High vol, below 200d MA |
| 🟣 Crisis | ~58% | Deeply negative | Extreme vol, major drawdown |
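An HMM's state indices are arbitrary (state 0 is not inherently "Bull"), so regime names have to be attached after training. How this project assigns them is not shown here. One plausible approach, sketched below under that assumption, is to rank the states by their mean volatility and name them from calmest to most volatile. The `label_states` helper and its input dictionary are hypothetical.

```python
# Regime names ordered from calmest to most volatile, matching the table above.
REGIME_NAMES = ["Bull", "Choppy", "Bear/Recovery", "Crisis"]


def label_states(state_vols):
    """Map HMM state index -> regime name by ranking states on mean volatility.

    `state_vols` is a hypothetical dict {state_index: mean annualized volatility},
    e.g. computed by averaging the volatility feature over each decoded state.
    """
    if len(state_vols) != len(REGIME_NAMES):
        raise ValueError("expected exactly one volatility value per regime")
    ranked = sorted(state_vols, key=state_vols.get)  # lowest volatility first
    return {state: name for state, name in zip(ranked, REGIME_NAMES)}


# Illustrative values taken from the table; the state indices are made up.
labels = label_states({0: 0.22, 1: 0.10, 2: 0.58, 3: 0.16})
print(labels)
```

Ranking on a single feature is a simplification: it ignores momentum, so two states with similar volatility could be mislabeled.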
| Layer | Tool |
|---|---|
| Data warehouse | DuckDB |
| Data ingestion | yfinance + Python |
| Feature engineering | SQL window functions |
| ML model | hmmlearn (Gaussian HMM) |
| LLM agent | Groq API (Llama 3.3 70B) |
| Dashboard | Streamlit + Plotly |
| Orchestration | GitHub Actions |
| Hosting | Streamlit Community Cloud |
git clone https://github.com/sharon2719/market-regime.git
cd market-regime
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # Mac/Linux
pip install -r requirements.txt

Create a .env file in the project root:
GROQ_API_KEY=gsk_...
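The scripts need this key at runtime. How the project loads `.env` is not shown here (a library such as python-dotenv is a common choice). As a dependency-free sketch, a hypothetical `load_env_file` helper could read the file into the process environment like this:

```python
import os


def load_env_file(path=".env"):
    """Copy KEY=VALUE pairs from a .env file into os.environ (hypothetical helper)."""
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blank lines, comments, and malformed lines
            key, value = line.split("=", 1)
            # Variables that are already set win, so CI secrets are not overridden.
            os.environ.setdefault(key.strip(), value.strip().strip('"').strip("'"))


if os.path.exists(".env"):
    load_env_file()
api_key = os.environ.get("GROQ_API_KEY")  # None if the key is not configured
```

Using `setdefault` means a key injected by the CI environment takes precedence over the local file.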
python ingest/fetch_prices.py # Fetch market data
python ingest/train_regimes.py # Train HMM model
python ingest/save_regimes.py # Save regime labels
python ingest/agent.py # Generate AI digest
python -m streamlit run dashboard.py # Launch dashboard

The GitHub Actions workflow runs every weekday at 9am EST and:
- Fetches latest market data
- Updates regime labels
- Generates a new AI digest
- Commits updated data back to the repo
The live dashboard reflects the latest data automatically.
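GitHub Actions evaluates `schedule` cron expressions in UTC, so a 9am EST run has to be written as a UTC time. The actual expression in `daily_pipeline.yml` is not shown here; the sketch below only illustrates the conversion, using a hypothetical `weekday_cron_utc` helper and a fixed UTC-5 offset.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-5 offset for EST; this deliberately ignores daylight saving time.
EST = timezone(timedelta(hours=-5), "EST")


def weekday_cron_utc(local_hour, tz=EST):
    """Return a Monday-to-Friday cron expression in UTC for a given local hour.

    Assumes the conversion does not cross midnight; if it did, the
    day-of-week field would need shifting as well.
    """
    local = datetime(2024, 1, 8, local_hour, 0, tzinfo=tz)  # an arbitrary Monday
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * 1-5"


print(weekday_cron_utc(9))  # 0 14 * * 1-5
```

Because the offset is fixed, a job scheduled at 14:00 UTC fires at 10am local time while daylight saving time (UTC-4) is in effect.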
All features are computed in SQL using DuckDB window functions:
- Daily / 5d / 20d / 60d log returns
- 20d and 60d realized volatility (annualized)
- 50d and 200d momentum (% distance from moving average)
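To make the definitions above concrete, here is a minimal sketch of the same three feature families in plain Python rather than the project's SQL views. The function names, the 252-day annualization factor, and the use of the sample standard deviation are assumptions; `features.sql` may differ on each of these.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization factor


def log_returns(closes, horizon=1):
    """Log return over `horizon` days; None until enough history exists."""
    return [
        math.log(closes[i] / closes[i - horizon]) if i >= horizon else None
        for i in range(len(closes))
    ]


def realized_vol(closes, window=20):
    """Annualized sample std dev of the last `window` daily log returns."""
    daily = log_returns(closes)
    out = []
    for i in range(len(closes)):
        if i < window:  # `window` returns require window + 1 prices
            out.append(None)
        else:
            chunk = daily[i - window + 1 : i + 1]
            out.append(statistics.stdev(chunk) * math.sqrt(TRADING_DAYS))
    return out


def momentum(closes, window=50):
    """Percent distance of the close from its trailing simple moving average."""
    out = []
    for i in range(len(closes)):
        if i < window - 1:
            out.append(None)
        else:
            ma = sum(closes[i - window + 1 : i + 1]) / window
            out.append((closes[i] / ma - 1.0) * 100.0)
    return out


print(momentum([1.0, 2.0, 3.0], window=3))  # [None, None, 50.0]
```

In SQL, the same quantities map naturally onto `LAG`, `STDDEV_SAMP`, and `AVG` evaluated over a trailing row window ordered by date.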
The LLM agent receives the current regime, confidence score, and the last 5 days of features, and produces:
- Regime Interpretation — what the regime means historically
- Risk Stance — risk-on or risk-off with reasoning
- 3 Things to Watch — specific indicators and levels
- Suggested Positioning — broad asset allocation thoughts
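The inputs and the four output sections above suggest a prompt of roughly the following shape. This is only a sketch: `build_digest_prompt`, the feature names in the usage example, and the prompt wording are hypothetical, and the call to the Groq API is left out.

```python
SECTIONS = [
    "Regime Interpretation",
    "Risk Stance",
    "3 Things to Watch",
    "Suggested Positioning",
]


def build_digest_prompt(regime, confidence, recent_features):
    """Assemble the text prompt for the digest agent (hypothetical helper).

    `recent_features` is a list of per-day dicts, oldest first; only the
    last 5 entries are included in the prompt.
    """
    window = recent_features[-5:]
    rows = "\n".join(
        ", ".join(f"{key}={value}" for key, value in day.items()) for day in window
    )
    headings = "\n".join(f"- {name}" for name in SECTIONS)
    return (
        f"Current regime: {regime} (confidence {confidence:.0%})\n"
        f"Last {len(window)} days of features:\n{rows}\n\n"
        f"Write a daily market digest with these sections:\n{headings}"
    )


# Made-up feature rows purely for illustration.
days = [{"date": f"2024-01-0{i}", "ret_5d": 0.01 * i} for i in range(1, 7)]
print(build_digest_prompt("Choppy", 0.87, days))
```

Keeping prompt assembly in a pure function makes it easy to inspect or test what the model is actually shown before any API call is made.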
MIT