AI-powered quantitative stock screening engine that identifies investment candidates through a systematic 6-stage pipeline.
The kind of systematic screening pipeline that quant desks use — universe construction, momentum filtering, fundamental quality checks, AI-driven earnings detection, risk assessment, and position sizing — built as a single-command workflow. Emphasizes data engineering discipline, production-grade error handling with zero filtering failures, and real capital allocation decisions backed by a bi-weekly production cycle.
| Metric | Value | Metric | Value |
|---|---|---|---|
| Pipeline stages | 6 | Automated tests | 176+ |
| Data sources | 3 (Alpha Vantage, Yahoo, Gemini) | Market regimes | 3 (Bull/Defensive/Neutral) |
| Screening time | 30-60 seconds | Test execution | <10 seconds |
```mermaid
graph TD
subgraph "Stage 1 — Universe"
A["S&P 500 Selection<br/><i>~400-500 qualified</i>"]
end
subgraph "Stage 2 — Momentum"
B["Technical Filtering<br/><i>SMA, RSI, ROC</i><br/>~150-250 stocks"]
end
subgraph "Stage 3 — Quality"
C["Fundamental Screening<br/><i>completeness + sector-aware validation</i><br/>~50-100 stocks"]
end
subgraph "Stage 4 — AI Analysis"
D["Gemini 2.5 Flash<br/><i>earnings detection + PEAD avoidance</i><br/>~30-50 stocks"]
end
subgraph "Stage 5 — Risk"
E["Risk Assessment<br/><i>volatility, beta, Sharpe</i><br/>~15-25 stocks"]
end
subgraph "Stage 6 — Portfolio"
F["Position Sizing<br/><i>capital allocation</i><br/>8-12 final picks"]
end
A --> B --> C --> D --> E --> F
style A fill:#3F51B5,color:#fff
style B fill:#5C6BC0,color:#fff
style C fill:#7C4DFF,color:#fff
style D fill:#9C27B0,color:#fff
style E fill:#7B1FA2,color:#fff
style F fill:#4A148C,color:#fff
```
```mermaid
graph LR
subgraph "Market Regime Adaptation"
MR{{"Regime Detection"}}
BULL["BULL<br/>Growth-focused<br/>12 max positions<br/>β 0.8-1.5"]
DEF["DEFENSIVE<br/>Quality-focused<br/>8 max positions<br/>β 0.5-1.0"]
NEU["NEUTRAL<br/>Balanced<br/>10 max positions<br/>β 0.7-1.3"]
end
MR -->|"S&P above SMA"| BULL
MR -->|"S&P below SMA"| DEF
MR -->|"Mixed signals"| NEU
style BULL fill:#2E7D32,color:#fff
style DEF fill:#C62828,color:#fff
style NEU fill:#F57F17,color:#000
```
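Regime classification reduces to comparing the index against its moving average. A minimal sketch in Python (the function name, the 50-bar default, and the ±1% "mixed signals" band are illustrative assumptions, not the engine's actual thresholds):

```python
def detect_regime(sp500_close: list[float], sma_window: int = 50) -> str:
    """Classify the market regime from the S&P 500 vs. its moving average.

    Hypothetical sketch: the 50-bar window and the 1% dead band around
    the SMA (treated as "mixed signals") are assumptions.
    """
    if len(sp500_close) < sma_window:
        return "NEUTRAL"  # not enough history to decide
    sma = sum(sp500_close[-sma_window:]) / sma_window
    last = sp500_close[-1]
    if last > sma * 1.01:   # clearly above trend
        return "BULL"
    if last < sma * 0.99:   # clearly below trend
        return "DEFENSIVE"
    return "NEUTRAL"        # mixed signals near the SMA
```

Each label then maps to the regime-specific limits in the diagram (max positions, beta range).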
```mermaid
graph LR
subgraph "Data Sources"
AV["Alpha Vantage Premium<br/><i>$50/mo — 75 req/min</i><br/>Fundamentals, Earnings,<br/>Balance Sheet, Cash Flow"]
YF["Yahoo Finance<br/><i>Free — 99.6% success</i><br/>Price Data"]
GM["Google Gemini 2.5 Flash<br/><i>~$0.01-0.05/cycle</i><br/>Earnings Detection"]
end
subgraph "Engine"
E["Screening Engine"]
end
AV -->|"primary"| E
YF -->|"fallback"| E
GM -->|"AI layer"| E
style AV fill:#283593,color:#fff
style YF fill:#5C6BC0,color:#fff
style GM fill:#9C27B0,color:#fff
```
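The primary/fallback wiring in the diagram is essentially a provider chain: try each source in priority order and return the first success. A hypothetical sketch (provider names and call signatures here are illustrative, not the project's actual interfaces):

```python
from typing import Callable

def fetch_with_fallback(symbol: str,
                        providers: list[tuple[str, Callable[[str], dict]]]) -> dict:
    """Try each (name, fetch) provider in priority order; return the first success.

    Illustrative sketch: a real engine would catch narrower exception types
    and log per-provider failures rather than collecting strings.
    """
    errors: list[str] = []
    for name, fetch in providers:
        try:
            return fetch(symbol)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
    raise LookupError(f"All providers failed for {symbol}: {errors}")
```

With Alpha Vantage first and Yahoo Finance second, a transient primary-source failure degrades to the fallback instead of aborting the run.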
- **6-Stage Screening Pipeline** — From S&P 500 universe construction through momentum, quality, AI analysis, risk assessment, to final position sizing with capital allocation
- **Zero Filtering Failures** — Graceful degradation with automatic fallback calculations for FCF, ROE, PB ratio, and EPS growth when primary data is unavailable
- **AI Earnings Detection** — Google Gemini 2.5 Flash with search grounding identifies stocks with recent earnings announcements, avoiding post-earnings announcement drift (PEAD)
- **Market Regime Adaptation** — Automatically shifts between BULL, DEFENSIVE, and NEUTRAL strategies based on market conditions, adjusting position limits, beta ranges, and sector tilts
- **3-Tier Testing Infrastructure** — 176+ tests: zero-cost mocked unit tests, snapshot-based integration tests, and live API production validation — all under 10 seconds
- **Bi-Weekly Production Cycle** — Day 1 screening, Day 5 revalidation with momentum checks, Day 10 performance reporting with eToro CSV import
- Python 3.11+
- Alpha Vantage Premium API key ($50/month — primary data source)
- Gemini API key (optional — for earnings detection)
```shell
git clone https://github.com/adigunners/stock_picker_bot.git
cd stock_picker_bot/backend
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.production .env
# Edit .env: add ALPHAVANTAGE_API_KEY and optionally GEMINI_API_KEY
alembic upgrade head

# Verify installation
TEST_MODE=true pytest
# Expected: 176+ tests passing
```

```shell
# Health check
python -m app.cli health

# Audit data quality across S&P 500
python -m app.cli audit fundamentals

# Run stock screening with $10,000 capital
python -m app.cli screen --capital 10000

# Day 5 momentum revalidation
python -m app.cli day5-check

# Import eToro trades
python -m app.cli import-etoro --file "eToro account summary.csv"

# Generate performance report
python -m app.cli report performance
```

Architecture & Design Decisions
Each stage is a deliberate filter that reduces the universe progressively:
1. **Universe Construction** — Start with S&P 500 (~500 stocks) for liquidity and data availability
2. **Momentum Filtering** — SMA crossovers, RSI, Rate of Change eliminate stocks without positive technical signals
3. **Quality Screening** — Fundamental completeness checks with sector-aware rules (Financials, REITs, Utilities have different requirements)
4. **AI Analysis** — Gemini 2.5 Flash detects recent earnings announcements to avoid PEAD (post-earnings announcement drift)
5. **Risk Assessment** — Volatility, beta, and Sharpe ratio scoring against market-regime-specific thresholds
6. **Position Sizing** — Portfolio optimization with capital allocation, respecting regime-specific position limits
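Conceptually, the stages above form an ordered chain of filters, each narrowing the survivor set before the next runs. A toy sketch of that shape (the real stage logic is far richer than a boolean predicate):

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Candidate:
    """A stock under evaluation; `metrics` is a hypothetical grab-bag of signals."""
    symbol: str
    metrics: dict = field(default_factory=dict)

# Each stage is (name, keep-predicate); survivors of one stage feed the next.
Stage = tuple[str, Callable[[Candidate], bool]]

def run_pipeline(universe: list[Candidate], stages: list[Stage]) -> list[Candidate]:
    survivors = list(universe)
    for _name, keep in stages:
        survivors = [c for c in survivors if keep(c)]
    return survivors
```

The value of the shape is that each stage is independently testable, and stage-by-stage survivor counts (as in the diagram) fall out for free.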
- Alpha Vantage as primary source — Consistent, structured fundamental data across all endpoints; Yahoo Finance as price-only fallback
- Sector-aware validation — Financials don't have FCF; REITs have different PB norms; Utilities have lower growth — hardcoded rules per GICS sector
- Zero-cost testing — TEST_MODE mocks all API calls, enabling 176+ tests to run in <10s with no API spend
- Snapshot-based integration testing — Real data captured once, replayed deterministically, so integration tests have zero ongoing API cost
- SQLite over PostgreSQL — Single-user system; SQLite with SQLAlchemy ORM provides full relational capabilities without server overhead
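The snapshot-based testing decision above boils down to record-once, replay-forever. A minimal sketch of the idea (the file layout and naming under a snapshot directory are assumptions, not the project's actual format):

```python
import json
from pathlib import Path
from typing import Callable

def with_snapshots(fetch: Callable[[str], dict],
                   snapshot_dir: str) -> Callable[[str], dict]:
    """Wrap a provider call so each response is recorded once, then replayed."""
    root = Path(snapshot_dir)

    def wrapped(symbol: str) -> dict:
        path = root / f"{symbol}.json"
        if path.exists():                      # replay: deterministic, zero API cost
            return json.loads(path.read_text())
        data = fetch(symbol)                   # record: one-time live call
        root.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data))
        return data

    return wrapped
```

After the first capture, integration tests run entirely from disk, which is what keeps their ongoing API cost at zero.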
For detailed architecture documentation, see docs/architecture/.
Data Quality & Fallback Strategy
The engine guarantees zero filtering failures through a layered fallback system:
| Metric | Primary Source | Fallback Calculation | Sector Exception |
|---|---|---|---|
| Free Cash Flow | Alpha Vantage Cash Flow | Operating CF − CapEx | Financials: skip |
| ROE | Alpha Vantage Overview | Net Income / Shareholder Equity | — |
| PB Ratio | Alpha Vantage Overview | Market Cap / Book Value | REITs: relaxed threshold |
| EPS Growth | Alpha Vantage Earnings | QoQ earnings comparison | — |
| Price Data | Yahoo Finance (99.6%) | Stooq | — |
Cached API responses expire on per-type TTLs:

| Data Type | TTL | Rationale |
|---|---|---|
| Fundamentals | 7 days | Quarterly reports; daily refresh unnecessary |
| Prices | 48 hours | Need recent but not real-time |
| Metadata | 30 days | Sector, name, exchange rarely change |
Running `python -m app.cli audit fundamentals` scans the entire S&P 500 universe and reports missing fields, stale cache entries, and sector-specific validation gaps.
Testing Philosophy — 3-Tier Infrastructure
Tier 1 — Unit Tests (mocked, zero cost)
- `TEST_MODE=true` mocks all API calls (Gemini, Alpha Vantage, Yahoo Finance)
- In-memory SQLite database
- 176+ tests covering all core logic
- Run on every commit via GitHub Actions CI

Tier 2 — Integration Tests (snapshot-based)
- Real data snapshots captured from live APIs (one-time cost)
- Stored in `data/snapshots/` for deterministic replay
- Validates end-to-end screening workflows against known data
- Cycles 5-9 preserved for backtesting

Tier 3 — Production Validation (live APIs)
- Real API calls for final pre-deployment validation
- API cost monitoring with per-provider budgets
- Periodic smoke tests against live data
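The `TEST_MODE` switch is a plain environment-variable gate in front of the provider. A hypothetical sketch of the pattern (the fixture shape and function names are illustrative):

```python
import os
from typing import Callable

def live_fetch(symbol: str) -> dict:
    """Placeholder for a real network call (hypothetical)."""
    raise RuntimeError("live API access not available here")

def get_provider() -> Callable[[str], dict]:
    """Return a canned-fixture provider when TEST_MODE=true, else the live one."""
    if os.environ.get("TEST_MODE", "").lower() == "true":
        return lambda symbol: {"symbol": symbol, "close": 100.0}  # mock fixture
    return live_fetch
```

Because the gate sits at the provider boundary, everything downstream of it (filters, scorers, position sizing) runs unchanged in tests.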
```shell
# Run all tests (zero cost)
TEST_MODE=true pytest

# Run with coverage report
pytest --cov=app --cov-report=html

# Run specific module
pytest tests/filtering/ -v
pytest tests/api/ -v
pytest tests/data_providers/ -v
```

Bi-Weekly Production Cycle
| Day | Phase | Actions |
|---|---|---|
| Day 1 | Screen | Data refresh → Run pipeline → Select positions → Execute trades |
| Day 5 | Revalidate | Momentum check → Trim 50% if below 20-day SMA → Full exit if below 50-day SMA |
| Day 10 | Report | Import eToro CSV → Generate performance report → Log cycle results |
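The Day 5 rules reduce to a three-way decision against the two moving averages, with the harder exit check taking precedence. A minimal sketch (the function name and return labels are assumptions):

```python
def day5_action(price: float, sma20: float, sma50: float) -> str:
    """Momentum revalidation per the Day 5 row above.

    Below the 50-day SMA is a full exit; otherwise below the
    20-day SMA trims the position by half; otherwise hold.
    """
    if price < sma50:
        return "EXIT"
    if price < sma20:
        return "TRIM_50"
    return "HOLD"
```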
- Alpha Vantage Premium API key active
- Database initialized (`alembic upgrade head`)
- All tests passing (`TEST_MODE=true pytest`)
- S&P 500 universe current
- Cache directory has space (500MB+)
For detailed workflows, see docs/workflows/biweekly-cycle.md and docs/workflows/preflight-checklist.md.
API Configuration & Costs
Alpha Vantage Premium
- Cost: $50/month
- Rate Limit: 75 requests/minute, no daily limit
- Endpoints: `COMPANY_OVERVIEW`, `EARNINGS`, `BALANCE_SHEET`, `INCOME_STATEMENT`, `CASH_FLOW`
- Usage: All fundamental data, earnings history, financial statements

Yahoo Finance
- Cost: Free
- Rate Limit: Unofficial (no published limit); ~99.6% observed request success rate
- Usage: Price data only (OHLCV)

Google Gemini
- Model: Gemini 2.5 Flash with search grounding
- Cost: ~$0.01-0.05 per screening cycle
- Usage: Real-time earnings announcement detection (3-day lookback)
Performance

| Metric | Value |
|---|---|
| Full screening cycle | 30-60 seconds |
| Test suite execution | <10 seconds |
| Cache footprint | ~50MB |
| Database size | ~500KB |
| Memory usage | <500MB |
Tech Stack
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11+ | Core runtime |
| API Framework | FastAPI | REST endpoints |
| Database | SQLite + SQLAlchemy + Alembic | Storage, ORM, migrations |
| Data Processing | pandas, numpy | Numerical analysis |
| AI/LLM | Google Gemini 2.5 Flash | Earnings detection with search grounding |
| Primary Data | Alpha Vantage Premium | Fundamentals, earnings, financials |
| Price Data | Yahoo Finance | OHLCV price history |
| Testing | pytest (176+ tests) | 3-tier test infrastructure |
| Code Quality | black, ruff, mypy | Formatting, linting, type checking |
| CI/CD | GitHub Actions | Automated formatting and lint checks |
| CSV Import | Custom eToro parser | EUR→USD conversion, date formatting |
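The eToro import in the table essentially parses the export CSV and normalizes EUR rows to USD. A hypothetical sketch (column names, the export layout, and the fixed conversion rate are assumptions; the real parser also handles date formatting):

```python
import csv
import io

def parse_etoro(csv_text: str, eur_usd: float) -> list[dict]:
    """Parse a minimal eToro-style CSV export, converting EUR amounts to USD."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip thousands separators before parsing the amount
        amount = float(row["Amount"].replace(",", ""))
        if row.get("Currency") == "EUR":
            amount = round(amount * eur_usd, 2)  # EUR -> USD conversion
        rows.append({"symbol": row["Symbol"], "amount_usd": amount})
    return rows
```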
Project Structure
```
stock_picker_bot/
├── backend/
│   ├── app/
│   │   ├── api/              # FastAPI REST endpoints
│   │   ├── cli/              # CLI commands (screen, audit, health, day5-check)
│   │   ├── data/             # Data providers and validators
│   │   ├── database/         # SQLAlchemy models and repository
│   │   ├── engine/           # Screening engine
│   │   │   ├── calculations/ # Metric calculations
│   │   │   ├── filters/      # Pipeline filter stages
│   │   │   ├── scorers/      # Stock scoring logic
│   │   │   └── validators/   # Data validation
│   │   ├── llm/              # Gemini integration
│   │   ├── models/           # Data models
│   │   ├── parsers/          # eToro CSV parser
│   │   ├── services/         # Business logic
│   │   └── utils/            # Utilities
│   ├── tests/                # 44 test files
│   │   ├── api/              # Endpoint tests
│   │   ├── filtering/        # Filter and scorer tests
│   │   ├── data_providers/   # Data provider tests
│   │   ├── parsers/          # Parser tests
│   │   └── performance/      # Performance calculation tests
│   ├── alembic/              # Database migrations
│   └── config/               # Configuration
├── data/                     # gitignored
│   ├── db/                   # SQLite databases
│   ├── cache/                # API response cache (fundamentals/prices/metadata)
│   ├── snapshots/            # Test data snapshots
│   ├── exports/              # Screening results
│   └── imports/              # eToro CSV imports
├── docs/
│   ├── architecture/         # System design, decisions, engine overview
│   ├── workflows/            # Bi-weekly cycle, preflight, error recovery
│   ├── testing/              # Test execution, coverage, data strategy
│   ├── data-flow/            # Caching, enrichment, sector processing
│   ├── guides/               # Data management, enrichment quick start
│   ├── setup/                # Environment setup
│   └── performance/          # Benchmarks and feedback
└── README.md
```
Documentation Index
| Category | Document | Description |
|---|---|---|
| Workflows | Bi-Weekly Cycle | Day 1/5/10 production workflow |
| | Pre-Flight Checklist | Pre-cycle validation |
| | Error Recovery | Common failures and fixes |
| Architecture | System Architecture | 6-stage pipeline, regime detection |
| | Design Decisions | Key architectural rationale |
| | Engine Overview | Screening engine deep dive |
| Data | Multi-Source Enrichment | Data provider hierarchy |
| | Sector-Aware Processing | Sector-specific rules |
| | Caching Strategy | Cache TTLs and management |
| Testing | Test Execution Guide | Running tests |
| | Data Strategy | TEST_MODE, snapshots, cost tracking |
| | Coverage Guide | Coverage reports |
| Setup | Environment Setup | Development environment |
| | Data Management | Cache and data operations |
Built with Claude Code as the primary development partner — architecture, implementation, and testing developed through an agentic AI methodology.
Version: v2.2 | MIT License — see LICENSE file.