Stock Picker Bot

AI-powered quantitative stock screening engine that identifies investment candidates through a systematic 6-stage pipeline.

It implements the kind of systematic screening pipeline that quant desks use (universe construction, momentum filtering, fundamental quality checks, AI-driven earnings detection, risk assessment, and position sizing) as a single-command workflow. The project emphasizes data engineering discipline and production-grade error handling: fallback calculations mean missing data never causes a filtering failure. Its output drives real capital allocation decisions on a bi-weekly production cycle.

Key Metrics

| Metric | Value | Metric | Value |
|---|---|---|---|
| Pipeline stages | 6 | Automated tests | 176+ |
| Data sources | 3 (Alpha Vantage, Yahoo, Gemini) | Market regimes | 3 (Bull/Defensive/Neutral) |
| Screening time | 30-60 seconds | Test execution | <10 seconds |

Pipeline Architecture

```mermaid
graph TD
    subgraph "Stage 1 — Universe"
        A["S&P 500 Selection<br/><i>~400-500 qualified</i>"]
    end

    subgraph "Stage 2 — Momentum"
        B["Technical Filtering<br/><i>SMA, RSI, ROC</i><br/>~150-250 stocks"]
    end

    subgraph "Stage 3 — Quality"
        C["Fundamental Screening<br/><i>completeness + sector-aware validation</i><br/>~50-100 stocks"]
    end

    subgraph "Stage 4 — AI Analysis"
        D["Gemini 2.5 Flash<br/><i>earnings detection + PEAD avoidance</i><br/>~30-50 stocks"]
    end

    subgraph "Stage 5 — Risk"
        E["Risk Assessment<br/><i>volatility, beta, Sharpe</i><br/>~15-25 stocks"]
    end

    subgraph "Stage 6 — Portfolio"
        F["Position Sizing<br/><i>capital allocation</i><br/>8-12 final picks"]
    end

    A --> B --> C --> D --> E --> F

    style A fill:#3F51B5,color:#fff
    style B fill:#5C6BC0,color:#fff
    style C fill:#7C4DFF,color:#fff
    style D fill:#9C27B0,color:#fff
    style E fill:#7B1FA2,color:#fff
    style F fill:#4A148C,color:#fff
```
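
The stage-by-stage narrowing in the diagram amounts to a chain of filters in which each stage sees only the previous stage's survivors. Below is a minimal sketch of that shape, not the code in `engine/filters/`: the `Stage` dataclass, the `run_pipeline` helper, and the toy RSI band are hypothetical, and each stock is assumed to be a plain dict of precomputed metrics.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record type: a stock is just a dict of precomputed metrics.
Stock = Dict[str, float]


@dataclass
class Stage:
    """One pipeline stage: a name plus a keep/drop predicate (illustrative only)."""

    name: str
    keep: Callable[[Stock], bool]


def run_pipeline(
    universe: List[Stock], stages: List[Stage]
) -> Tuple[List[Stock], List[Tuple[str, int]]]:
    """Apply stages in order; each stage sees only the survivors of the previous one."""
    survivors = list(universe)
    funnel: List[Tuple[str, int]] = []
    for stage in stages:
        survivors = [s for s in survivors if stage.keep(s)]
        funnel.append((stage.name, len(survivors)))  # funnel size after this stage
    return survivors, funnel


if __name__ == "__main__":
    universe = [{"rsi": r, "beta": b} for r, b in [(55, 1.1), (75, 0.9), (45, 1.6), (60, 0.7)]]
    stages = [
        Stage("momentum", lambda s: 40 <= s["rsi"] <= 70),  # toy RSI band (assumed)
        Stage("risk", lambda s: 0.8 <= s["beta"] <= 1.5),   # BULL beta range from the regime diagram
    ]
    picks, funnel = run_pipeline(universe, stages)
    print(funnel)  # [('momentum', 3), ('risk', 1)]
```

Recording the survivor count after each stage makes the funnel sizes in the diagram easy to log and compare from one cycle to the next.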
```mermaid
graph LR
    subgraph Market Regime Adaptation
        MR{{"Regime Detection"}}
        BULL["BULL<br/>Growth-focused<br/>12 max positions<br/>β 0.8-1.5"]
        DEF["DEFENSIVE<br/>Quality-focused<br/>8 max positions<br/>β 0.5-1.0"]
        NEU["NEUTRAL<br/>Balanced<br/>10 max positions<br/>β 0.7-1.3"]
    end

    MR -->|"S&P above SMA"| BULL
    MR -->|"S&P below SMA"| DEF
    MR -->|"Mixed signals"| NEU

    style BULL fill:#2E7D32,color:#fff
    style DEF fill:#C62828,color:#fff
    style NEU fill:#F57F17,color:#000
```
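
The regime diagram only says "above SMA", "below SMA", and "mixed signals". One way to make that concrete is to assume two moving averages of the S&P 500 (a hypothetical 50-day fast and 200-day slow window), with "mixed signals" meaning the index sits between them. The position limits and beta ranges below come from the diagram; `detect_regime`, `RegimeParams`, and the window lengths are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class RegimeParams:
    name: str
    max_positions: int
    beta_min: float
    beta_max: float


# Position limits and beta ranges as given in the regime diagram.
REGIMES: Dict[str, RegimeParams] = {
    "BULL": RegimeParams("BULL", 12, 0.8, 1.5),
    "DEFENSIVE": RegimeParams("DEFENSIVE", 8, 0.5, 1.0),
    "NEUTRAL": RegimeParams("NEUTRAL", 10, 0.7, 1.3),
}


def sma(closes: Sequence[float], window: int) -> float:
    """Simple moving average of the last `window` closes."""
    if len(closes) < window:
        raise ValueError(f"need {window} closes, got {len(closes)}")
    return sum(closes[-window:]) / window


def detect_regime(sp500_closes: Sequence[float], fast: int = 50, slow: int = 200) -> RegimeParams:
    """Assumed rule: above both SMAs -> BULL, below both -> DEFENSIVE, otherwise NEUTRAL."""
    price = sp500_closes[-1]
    above_fast = price > sma(sp500_closes, fast)
    above_slow = price > sma(sp500_closes, slow)
    if above_fast and above_slow:
        return REGIMES["BULL"]
    if not above_fast and not above_slow:
        return REGIMES["DEFENSIVE"]
    return REGIMES["NEUTRAL"]  # the two averages disagree: mixed signals


def beta_in_range(beta: float, regime: RegimeParams) -> bool:
    """Risk-stage check of a stock's beta against the active regime's band."""
    return regime.beta_min <= beta <= regime.beta_max
```

Keeping the regime as a single parameter object means the risk and sizing stages can read `max_positions` and the beta band from one place instead of branching on the regime name.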
```mermaid
graph LR
    subgraph Data Sources
        AV["Alpha Vantage Premium<br/><i>$50/mo — 75 req/min</i><br/>Fundamentals, Earnings,<br/>Balance Sheet, Cash Flow"]
        YF["Yahoo Finance<br/><i>Free — 99.6% success</i><br/>Price Data"]
        GM["Google Gemini 2.5 Flash<br/><i>~$0.01-0.05/cycle</i><br/>Earnings Detection"]
    end

    subgraph Engine
        E["Screening Engine"]
    end

    AV -->|"primary"| E
    YF -->|"fallback"| E
    GM -->|"AI layer"| E

    style AV fill:#283593,color:#fff
    style YF fill:#5C6BC0,color:#fff
    style GM fill:#9C27B0,color:#fff
```

Features

6-Stage Screening Pipeline — From S&P 500 universe construction through momentum, quality, AI analysis, risk assessment, to final position sizing with capital allocation

Zero Filtering Failures — Graceful degradation with automatic fallback calculations for FCF, ROE, PB ratio, and EPS growth when primary data is unavailable

AI Earnings Detection — Google Gemini 2.5 Flash with search grounding identifies stocks with recent earnings announcements, avoiding post-earnings announcement drift (PEAD)

Market Regime Adaptation — Automatically shifts between BULL, DEFENSIVE, and NEUTRAL strategies based on market conditions, adjusting position limits, beta ranges, and sector tilts

3-Tier Testing Infrastructure: zero-cost mocked unit tests (176+ tests, under 10 seconds), snapshot-based integration tests, and live API production validation

Bi-Weekly Production Cycle — Day 1 screening, Day 5 revalidation with momentum checks, Day 10 performance reporting with eToro CSV import

Getting Started

Prerequisites

  • Python 3.11+
  • Alpha Vantage Premium API key ($50/month — primary data source)
  • Gemini API key (optional — for earnings detection)

Installation

```bash
git clone https://github.com/adigunners/stock_picker_bot.git
cd stock_picker_bot/backend

python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

cp .env.production .env
# Edit .env: add ALPHAVANTAGE_API_KEY and optionally GEMINI_API_KEY

alembic upgrade head

# Verify installation
TEST_MODE=true pytest
# Expected: 176+ tests passing
```

First Run

```bash
# Health check
python -m app.cli health

# Audit data quality across S&P 500
python -m app.cli audit fundamentals

# Run stock screening with $10,000 capital
python -m app.cli screen --capital 10000

# Day 5 momentum revalidation
python -m app.cli day5-check

# Import eToro trades
python -m app.cli import-etoro --file "eToro account summary.csv"

# Generate performance report
python -m app.cli report performance
```
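
`screen --capital 10000` ends by turning ranked candidates into position sizes. The project describes Stage 6 as portfolio optimization; as a much simpler stand-in, the sketch below splits capital equally across the top `max_positions` candidates and buys whole shares. `size_positions`, equal weighting, and whole-share rounding are all assumptions for illustration, not the project's allocation method.

```python
import math
from typing import Dict, List, Tuple


def size_positions(
    ranked: List[Tuple[str, float]],
    capital: float,
    max_positions: int,
) -> Tuple[Dict[str, int], float]:
    """Equal-weight whole-share allocation.

    `ranked` holds (ticker, price) pairs, best candidate first. Returns the
    share count per ticker and the leftover cash.
    """
    picks = ranked[:max_positions]  # respect the regime's position limit
    if not picks:
        return {}, capital
    budget = capital / len(picks)  # same dollar budget for every position
    shares: Dict[str, int] = {}
    spent = 0.0
    for ticker, price in picks:
        qty = math.floor(budget / price)
        if qty > 0:  # skip names too expensive for a single share
            shares[ticker] = qty
            spent += qty * price
    return shares, round(capital - spent, 2)
```

For example, with `capital=10000` and `max_positions=2`, two candidates priced at 100 and 250 each get a 5,000 budget (50 and 20 shares), leaving no cash over.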

Architecture & Design Decisions

Why a 6-Stage Pipeline?

Each stage is a deliberate filter that reduces the universe progressively:

  1. Universe Construction — Start with S&P 500 (~500 stocks) for liquidity and data availability
  2. Momentum Filtering — SMA crossovers, RSI, Rate of Change eliminate stocks without positive technical signals
  3. Quality Screening — Fundamental completeness checks with sector-aware rules (Financials, REITs, Utilities have different requirements)
  4. AI Analysis — Gemini 2.5 Flash detects recent earnings announcements to avoid PEAD (post-earnings announcement drift)
  5. Risk Assessment — Volatility, beta, and Sharpe ratio scoring against market-regime-specific thresholds
  6. Position Sizing — Portfolio optimization with capital allocation, respecting regime-specific position limits
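
Stage 2 names SMA, RSI, and Rate of Change without giving parameters. The following dependency-free sketch computes all three and combines them into a hypothetical pass/fail gate. The 20/50-day SMA pair, the 40-70 RSI band, and the 20-day ROC period are assumed values, and the RSI uses simple averages (Cutler's variant) rather than Wilder's smoothing, so its numbers will differ from charting tools that use Wilder's method.

```python
from typing import Sequence, Tuple


def sma(closes: Sequence[float], window: int) -> float:
    """Simple moving average of the last `window` closes."""
    return sum(closes[-window:]) / window


def roc(closes: Sequence[float], period: int) -> float:
    """Rate of Change in percent: latest close vs. the close `period` bars earlier."""
    return (closes[-1] / closes[-1 - period] - 1.0) * 100.0


def rsi(closes: Sequence[float], period: int = 14) -> float:
    """RSI from simple averages of the last `period` gains and losses (Cutler's variant)."""
    changes = [b - a for a, b in zip(closes[-period - 1 : -1], closes[-period:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        return 100.0 if avg_gain > 0 else 50.0  # all gains -> 100; flat window -> neutral 50
    return 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)


def passes_momentum(
    closes: Sequence[float],
    fast: int = 20,
    slow: int = 50,
    rsi_band: Tuple[float, float] = (40.0, 70.0),
    roc_period: int = 20,
) -> bool:
    """Hypothetical gate: fast SMA above slow SMA, RSI inside the band, positive ROC."""
    if len(closes) < max(slow, roc_period + 1, 15):
        return False  # not enough history for every indicator (15 = default RSI period + 1)
    lo, hi = rsi_band
    return (
        sma(closes, fast) > sma(closes, slow)
        and lo <= rsi(closes) <= hi
        and roc(closes, roc_period) > 0
    )
```

The RSI ceiling matters as much as the floor here: a stock with no down days scores 100 and is rejected as overextended rather than rewarded.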

Key Design Decisions

  • Alpha Vantage as primary source — Consistent, structured fundamental data across all endpoints; Yahoo Finance as price-only fallback
  • Sector-aware validation: FCF is not a meaningful metric for Financials, REITs have different PB norms, and Utilities have lower growth, so rules are hardcoded per GICS sector
  • Zero-cost testing — TEST_MODE mocks all API calls, enabling 176+ tests to run in <10s with no API spend
  • Snapshot-based integration testing — Real data captured once, replayed deterministically, so integration tests have zero ongoing API cost
  • SQLite over PostgreSQL — Single-user system; SQLite with SQLAlchemy ORM provides full relational capabilities without server overhead
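
The sector rules can be expressed as default thresholds plus per-sector overrides. In this sketch the override table, the field names, and every numeric threshold are hypothetical; only the shape of the rules (Financials skip FCF, REITs get a relaxed PB limit, Utilities tolerate lower growth) comes from the text above. REITs are keyed under the GICS "Real Estate" sector.

```python
from typing import Any, Dict, List, Optional

REQUIRED_FIELDS = ("roe", "pb_ratio", "eps_growth", "fcf")

# Hypothetical defaults and GICS-sector overrides; the numbers are illustrative only.
DEFAULT_RULES: Dict[str, Any] = {"skip_fields": frozenset(), "max_pb": 5.0, "min_eps_growth": 0.05}
SECTOR_OVERRIDES: Dict[str, Dict[str, Any]] = {
    "Financials": {"skip_fields": frozenset({"fcf"})},  # FCF is not meaningful for banks and insurers
    "Real Estate": {"max_pb": 8.0},                      # REITs: relaxed price-to-book limit
    "Utilities": {"min_eps_growth": 0.0},                # slow-growing regulated businesses
}


def rules_for(sector: str) -> Dict[str, Any]:
    """Merge the sector's overrides on top of the defaults."""
    return {**DEFAULT_RULES, **SECTOR_OVERRIDES.get(sector, {})}


def validate(fundamentals: Dict[str, Optional[float]], sector: str) -> List[str]:
    """Return a list of problems; an empty list means the stock passes."""
    rules = rules_for(sector)
    missing = [
        f"missing {name}"
        for name in REQUIRED_FIELDS
        if name not in rules["skip_fields"] and fundamentals.get(name) is None
    ]
    if missing:
        return missing
    problems: List[str] = []
    if fundamentals["pb_ratio"] > rules["max_pb"]:
        problems.append("pb_ratio above sector limit")
    if fundamentals["eps_growth"] < rules["min_eps_growth"]:
        problems.append("eps_growth below sector minimum")
    return problems
```

Keeping overrides as data rather than `if sector == ...` branches makes it easy to audit every sector exception in one table.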

For detailed architecture documentation, see docs/architecture/.

Data Quality & Fallback Strategy

The engine guarantees zero filtering failures through a layered fallback system:

| Metric | Primary Source | Fallback (calculation or source) | Sector Exception |
|---|---|---|---|
| Free Cash Flow | Alpha Vantage Cash Flow | Operating CF − CapEx | Financials: skip |
| ROE | Alpha Vantage Overview | Net Income / Shareholder Equity | |
| PB Ratio | Alpha Vantage Overview | Market Cap / Book Value | REITs: relaxed threshold |
| EPS Growth | Alpha Vantage Earnings | QoQ earnings comparison | |
| Price Data | Yahoo Finance (99.6% success rate) | Stooq | |
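
Each fallback in the table is a short formula applied only when the primary value is missing. This sketch assumes fundamentals arrive as optional floats. The function names are hypothetical, the CapEx sign handling is an assumption (providers differ on whether CapEx is reported as a negative number), and returning `None` when the inputs are insufficient leaves the final decision to the sector-aware validation step.

```python
from typing import Optional


def free_cash_flow(
    reported: Optional[float], operating_cf: Optional[float], capex: Optional[float]
) -> Optional[float]:
    """Reported FCF if present, else Operating CF minus CapEx."""
    if reported is not None:
        return reported
    if operating_cf is None or capex is None:
        return None
    return operating_cf - abs(capex)  # abs() so either CapEx sign convention works


def roe(
    reported: Optional[float], net_income: Optional[float], equity: Optional[float]
) -> Optional[float]:
    """Reported ROE if present, else Net Income / Shareholder Equity."""
    if reported is not None:
        return reported
    if net_income is None or equity is None or equity <= 0:
        return None  # zero or negative equity makes ROE meaningless
    return net_income / equity


def pb_ratio(
    reported: Optional[float], market_cap: Optional[float], book_value: Optional[float]
) -> Optional[float]:
    """Reported P/B if present, else Market Cap / Book Value."""
    if reported is not None:
        return reported
    if market_cap is None or book_value is None or book_value <= 0:
        return None
    return market_cap / book_value


def eps_growth_qoq(
    reported: Optional[float], eps_current: Optional[float], eps_prior: Optional[float]
) -> Optional[float]:
    """QoQ EPS growth as a fraction; abs() keeps the sign meaningful when prior EPS < 0."""
    if reported is not None:
        return reported
    if eps_current is None or eps_prior is None or eps_prior == 0:
        return None
    return (eps_current - eps_prior) / abs(eps_prior)
```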

Cache Strategy

| Data Type | TTL | Rationale |
|---|---|---|
| Fundamentals | 7 days | Quarterly reports; daily refresh unnecessary |
| Prices | 48 hours | Need recent but not real-time |
| Metadata | 30 days | Sector, name, exchange rarely change |
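
A TTL check compares an entry's age with the limit for its data type. The sketch below uses the TTLs from the table and keys them by the `fundamentals`/`prices`/`metadata` cache subdirectory names from the project layout; the in-memory `TTLCache` class is a hypothetical stand-in for the real on-disk cache.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Dict, Optional, Tuple

# TTLs from the cache table, keyed by cache subdirectory name.
CACHE_TTL: Dict[str, timedelta] = {
    "fundamentals": timedelta(days=7),
    "prices": timedelta(hours=48),
    "metadata": timedelta(days=30),
}


def is_fresh(data_type: str, fetched_at: datetime, now: datetime) -> bool:
    """An entry is fresh while its age is strictly below the TTL for its data type."""
    return now - fetched_at < CACHE_TTL[data_type]


class TTLCache:
    """In-memory stand-in for the on-disk cache (hypothetical)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Tuple[datetime, Any]] = {}

    def get(
        self, data_type: str, key: str, fetch: Callable[[str], Any], now: Optional[datetime] = None
    ) -> Any:
        if now is None:
            now = datetime.now(timezone.utc)
        hit = self._store.get((data_type, key))
        if hit is not None and is_fresh(data_type, hit[0], now):
            return hit[1]  # cache hit: no API call
        value = fetch(key)  # miss or stale: refetch and restamp
        self._store[(data_type, key)] = (now, value)
        return value
```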

Audit CLI

```bash
python -m app.cli audit fundamentals
```

Scans the entire S&P 500 universe and reports missing fields, stale cache entries, and sector-specific validation gaps.

Testing Philosophy — 3-Tier Infrastructure

Tier 1: Unit Tests (zero cost, <10 seconds)

  • TEST_MODE=true mocks all API calls (Gemini, Alpha Vantage, Yahoo Finance)
  • In-memory SQLite database
  • 176+ tests covering all core logic
  • Run on every commit via GitHub Actions CI
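
One common way to make `TEST_MODE=true` mock every API call is to choose the data provider at construction time. A minimal sketch follows; the class names (`PriceProvider`, `MockPriceProvider`, `LivePriceProvider`), the `is_test_mode` helper, and the canned data are hypothetical, and how the project actually wires its mocks is not shown here.

```python
import os
from typing import Dict, Mapping, Optional, Protocol


class PriceProvider(Protocol):
    def latest_close(self, ticker: str) -> float: ...


class MockPriceProvider:
    """Serves canned prices; never touches the network."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self._prices = prices

    def latest_close(self, ticker: str) -> float:
        return self._prices[ticker]


class LivePriceProvider:
    """Placeholder for a real API client (not implemented in this sketch)."""

    def latest_close(self, ticker: str) -> float:
        raise NotImplementedError("live API call would go here")


def is_test_mode(env: Optional[Mapping[str, str]] = None) -> bool:
    """Read TEST_MODE from the given mapping, defaulting to the process environment."""
    env = os.environ if env is None else env
    return env.get("TEST_MODE", "").strip().lower() in {"1", "true", "yes"}


def get_price_provider(env: Optional[Mapping[str, str]] = None) -> PriceProvider:
    """Return the mock under TEST_MODE, the live client otherwise."""
    if is_test_mode(env):
        return MockPriceProvider({"TEST": 100.0})  # hypothetical fixture data
    return LivePriceProvider()
```

Because the switch happens in one factory, no test can accidentally reach a paid API as long as code obtains providers through it.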

Tier 2: Integration Tests (snapshot-based, zero ongoing cost)

  • Real data snapshots captured from live APIs (one-time cost)
  • Stored in data/snapshots/ for deterministic replay
  • Validates end-to-end screening workflows against known data
  • Cycles 5-9 preserved for backtesting
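
Snapshot-based testing follows a record-once, replay-afterwards pattern. The sketch assumes each snapshot is a JSON file named after its scenario inside a snapshot directory; `record_or_replay` and this file naming are hypothetical, not the format actually used in `data/snapshots/`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def record_or_replay(snapshot_dir: Path, name: str, fetch: Callable[[], Any]) -> Any:
    """Replay `<snapshot_dir>/<name>.json` if present; otherwise fetch once and save it."""
    path = snapshot_dir / f"{name}.json"
    if path.exists():
        return json.loads(path.read_text())  # deterministic replay, no API cost
    data = fetch()  # one-time live call
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return data


if __name__ == "__main__":
    calls = []

    def fake_fetch():
        calls.append(1)
        return {"ticker": "TEST", "close": 100.0}

    with tempfile.TemporaryDirectory() as tmp:
        first = record_or_replay(Path(tmp), "cycle_demo", fake_fetch)
        second = record_or_replay(Path(tmp), "cycle_demo", fake_fetch)
        print(first == second, len(calls))  # True 1
```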

Tier 3: Production Validation (live API, cost-tracked)

  • Real API calls for final pre-deployment validation
  • API cost monitoring with per-provider budgets
  • Periodic smoke tests against live data

```bash
# Run all tests (zero cost)
TEST_MODE=true pytest

# Run with coverage report
pytest --cov=app --cov-report=html

# Run specific module
pytest tests/filtering/ -v
pytest tests/api/ -v
pytest tests/data_providers/ -v
```

Bi-Weekly Production Cycle

| Day | Phase | Actions |
|---|---|---|
| Day 1 | Screen | Data refresh → Run pipeline → Select positions → Execute trades |
| Day 5 | Revalidate | Momentum check → Trim 50% if below 20-day SMA → Full exit if below 50-day SMA |
| Day 10 | Report | Import eToro CSV → Generate performance report → Log cycle results |
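
The Day 5 rule can be written as a small decision function. This sketch assumes the full-exit rule is checked first, so any position below its 50-day SMA is closed regardless of the 20-day check, and that a 50% trim of an odd share count sells the smaller half. `Day5Action`, `day5_action`, and `shares_after` are hypothetical names.

```python
from enum import Enum
from typing import Sequence


class Day5Action(Enum):
    HOLD = "hold"
    TRIM_HALF = "trim 50%"
    EXIT = "full exit"


def sma(closes: Sequence[float], window: int) -> float:
    """Simple moving average of the last `window` closes."""
    if len(closes) < window:
        raise ValueError(f"need {window} closes, got {len(closes)}")
    return sum(closes[-window:]) / window


def day5_action(closes: Sequence[float]) -> Day5Action:
    """Apply the Day 5 momentum rule to a position's daily closes (latest last)."""
    price = closes[-1]
    if price < sma(closes, 50):
        return Day5Action.EXIT       # full-exit rule checked first
    if price < sma(closes, 20):
        return Day5Action.TRIM_HALF  # below the 20-day but not the 50-day SMA
    return Day5Action.HOLD


def shares_after(action: Day5Action, shares: int) -> int:
    """Shares still held after acting; a trim sells floor(shares / 2)."""
    if action is Day5Action.EXIT:
        return 0
    if action is Day5Action.TRIM_HALF:
        return shares - shares // 2
    return shares
```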

Pre-Flight Checklist (Day 1)

  • Alpha Vantage Premium API key active
  • Database initialized (alembic upgrade head)
  • All tests passing (TEST_MODE=true pytest)
  • S&P 500 universe current
  • Cache directory has space (500MB+)

For detailed workflows, see docs/workflows/biweekly-cycle.md and docs/workflows/preflight-checklist.md.

API Configuration & Costs

Alpha Vantage Premium (Primary)

  • Cost: $50/month
  • Rate Limit: 75 requests/minute, no daily limit
  • Endpoints: COMPANY_OVERVIEW, EARNINGS, BALANCE_SHEET, INCOME_STATEMENT, CASH_FLOW
  • Usage: All fundamental data, earnings history, financial statements
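
Staying under Alpha Vantage's 75 requests/minute can be handled client-side with a sliding-window limiter. A minimal sketch follows; the `SlidingWindowRateLimiter` class is hypothetical, and the injectable clock and sleep functions exist only so its behaviour can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowRateLimiter:
    """Blocks so that at most `max_calls` requests start in any `period`-second window."""

    def __init__(
        self,
        max_calls: int = 75,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._starts: Deque[float] = deque()

    def acquire(self) -> None:
        """Call before each API request; sleeps only when the window is full."""
        while True:
            now = self._clock()
            # Forget requests that started a full period ago or earlier.
            while self._starts and now - self._starts[0] >= self.period:
                self._starts.popleft()
            if len(self._starts) < self.max_calls:
                self._starts.append(now)
                return
            # Wait until the oldest request in the window ages out, then re-check.
            self._sleep(self.period - (now - self._starts[0]))
```

A sliding window avoids the burst a fixed per-minute counter allows, where up to twice the limit can be sent across a minute boundary.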

Yahoo Finance (Fallback)

  • Cost: Free
  • Rate Limit: No published limit (unofficial API); ~99.6% success rate
  • Usage: Price data only (OHLCV)

Google Gemini (Optional)

  • Model: Gemini 2.5 Flash with search grounding
  • Cost: ~$0.01-0.05 per screening cycle
  • Usage: Real-time earnings announcement detection (3-day lookback)
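
Gemini's role is to surface recent earnings announcements; the filtering decision afterwards is simple date arithmetic. The sketch assumes the model's answer has already been parsed into an optional `date` per ticker and treats the 3-day lookback as inclusive at both ends. The Gemini request itself is deliberately not shown, and both function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def within_earnings_lookback(
    last_earnings: Optional[date], today: date, lookback_days: int = 3
) -> bool:
    """True if the last announcement falls in [today - lookback_days, today]."""
    if last_earnings is None:
        return False  # no announcement detected
    return today - timedelta(days=lookback_days) <= last_earnings <= today


def drop_recent_reporters(
    last_earnings_by_ticker: Dict[str, Optional[date]], today: date, lookback_days: int = 3
) -> List[str]:
    """Keep only tickers without a recent announcement (PEAD avoidance)."""
    return [
        ticker
        for ticker, announced in last_earnings_by_ticker.items()
        if not within_earnings_lookback(announced, today, lookback_days)
    ]
```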

Performance

| Metric | Value |
|---|---|
| Full screening cycle | 30-60 seconds |
| Test suite execution | <10 seconds |
| Cache footprint | ~50MB |
| Database size | ~500KB |
| Memory usage | <500MB |

Tech Stack

| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11+ | Core runtime |
| API Framework | FastAPI | REST endpoints |
| Database | SQLite + SQLAlchemy + Alembic | Storage, ORM, migrations |
| Data Processing | pandas, numpy | Numerical analysis |
| AI/LLM | Google Gemini 2.5 Flash | Earnings detection with search grounding |
| Primary Data | Alpha Vantage Premium | Fundamentals, earnings, financials |
| Price Data | Yahoo Finance | OHLCV price history |
| Testing | pytest (176+ tests) | 3-tier test infrastructure |
| Code Quality | black, ruff, mypy | Formatting, linting, type checking |
| CI/CD | GitHub Actions | Automated formatting and lint checks |
| CSV Import | Custom eToro parser | EUR→USD conversion, date formatting |

Project Structure

```text
stock_picker_bot/
├── backend/
│   ├── app/
│   │   ├── api/              # FastAPI REST endpoints
│   │   ├── cli/              # CLI commands (screen, audit, health, day5-check)
│   │   ├── data/             # Data providers and validators
│   │   ├── database/         # SQLAlchemy models and repository
│   │   ├── engine/           # Screening engine
│   │   │   ├── calculations/ # Metric calculations
│   │   │   ├── filters/      # Pipeline filter stages
│   │   │   ├── scorers/      # Stock scoring logic
│   │   │   └── validators/   # Data validation
│   │   ├── llm/              # Gemini integration
│   │   ├── models/           # Data models
│   │   ├── parsers/          # eToro CSV parser
│   │   ├── services/         # Business logic
│   │   └── utils/            # Utilities
│   ├── tests/                # 44 test files
│   │   ├── api/              # Endpoint tests
│   │   ├── filtering/        # Filter and scorer tests
│   │   ├── data_providers/   # Data provider tests
│   │   ├── parsers/          # Parser tests
│   │   └── performance/      # Performance calculation tests
│   ├── alembic/              # Database migrations
│   └── config/               # Configuration
├── data/                     # gitignored
│   ├── db/                   # SQLite databases
│   ├── cache/                # API response cache (fundamentals/prices/metadata)
│   ├── snapshots/            # Test data snapshots
│   ├── exports/              # Screening results
│   └── imports/              # eToro CSV imports
├── docs/
│   ├── architecture/         # System design, decisions, engine overview
│   ├── workflows/            # Bi-weekly cycle, preflight, error recovery
│   ├── testing/              # Test execution, coverage, data strategy
│   ├── data-flow/            # Caching, enrichment, sector processing
│   ├── guides/               # Data management, enrichment quick start
│   ├── setup/                # Environment setup
│   └── performance/          # Benchmarks and feedback
└── README.md
```

Documentation Index

| Category | Document | Description |
|---|---|---|
| Workflows | Bi-Weekly Cycle | Day 1/5/10 production workflow |
| | Pre-Flight Checklist | Pre-cycle validation |
| | Error Recovery | Common failures and fixes |
| Architecture | System Architecture | 6-stage pipeline, regime detection |
| | Design Decisions | Key architectural rationale |
| | Engine Overview | Screening engine deep dive |
| Data | Multi-Source Enrichment | Data provider hierarchy |
| | Sector-Aware Processing | Sector-specific rules |
| | Caching Strategy | Cache TTLs and management |
| Testing | Test Execution Guide | Running tests |
| | Data Strategy | TEST_MODE, snapshots, cost tracking |
| | Coverage Guide | Coverage reports |
| Setup | Environment Setup | Development environment |
| | Data Management | Cache and data operations |

Built with Claude Code as the primary development partner — architecture, implementation, and testing developed through an agentic AI methodology.

Version: v2.2 | MIT License — see LICENSE file.
