Stock Picker Bot

AI-powered quantitative stock screening engine that identifies investment candidates through a systematic 6-stage pipeline.

A systematic screening pipeline of the kind quant desks use (universe construction, momentum filtering, fundamental quality checks, AI-driven earnings detection, risk assessment, and position sizing), packaged as a single-command workflow. The project emphasizes data engineering discipline, production-grade error handling with zero filtering failures, and real capital allocation decisions made on a bi-weekly production cycle.

Key Metrics

Metric	Value	Metric	Value
Pipeline stages	6	Automated tests	176+
Data sources	3 (Alpha Vantage, Yahoo, Gemini)	Market regimes	3 (Bull/Defensive/Neutral)
Screening time	30-60 seconds	Test execution	<10 seconds

Pipeline Architecture

graph TD
    subgraph "Stage 1 — Universe"
        A["S&P 500 Selection<br/><i>~400-500 qualified</i>"]
    end

    subgraph "Stage 2 — Momentum"
        B["Technical Filtering<br/><i>SMA, RSI, ROC</i><br/>~150-250 stocks"]
    end

    subgraph "Stage 3 — Quality"
        C["Fundamental Screening<br/><i>completeness + sector-aware validation</i><br/>~50-100 stocks"]
    end

    subgraph "Stage 4 — AI Analysis"
        D["Gemini 2.5 Flash<br/><i>earnings detection + PEAD avoidance</i><br/>~30-50 stocks"]
    end

    subgraph "Stage 5 — Risk"
        E["Risk Assessment<br/><i>volatility, beta, Sharpe</i><br/>~15-25 stocks"]
    end

    subgraph "Stage 6 — Portfolio"
        F["Position Sizing<br/><i>capital allocation</i><br/>8-12 final picks"]
    end

    A --> B --> C --> D --> E --> F

    style A fill:#3F51B5,color:#fff
    style B fill:#5C6BC0,color:#fff
    style C fill:#7C4DFF,color:#fff
    style D fill:#9C27B0,color:#fff
    style E fill:#7B1FA2,color:#fff
    style F fill:#4A148C,color:#fff
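The chain above is a sequence of filters, each seeing only the survivors of the previous stage. A minimal sketch of that composition, assuming every stage is a function from a list of candidates to a shorter list; `Candidate`, the two toy stages and `run_pipeline` are hypothetical names, not the engine's modules under app/engine/filters/.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical candidate record; the engine's real models live in app/models/.
@dataclass
class Candidate:
    symbol: str
    above_sma: bool
    has_fundamentals: bool


# A stage takes the current survivors and returns the ones that pass.
Stage = Callable[[list[Candidate]], list[Candidate]]


def momentum_stage(cands: list[Candidate]) -> list[Candidate]:
    # Toy rule: keep names with a positive technical signal.
    return [c for c in cands if c.above_sma]


def quality_stage(cands: list[Candidate]) -> list[Candidate]:
    # Toy rule: keep names whose fundamentals passed completeness checks.
    return [c for c in cands if c.has_fundamentals]


def run_pipeline(
    universe: list[Candidate], stages: list[tuple[str, Stage]]
) -> tuple[list[Candidate], list[tuple[str, int]]]:
    """Apply stages in order, recording how many candidates survive each one."""
    survivors = universe
    funnel = [("universe", len(universe))]
    for name, stage in stages:
        survivors = stage(survivors)
        funnel.append((name, len(survivors)))
        if not survivors:
            break  # nothing left to filter
    return survivors, funnel
```

Recording the count after each stage reproduces the narrowing funnel from the diagram, which makes it easy to spot a stage that unexpectedly empties the universe.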

graph LR
    subgraph Market Regime Adaptation
        MR{{"Regime Detection"}}
        BULL["BULL<br/>Growth-focused<br/>12 max positions<br/>β 0.8-1.5"]
        DEF["DEFENSIVE<br/>Quality-focused<br/>8 max positions<br/>β 0.5-1.0"]
        NEU["NEUTRAL<br/>Balanced<br/>10 max positions<br/>β 0.7-1.3"]
    end

    MR -->|"S&P above SMA"| BULL
    MR -->|"S&P below SMA"| DEF
    MR -->|"Mixed signals"| NEU

    style BULL fill:#2E7D32,color:#fff
    style DEF fill:#C62828,color:#fff
    style NEU fill:#F57F17,color:#000
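The regime diagram reduces to a classifier plus a parameter table. A sketch assuming the S&P 500 close is compared with its 50-day and 200-day simple moving averages: the README only says "above SMA", "below SMA" and "mixed signals", so the windows and the "above both / below both / otherwise" rule are assumptions. The position limits and beta ranges are taken from the diagram; `Regime`, `RegimeParams` and `detect_regime` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Regime(Enum):
    BULL = "BULL"
    DEFENSIVE = "DEFENSIVE"
    NEUTRAL = "NEUTRAL"


@dataclass(frozen=True)
class RegimeParams:
    max_positions: int
    beta_min: float
    beta_max: float


# Position limits and beta ranges as listed in the regime diagram.
REGIME_PARAMS = {
    Regime.BULL: RegimeParams(12, 0.8, 1.5),
    Regime.DEFENSIVE: RegimeParams(8, 0.5, 1.0),
    Regime.NEUTRAL: RegimeParams(10, 0.7, 1.3),
}


def sma(values: list[float], window: int) -> float:
    """Simple moving average of the last `window` values."""
    if len(values) < window:
        raise ValueError(f"need at least {window} values, got {len(values)}")
    return sum(values[-window:]) / window


def detect_regime(index_closes: list[float], short: int = 50, long: int = 200) -> Regime:
    """Assumed rule: above both SMAs -> BULL, below both -> DEFENSIVE, else NEUTRAL."""
    last = index_closes[-1]
    above_short = last > sma(index_closes, short)
    above_long = last > sma(index_closes, long)
    if above_short and above_long:
        return Regime.BULL
    if not above_short and not above_long:
        return Regime.DEFENSIVE
    return Regime.NEUTRAL  # the two averages disagree: mixed signals
```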

graph LR
    subgraph Data Sources
        AV["Alpha Vantage Premium<br/><i>$50/mo — 75 req/min</i><br/>Fundamentals, Earnings,<br/>Balance Sheet, Cash Flow"]
        YF["Yahoo Finance<br/><i>Free — 99.6% success</i><br/>Price Data"]
        GM["Google Gemini 2.5 Flash<br/><i>~$0.01-0.05/cycle</i><br/>Earnings Detection"]
    end

    subgraph Engine
        E["Screening Engine"]
    end

    AV -->|"primary"| E
    YF -->|"fallback"| E
    GM -->|"AI layer"| E

    style AV fill:#283593,color:#fff
    style YF fill:#5C6BC0,color:#fff
    style GM fill:#9C27B0,color:#fff
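The primary/fallback arrows describe a provider chain. One way to express it, sketched under the assumption that each provider is a callable that returns a dict or raises on failure; `fetch_with_fallback` and the provider callables are hypothetical, and the real clients may order or combine sources differently (Yahoo Finance, for example, serves price data only).

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Hypothetical provider signature: symbol in, parsed payload (or None) out.
Fetcher = Callable[[str], Optional[dict]]


def fetch_with_fallback(symbol: str, providers: list[tuple[str, Fetcher]]) -> tuple[str, dict]:
    """Try providers in priority order; return (provider_name, data) from the first success."""
    errors: list[str] = []
    for name, fetch in providers:
        try:
            data = fetch(symbol)
        except Exception as exc:  # network errors, rate limits, parse failures
            errors.append(f"{name}: {exc}")
            logger.warning("provider %s failed for %s: %s", name, symbol, exc)
            continue
        if data:
            return name, data
        errors.append(f"{name}: empty response")
    raise LookupError(f"all providers failed for {symbol}: {'; '.join(errors)}")
```

Returning the provider name alongside the data lets downstream code log which source actually supplied each value.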

Features

6-Stage Screening Pipeline — Runs from S&P 500 universe construction through momentum, quality, AI analysis, and risk assessment to final position sizing with capital allocation

Zero Filtering Failures — Graceful degradation with automatic fallback calculations for FCF, ROE, PB ratio, and EPS growth when primary data is unavailable

AI Earnings Detection — Google Gemini 2.5 Flash with search grounding identifies stocks with recent earnings announcements, avoiding post-earnings announcement drift (PEAD)

Market Regime Adaptation — Automatically shifts between BULL, DEFENSIVE, and NEUTRAL strategies based on market conditions, adjusting position limits, beta ranges, and sector tilts

3-Tier Testing Infrastructure — Zero-cost mocked unit tests (176+, completing in under 10 seconds), snapshot-based integration tests, and live API production validation

Bi-Weekly Production Cycle — Day 1 screening, Day 5 revalidation with momentum checks, Day 10 performance reporting with eToro CSV import

Getting Started

Prerequisites

Python 3.11+
Alpha Vantage Premium API key ($50/month — primary data source)
Gemini API key (optional — for earnings detection)

Installation

git clone https://github.com/adigunners/stock_picker_bot.git
cd stock_picker_bot/backend

python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

cp .env.production .env
# Edit .env: add ALPHAVANTAGE_API_KEY and optionally GEMINI_API_KEY

alembic upgrade head

# Verify installation
TEST_MODE=true pytest
# Expected: 176+ tests passing

First Run

# Health check
python -m app.cli health

# Audit data quality across S&P 500
python -m app.cli audit fundamentals

# Run stock screening with $10,000 capital
python -m app.cli screen --capital 10000

# Day 5 momentum revalidation
python -m app.cli day5-check

# Import eToro trades
python -m app.cli import-etoro --file "eToro account summary.csv"

# Generate performance report
python -m app.cli report performance
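The `screen --capital 10000` command passes a capital amount to Stage 6. The README does not spell out the sizing rule, so the sketch below assumes plain equal weighting across the top-ranked picks, capped at the regime's maximum position count and rounded down to the cent; `allocate_equal_weight` is a hypothetical helper. `Decimal` keeps the currency arithmetic exact.

```python
from decimal import ROUND_DOWN, Decimal


def allocate_equal_weight(
    capital: Decimal, ranked_symbols: list[str], max_positions: int
) -> dict[str, Decimal]:
    """Split capital equally across the top `max_positions` symbols.

    Each allocation is rounded down to the cent, so the total never exceeds
    `capital`; any remainder stays in cash.
    """
    if capital <= 0:
        raise ValueError("capital must be positive")
    picks = ranked_symbols[:max_positions]  # assumes best-ranked first
    if not picks:
        return {}
    per_position = (capital / len(picks)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    return {symbol: per_position for symbol in picks}
```

With $10,000 and the DEFENSIVE cap of 8 positions this yields eight allocations of $1,250.00.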

Architecture & Design Decisions

Why a 6-Stage Pipeline?

Each stage is a deliberate filter that reduces the universe progressively:

Universe Construction — Start with S&P 500 (~500 stocks) for liquidity and data availability
Momentum Filtering — SMA crossovers, RSI, Rate of Change eliminate stocks without positive technical signals
Quality Screening — Fundamental completeness checks with sector-aware rules (Financials, REITs, Utilities have different requirements)
AI Analysis — Gemini 2.5 Flash detects recent earnings announcements to avoid PEAD (post-earnings announcement drift)
Risk Assessment — Volatility, beta, and Sharpe ratio scoring against market-regime-specific thresholds
Position Sizing — Portfolio optimization with capital allocation, respecting regime-specific position limits

Key Design Decisions

Alpha Vantage as primary source — Consistent, structured fundamental data across all endpoints; Yahoo Finance as price-only fallback
Sector-aware validation — Free cash flow is not a meaningful metric for Financials, REITs have different PB norms, and Utilities have lower growth, so validation rules are hardcoded per GICS sector
Zero-cost testing — TEST_MODE mocks all API calls, enabling 176+ tests to run in <10s with no API spend
Snapshot-based integration testing — Real data captured once, replayed deterministically, so integration tests have zero ongoing API cost
SQLite over PostgreSQL — Single-user system; SQLite with SQLAlchemy ORM provides full relational capabilities without server overhead

For detailed architecture documentation, see docs/architecture/.

Data Quality & Fallback Strategy

The engine guarantees zero filtering failures through a layered fallback system:

Metric	Primary Source	Fallback Calculation	Sector Exception
Free Cash Flow	Alpha Vantage Cash Flow	Operating CF − CapEx	Financials: skip
ROE	Alpha Vantage Overview	Net Income / Shareholder Equity	—
PB Ratio	Alpha Vantage Overview	Market Cap / Book Value	REITs: relaxed threshold
EPS Growth	Alpha Vantage Earnings	QoQ earnings comparison	—
Price Data	Yahoo Finance (~99.6% success rate)	Stooq	—

Cache Strategy

Data Type	TTL	Rationale
Fundamentals	7 days	Quarterly reports; daily refresh unnecessary
Prices	48 hours	Need recent but not real-time
Metadata	30 days	Sector, name, exchange rarely change

Audit CLI

python -m app.cli audit fundamentals

Scans the entire S&P 500 universe and reports missing fields, stale cache entries, and sector-specific validation gaps.

Testing Philosophy — 3-Tier Infrastructure

Tier 1: Unit Tests (zero cost, <10 seconds)

TEST_MODE=true mocks all API calls (Gemini, Alpha Vantage, Yahoo Finance)
In-memory SQLite database
176+ tests covering all core logic
Run on every commit via GitHub Actions CI

Tier 2: Integration Tests (snapshot-based, zero ongoing cost)

Real data snapshots captured from live APIs (one-time cost)
Stored in data/snapshots/ for deterministic replay
Validates end-to-end screening workflows against known data
Snapshots from cycles 5-9 preserved for backtesting

Tier 3: Production Validation (live API, cost-tracked)

Real API calls for final pre-deployment validation
API cost monitoring with per-provider budgets
Periodic smoke tests against live data

# Run all tests (zero cost)
TEST_MODE=true pytest

# Run with coverage report
pytest --cov=app --cov-report=html

# Run specific module
pytest tests/filtering/ -v
pytest tests/api/ -v
pytest tests/data_providers/ -v
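TEST_MODE and snapshot replay both come down to one decision at the provider boundary: read a recorded response or call the live API. A minimal sketch of that switch, assuming the `TEST_MODE` environment variable used in the commands above and one JSON file per endpoint and symbol under a directory such as data/snapshots/; the directory layout, `SnapshotProvider`, `is_test_mode` and the `live_fetch` callable are hypothetical.

```python
import json
import os
from pathlib import Path
from typing import Callable


def is_test_mode() -> bool:
    """Interpret TEST_MODE=true (case-insensitive) as 'never call live APIs'."""
    return os.environ.get("TEST_MODE", "").strip().lower() in {"1", "true", "yes"}


class SnapshotProvider:
    """Serve recorded responses in TEST_MODE; otherwise call live and optionally record."""

    def __init__(self, snapshot_dir: Path, live_fetch: Callable[[str, str], dict], record: bool = False):
        self.snapshot_dir = snapshot_dir
        self.live_fetch = live_fetch
        self.record = record

    def _path(self, endpoint: str, symbol: str) -> Path:
        return self.snapshot_dir / endpoint / f"{symbol}.json"

    def get(self, endpoint: str, symbol: str) -> dict:
        path = self._path(endpoint, symbol)
        if is_test_mode():
            # Deterministic replay: a missing snapshot raises FileNotFoundError
            # instead of silently falling back to a paid live call.
            return json.loads(path.read_text())
        data = self.live_fetch(endpoint, symbol)
        if self.record:
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(data, indent=2, sort_keys=True))
        return data
```

Capturing once with `record=True` and replaying thereafter is what keeps the integration tier at zero ongoing API cost.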

Bi-Weekly Production Cycle

Day	Phase	Actions
Day 1	Screen	Data refresh → Run pipeline → Select positions → Execute trades
Day 5	Revalidate	Momentum check → Trim 50% if below 20-day SMA → Full exit if below 50-day SMA
Day 10	Report	Import eToro CSV → Generate performance report → Log cycle results
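The Day 5 rule can be sketched as a small decision function. Assumptions: closes are daily closing prices ordered oldest first, the SMAs are simple averages of the last 20 and 50 closes, and the 50-day exit is checked first so a holding below both averages is exited rather than trimmed. `Action`, `day5_action` and `shares_to_sell` are hypothetical names, not the internals of the `day5-check` command.

```python
from enum import Enum


class Action(Enum):
    HOLD = "hold"
    TRIM_HALF = "trim 50%"
    EXIT = "full exit"


def sma(closes: list[float], window: int) -> float:
    """Simple moving average of the last `window` closes."""
    if len(closes) < window:
        raise ValueError(f"need {window} closes, got {len(closes)}")
    return sum(closes[-window:]) / window


def day5_action(closes: list[float]) -> Action:
    """Apply the Day 5 momentum rule to one holding."""
    last = closes[-1]
    if last < sma(closes, 50):
        return Action.EXIT  # below the 50-day SMA: full exit, checked first
    if last < sma(closes, 20):
        return Action.TRIM_HALF  # below the 20-day SMA only: trim half
    return Action.HOLD


def shares_to_sell(shares: float, action: Action) -> float:
    """Translate an action into a sell quantity."""
    return {Action.HOLD: 0.0, Action.TRIM_HALF: shares / 2, Action.EXIT: shares}[action]
```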

Pre-Flight Checklist (Day 1)

Alpha Vantage Premium API key active
Database initialized (alembic upgrade head)
All tests passing (TEST_MODE=true pytest)
S&P 500 universe current
Cache directory has space (500MB+)

For detailed workflows, see docs/workflows/biweekly-cycle.md and docs/workflows/preflight-checklist.md.

API Configuration & Costs

Alpha Vantage Premium (Primary)

Cost: $50/month
Rate Limit: 75 requests/minute, no daily limit
Endpoints: OVERVIEW (company overview), EARNINGS, BALANCE_SHEET, INCOME_STATEMENT, CASH_FLOW
Usage: All fundamental data, earnings history, financial statements
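With 75 requests per minute and several endpoints per stock, a full S&P 500 refresh has to be throttled on the client side. A sliding-window limiter is one standard way to do that; the sketch below is illustrative rather than the project's client code, and `SlidingWindowLimiter` is a hypothetical name. The clock and sleep functions are injectable so it can be tested without real waiting.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most `max_calls` acquisitions in any rolling `period`-second window."""

    def __init__(
        self,
        max_calls: int = 75,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self._stamps: deque[float] = deque()  # timestamps of calls inside the window

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            now = self.clock()
            # Drop timestamps that have left the rolling window.
            while self._stamps and now - self._stamps[0] >= self.period:
                self._stamps.popleft()
            if len(self._stamps) < self.max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest call in the window expires, then re-check.
            self.sleep(self.period - (now - self._stamps[0]))
```

A single instance would be shared by every Alpha Vantage call in a process, with `acquire()` invoked before each request. This version is not thread-safe, so concurrent callers would need a lock around `acquire`.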

Yahoo Finance (Fallback)

Cost: Free
Rate Limit: Unofficial (no published limit); observed success rate ~99.6%
Usage: Price data only (OHLCV)

Google Gemini (Optional)

Model: Gemini 2.5 Flash with search grounding
Cost: ~$0.01-0.05 per screening cycle
Usage: Real-time earnings announcement detection (3-day lookback)

Performance

Metric	Value
Full screening cycle	30-60 seconds
Test suite execution	<10 seconds
Cache footprint	~50MB
Database size	~500KB
Memory usage	<500MB

Tech Stack

Component	Technology	Purpose
Language	Python 3.11+	Core runtime
API Framework	FastAPI	REST endpoints
Database	SQLite + SQLAlchemy + Alembic	Storage, ORM, migrations
Data Processing	pandas, numpy	Numerical analysis
AI/LLM	Google Gemini 2.5 Flash	Earnings detection with search grounding
Primary Data	Alpha Vantage Premium	Fundamentals, earnings, financials
Price Data	Yahoo Finance	OHLCV price history
Testing	pytest (176+ tests)	3-tier test infrastructure
Code Quality	black, ruff, mypy	Formatting, linting, type checking
CI/CD	GitHub Actions	Automated formatting and lint checks
CSV Import	Custom eToro parser	EUR→USD conversion, date formatting

Project Structure

stock_picker_bot/
├── backend/
│   ├── app/
│   │   ├── api/              # FastAPI REST endpoints
│   │   ├── cli/              # CLI commands (screen, audit, health, day5-check)
│   │   ├── data/             # Data providers and validators
│   │   ├── database/         # SQLAlchemy models and repository
│   │   ├── engine/           # Screening engine
│   │   │   ├── calculations/ # Metric calculations
│   │   │   ├── filters/      # Pipeline filter stages
│   │   │   ├── scorers/      # Stock scoring logic
│   │   │   └── validators/   # Data validation
│   │   ├── llm/              # Gemini integration
│   │   ├── models/           # Data models
│   │   ├── parsers/          # eToro CSV parser
│   │   ├── services/         # Business logic
│   │   └── utils/            # Utilities
│   ├── tests/                # 44 test files
│   │   ├── api/              # Endpoint tests
│   │   ├── filtering/        # Filter and scorer tests
│   │   ├── data_providers/   # Data provider tests
│   │   ├── parsers/          # Parser tests
│   │   └── performance/      # Performance calculation tests
│   ├── alembic/              # Database migrations
│   └── config/               # Configuration
├── data/                     # gitignored
│   ├── db/                   # SQLite databases
│   ├── cache/                # API response cache (fundamentals/prices/metadata)
│   ├── snapshots/            # Test data snapshots
│   ├── exports/              # Screening results
│   └── imports/              # eToro CSV imports
├── docs/
│   ├── architecture/         # System design, decisions, engine overview
│   ├── workflows/            # Bi-weekly cycle, preflight, error recovery
│   ├── testing/              # Test execution, coverage, data strategy
│   ├── data-flow/            # Caching, enrichment, sector processing
│   ├── guides/               # Data management, enrichment quick start
│   ├── setup/                # Environment setup
│   └── performance/          # Benchmarks and feedback
└── README.md

Documentation Index

Category	Document	Description
Workflows	Bi-Weekly Cycle	Day 1/5/10 production workflow
	Pre-Flight Checklist	Pre-cycle validation
	Error Recovery	Common failures and fixes
Architecture	System Architecture	6-stage pipeline, regime detection
	Design Decisions	Key architectural rationale
	Engine Overview	Screening engine deep dive
Data	Multi-Source Enrichment	Data provider hierarchy
	Sector-Aware Processing	Sector-specific rules
	Caching Strategy	Cache TTLs and management
Testing	Test Execution Guide	Running tests
	Data Strategy	TEST_MODE, snapshots, cost tracking
	Coverage Guide	Coverage reports
Setup	Environment Setup	Development environment
	Data Management	Cache and data operations

Built with Claude Code as the primary development partner — architecture, implementation, and testing developed through an agentic AI methodology.

Version: v2.2 | MIT License — see LICENSE file.
