kamathhrishi/finance-agent

Finance Agent

Finance Agent is an equity research platform: ask questions and get answers grounded in 10-K filings, earnings calls, and news.

Live Platform: www.stratalens.ai

10-K filings agent blog post: Blogpost

Agent System

Core agent system implementing Retrieval-Augmented Generation (RAG) with semantic data source routing, research planning, and iterative self-improvement for financial Q&A.

Architecture Overview

                              AGENT PIPELINE
 ═══════════════════════════════════════════════════════════════════════

 ┌──────────┐    ┌───────────────────┐    ┌──────────────────────────┐
 │ Question │───►│ Question Analyzer │───►│  Semantic Data Routing   │
 └──────────┘    │  (LLM via config) │    │                          │
                 │                   │    │  • Earnings Transcripts  │
                 │ Extracts:         │    │  • SEC 10-K Filings      │
                 │ • Tickers         │    │  • Real-Time News        │
                 │ • Time periods    │    │  • Hybrid (multi-source) │
                 │ • Intent          │    └────────────┬─────────────┘
                 └───────────────────┘                 │
                                                       ▼
                 ┌──────────────────────────────────────────────────────┐
                 │              RESEARCH PLANNING                       │
                 │  Agent generates reasoning: "I need to find..."      │
                 └─────────────────────────┬────────────────────────────┘
                                           ▼
                 ┌──────────────────────────────────────────────────────┐
                 │                  RETRIEVAL LAYER                     │
                 │  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐   │
                 │  │  Earnings   │  │  SEC 10-K   │  │   Tavily    │   │
                 │  │ Transcripts │  │  Retrieval  │  │    News     │   │
                 │  │             │  │   Agent     │  │             │   │
                 │  │ Vector DB   │  │ (10-K only) │  │  Live API   │   │
                 │  │ + Hybrid    │  │ Planning +  │  │             │   │
                 │  │   Search    │  │  Iterative  │  │             │   │
                 │  └──────┬──────┘  └──────┬──────┘  └──────┬──────┘   │
                 └─────────┴───────────┬────┴────────────────┴──────────┘
                                       │ ▲
                                       │ │ Re-query with
                                       │ │ follow-up questions
                                       ▼ │
                 ┌──────────────────────────────────────────────────────┐
                 │               ITERATIVE IMPROVEMENT                  │
                 │                                                      │
                 │    ┌──────────┐    ┌──────────┐    ┌──────────┐      │
                 │    │ Generate │───►│ Evaluate │───►│ Iterate? │──────┼───┐
                 │    │  Answer  │    │ Quality  │    │          │      │   │
                 │    └──────────┘    └──────────┘    └──────────┘      │   │
                 │                                          │ NO        │   │ YES
                 └──────────────────────────────────────────┼───────────┘   │
                                                            ▼               │
                                                     ┌─────────────┐        │
                                                     │   ANSWER    │        │
                                                     │ + Citations │        │
                                                     └─────────────┘        │
                                                            ▲               │
                                                            └───────────────┘
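
To make the first two boxes of the pipeline concrete, here is a minimal sketch of what the Question Analyzer's output could look like once parsed. The `DataSource` enum, the `QuestionAnalysis` dataclass, the `parse_analysis` helper, and the JSON field names are all assumptions for illustration; the actual structures live in agent/rag/question_analyzer.py and may differ.

```python
import json
from dataclasses import dataclass, field
from enum import Enum


class DataSource(Enum):
    """The four routing targets named in the diagram (identifiers are assumed)."""
    EARNINGS_TRANSCRIPTS = "earnings_transcripts"
    SEC_10K = "sec_10k"
    NEWS = "news"
    HYBRID = "hybrid"


@dataclass
class QuestionAnalysis:
    """Hypothetical container for what the Question Analyzer extracts."""
    tickers: list[str] = field(default_factory=list)
    time_periods: list[str] = field(default_factory=list)
    intent: str = ""
    data_source: DataSource = DataSource.HYBRID


def parse_analysis(llm_json: str) -> QuestionAnalysis:
    """Turn the analyzer LLM's JSON reply (assumed format) into a typed object.

    An unknown or missing data source falls back to HYBRID, so a malformed
    reply widens the search instead of dropping a source.
    """
    raw = json.loads(llm_json)
    try:
        source = DataSource(raw.get("data_source", "hybrid"))
    except ValueError:
        source = DataSource.HYBRID
    return QuestionAnalysis(
        tickers=[t.upper() for t in raw.get("tickers", [])],
        time_periods=list(raw.get("time_periods", [])),
        intent=raw.get("intent", ""),
        data_source=source,
    )


# Example reply the LLM might produce for a 10-K question (made up).
reply = (
    '{"tickers": ["aapl"], "time_periods": ["FY2023"], '
    '"intent": "risk factors", "data_source": "sec_10k"}'
)
analysis = parse_analysis(reply)
print(analysis.data_source.name, analysis.tickers)
```

The routing decision itself comes from the LLM, which is what makes it intent-based; the code above only validates and types the reply.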

Key Concepts:

  1. Semantic Routing - Routes to data sources based on question intent, not keywords
  2. Research Planning - Agent explains reasoning before searching ("I need to find...")
  3. Multi-Source RAG - Combines earnings transcripts, SEC 10-K filings, and news
  4. Self-Reflection - Evaluates answer quality and iterates until confident
  5. Answer Modes - Configurable iteration depth (2-10 iterations) and quality thresholds (70-95%)
  6. Search-Optimized Follow-ups - Generates keyword phrases for better RAG retrieval
  7. Parallel Multi-Agent Synthesis - Per-ticker subagents run in parallel; results are synthesized into one unified answer
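
Concepts 4 to 6 describe a generate, evaluate, iterate loop. The sketch below shows one way such a loop can be bounded by an answer mode. `AnswerMode`, `refine`, and the two callables are hypothetical names, and the stub functions stand in for LLM calls; the real settings are in agent/agent_config.py.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AnswerMode:
    """Hypothetical answer mode: an iteration cap and a quality threshold."""
    max_iterations: int       # README range: 2-10
    quality_threshold: float  # README range: 0.70-0.95


def refine(
    question: str,
    generate: Callable[[str, list[str]], str],
    evaluate: Callable[[str, str], tuple[float, list[str]]],
    mode: AnswerMode,
) -> tuple[str, float, int]:
    """Generate, score, and re-query until the score clears the threshold.

    `generate(question, follow_ups)` and `evaluate(question, answer)` stand in
    for LLM calls; `evaluate` returns (score, follow-up search phrases).
    Returns the best answer seen, its score, and the iterations used.
    """
    follow_ups: list[str] = []
    best_answer, best_score, used = "", -1.0, 0
    for used in range(1, mode.max_iterations + 1):
        answer = generate(question, follow_ups)
        score, new_follow_ups = evaluate(question, answer)
        if score > best_score:
            best_answer, best_score = answer, score
        # Stop when confident, or when the evaluator has nothing left to ask.
        if score >= mode.quality_threshold or not new_follow_ups:
            break
        follow_ups.extend(new_follow_ups)
    return best_answer, best_score, used


# Stubs: each extra follow-up phrase raises the score by 0.2.
def fake_generate(question: str, follow_ups: list[str]) -> str:
    return f"answer using {len(follow_ups)} follow-up(s)"


def fake_evaluate(question: str, answer: str) -> tuple[float, list[str]]:
    n = int(answer.split()[2])
    return 0.6 + 0.2 * n, [f"keyword phrase {n + 1}"]


answer, score, used = refine(
    "How did margins change?", fake_generate, fake_evaluate, AnswerMode(4, 0.9)
)
print(answer, used)
```

Keeping the best answer seen, rather than the last one, means a later iteration that scores lower cannot degrade the result.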

Benchmark: 91% accuracy on FinanceBench (112 10-K questions), ~10s per question, evaluated using LLM-as-a-judge.
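
For key concept 7 (parallel multi-agent synthesis), the fan-out and merge pattern can be sketched with `asyncio.gather`. Both function names are hypothetical, the subagent body is a placeholder, and the string join stands in for the LLM synthesis step; nothing here is taken from the repository's code.

```python
import asyncio


async def research_ticker(ticker: str, question: str) -> str:
    """Hypothetical per-ticker subagent; a real one would retrieve and answer."""
    await asyncio.sleep(0)  # placeholder for I/O-bound retrieval and LLM calls
    return f"{ticker}: findings for '{question}'"


async def answer_across_tickers(tickers: list[str], question: str) -> str:
    """Run one subagent per ticker concurrently, then merge the results."""
    partials = await asyncio.gather(
        *(research_ticker(t, question) for t in tickers)
    )
    # gather returns results in input order, so the merge sees tickers as given.
    # Joining lines is a stand-in for the LLM synthesis step.
    return "\n".join(partials)


combined = asyncio.run(answer_across_tickers(["AAPL", "MSFT"], "capex trend"))
print(combined)
```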

Documentation

| Document | Description |
| --- | --- |
| agent/README.md | Complete agent architecture, pipeline stages, configuration |
| docs/SEC_AGENT.md | SEC 10-K agent: section routing, table selection, reranking |
| agent/rag/data_ingestion/README.md | Data ingestion pipelines for transcripts and 10-K filings |

Features

  • Earnings Transcripts (2020-2025) - Word-for-word executive commentary from earnings calls
  • SEC 10-K Filings (2018-2025) - Official annual reports via specialized retrieval agent (10-Q/8-K coming soon)
  • Real-Time News - Latest market developments via Tavily search
  • Financial Screener - Natural language queries over company fundamentals [in development]

Unlike generic LLMs that rely on web content, Finance Agent uses the same authoritative documents that professional analysts depend on.

Tech Stack

  • Backend: FastAPI, PostgreSQL (pgvector), DuckDB
  • AI/ML: Cerebras (Qwen-3-235B), OpenAI (fallback), RAG with iterative self-improvement
  • Search: Hybrid vector (pgvector) + TF-IDF with cross-encoder reranking
  • Frontend: React + TypeScript, Tailwind CSS
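
The hybrid search line above combines two retrievers before reranking. One common way to blend them is weighted score fusion, sketched below in plain Python. The function names, the min-max normalization, and the 0.7 vector weight are assumptions for illustration, not the project's actual settings; the cross-encoder would then rerank only the chunks this step returns.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(
    vector_scores: dict[str, float],
    tfidf_scores: dict[str, float],
    vector_weight: float = 0.7,  # assumed weight, not from the repository
    top_k: int = 3,
) -> list[str]:
    """Blend vector and TF-IDF scores; return the top_k chunk ids.

    A chunk missing from one retriever contributes 0 for that retriever.
    Ties are broken by chunk id so the ordering is deterministic.
    """
    v, t = min_max(vector_scores), min_max(tfidf_scores)
    blended = {
        doc: vector_weight * v.get(doc, 0.0) + (1 - vector_weight) * t.get(doc, 0.0)
        for doc in set(v) | set(t)
    }
    return sorted(blended, key=lambda d: (-blended[d], d))[:top_k]


# Made-up scores: cosine similarity for vectors, raw TF-IDF for keywords.
vector_hits = {"a": 0.9, "b": 0.5, "c": 0.1}
tfidf_hits = {"b": 3.0, "c": 1.0, "d": 2.0}
candidates = hybrid_rank(vector_hits, tfidf_hits)
print(candidates)
```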

Project Structure

finance_agent/
├── agent/                  # AI agent & RAG system         → see agent/README.md
│   ├── __init__.py        # Public API: Agent, RAGAgent, create_agent()
│   ├── agent_config.py    # Iteration/quality threshold settings
│   ├── prompts.py         # Centralized LLM prompt templates
│   ├── llm/               # Unified LLM client (OpenAI/Cerebras)  → see agent/llm/README.md
│   ├── rag/               # RAG implementation
│   │   ├── rag_agent.py                          # Main orchestration
│   │   ├── sec_filings_service_smart_parallel.py  # SEC 10-K agent
│   │   ├── response_generator.py   # LLM response & evaluation
│   │   ├── question_analyzer.py    # Semantic routing
│   │   ├── search_engine.py        # Hybrid transcript search
│   │   ├── tavily_service.py       # Real-time news
│   │   ├── earnings_transcript_service.py  # Dedicated earnings transcript retrieval agent
│   │   ├── search_planner.py       # Search plan generation and temporal reference resolution
│   │   ├── rag_flow_context.py     # Flow context dataclass for pipeline state
│   │   └── data_ingestion/         # Data pipeline → see data_ingestion/README.md
│   └── screener/          # Financial screener
├── app/                   # FastAPI application
│   ├── routers/           # API endpoints
│   └── schemas/           # Pydantic models
├── frontend/              # React + TypeScript frontend
├── docs/                  # Documentation
│   └── SEC_AGENT.md       # 10-K agent deep dive

Quick Start

Prerequisites

  • Python 3.9+
  • PostgreSQL 12+ with pgvector extension
  • See Requirements for full dependency list

Installation

# Clone repository into finance_agent/ (the directory name used in the steps below)
git clone https://github.com/kamathhrishi/stratalensai.git finance_agent
cd finance_agent

# Install dependencies
pip install -r requirements.txt

# Setup environment variables
cp .env.example .env
# Edit .env with your API keys and database credentials

# Configure environment (see Configuration section below)

Configuration

Before running the application, configure the following in .env:

  • BASE_URL - Set to your server URL (e.g., http://localhost:8000 for local, your production URL for deployed)
  • RAG_DEBUG_MODE - Set to false for production, true for development debugging
  • AUTH_DISABLED - Set to true to bypass Clerk auth (dev only), false for production
  • CLERK_SECRET_KEY / CLERK_PUBLISHABLE_KEY - Required for production authentication (get from Clerk Dashboard)

Frontend env vars (read from root .env via envDir: '../' in vite.config.ts):

  • VITE_CLERK_PUBLISHABLE_KEY - Same value as CLERK_PUBLISHABLE_KEY (Vite requires VITE_ prefix)
  • VITE_API_BASE_URL - Leave empty for same-origin requests (default); set to an explicit URL only if backend is on a separate domain
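
A quick pre-flight check of these settings can catch a missing key before the server starts. The sketch below is a hypothetical helper, not part of the repository: it assumes `BASE_URL` is always required and that the Clerk keys are required whenever `AUTH_DISABLED` is not true, which is how this section reads. The set of accepted truthy strings is also an assumption.

```python
import os
from typing import Mapping

# Strings treated as "true"; this set is an assumption, not the app's parser.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(env: Mapping[str, str], name: str, default: bool = False) -> bool:
    """Read a true/false setting, ignoring case and surrounding whitespace."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in TRUTHY


def missing_settings(env: Mapping[str, str]) -> list[str]:
    """List required variables that are unset or empty."""
    required = ["BASE_URL"]
    if not env_flag(env, "AUTH_DISABLED"):
        # Clerk keys are needed whenever auth is enabled.
        required += ["CLERK_SECRET_KEY", "CLERK_PUBLISHABLE_KEY"]
    return [name for name in required if not env.get(name)]


# Check the current process environment (load .env first if you use one).
problems = missing_settings(os.environ)
if problems:
    print("Missing settings:", ", ".join(problems))
```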

# Ingest data (optional - see agent/rag/data_ingestion/README.md)
python agent/rag/data_ingestion/download_transcripts.py
python agent/rag/data_ingestion/ingest_with_structure.py --ticker AAPL --year-start 2020 --year-end 2025

# Run server
python -m uvicorn app.main:app --host 0.0.0.0 --port 8000

Access the application at http://localhost:8000

Requirements

API Keys

| Service | Environment Variable | Required |
| --- | --- | --- |
| OpenAI | OPENAI_API_KEY | Yes |
| Cerebras | CEREBRAS_API_KEY | Yes |
| API Ninjas | API_NINJAS_KEY | Yes |
| Clerk | CLERK_SECRET_KEY, CLERK_PUBLISHABLE_KEY | Yes (production) |
| Tavily | TAVILY_API_KEY | Optional |
| Logfire | LOGFIRE_TOKEN | Optional |

Database

  • PostgreSQL with pgvector extension (DATABASE_URL)
  • Redis (optional, for caching) (REDIS_URL)

Python Dependencies

See requirements.txt for full list.

API Documentation

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Key Endpoints

  • POST /message/stream-v2 - Chat with streaming RAG responses
  • GET /companies/search - Search companies by ticker/name
  • GET /transcript/{ticker}/{year}/{quarter} - Get specific earnings transcript
  • POST /screener/query/stream - Natural language financial queries
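
As a small illustration of the transcript endpoint's path layout, the helper below builds the request URL with the standard library. `transcript_url` and `fetch_transcript` are hypothetical names. The assumptions are that `quarter` is a plain integer, that the server runs at the local address from Quick Start, and that the response is JSON. Check the Swagger UI for the actual parameter formats and response schema.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # local default from Quick Start


def transcript_url(ticker: str, year: int, quarter: int) -> str:
    """Build the URL for GET /transcript/{ticker}/{year}/{quarter}.

    Passing the quarter as an integer (1-4) is an assumption.
    """
    return f"{BASE_URL}/transcript/{quote(ticker.upper())}/{year}/{quarter}"


def fetch_transcript(ticker: str, year: int, quarter: int) -> dict:
    """Fetch one transcript; assumes a JSON body and needs a running server."""
    with urlopen(transcript_url(ticker, year, quarter)) as response:
        return json.load(response)


print(transcript_url("aapl", 2024, 3))
```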

Data Sources

Data is split between PostgreSQL (embeddings, metadata) and Railway S3 (full filing documents, transcript text). See agent/rag/data_ingestion/README.md for detailed ingestion instructions.

AI Agent Documentation

| Document | Description |
| --- | --- |
| agent/README.md | Complete agent architecture, pipeline stages, semantic routing, iterative self-improvement |
| docs/SEC_AGENT.md | SEC 10-K agent: planning-driven retrieval, 91% accuracy on FinanceBench |
| agent/rag/data_ingestion/README.md | Data ingestion pipelines for transcripts and SEC filings |

Development Status

Production (Finance Agent):

  • Earnings transcript chat with RAG
  • SEC 10-K filings (2018-2025)
  • Real-time streaming responses
  • User authentication

In Development:

  • Enhanced financial screener
  • Performance optimizations

Contributing

Contributions welcome! Please open an issue to discuss major changes before submitting PRs.

License

MIT License - see LICENSE file for details

Contact

For questions or access requests: hrishi@stratalens.ai
