An AI-powered app that analyzes any financial document (10-K filings, annual reports, quarterly reports, etc.) from any jurisdiction and provides insights based on your selected criteria.
You'll need accounts and API keys for:
- Groq - Get free API key (for LLM)
- Google AI - Get free API key (for embeddings)
- Pinecone - Sign up free
- Clerk - Sign up free
- PostgreSQL - Use Vercel Postgres or Neon
- Vercel Blob - Vercel Dashboard → Storage (for document uploads)
- LangSmith (optional) - smith.langchain.com (for tracing/debugging RAG and agents)
```bash
npm install --legacy-peer-deps
cp .env.example .env.local
```

Open `.env.local` and fill in your keys:

```
# Required
GROQ_API_KEY=...
GOOGLE_API_KEY=...
PINECONE_API_KEY=...
PINECONE_INDEX_NAME=investment-rag
POSTGRES_URL=postgresql://...
NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY=pk_...
CLERK_SECRET_KEY=sk_...
BLOB_READ_WRITE_TOKEN=...

# Optional (for webhook sync)
CLERK_WEBHOOK_SECRET=whsec_...
LANGCHAIN_API_KEY=...
```

The app stores uploaded PDFs in Vercel Blob. You need a Blob store and a `BLOB_READ_WRITE_TOKEN`:
- Go to Vercel Dashboard → select your project (or create one) → Storage tab.
- Click Create Database → choose Blob.
- Name the store (e.g. `investment-rag-blob`), set access to Public (so document URLs work), then create.
- After creation, Vercel adds `BLOB_READ_WRITE_TOKEN` to the project. For local dev, pull env vars:

```bash
vercel link            # link this repo to your Vercel project if needed
vercel env pull .env.local
```

Or copy the token from Storage → your Blob store → Settings and set `BLOB_READ_WRITE_TOKEN` in `.env.local`.
Without this token, document uploads will fail.
LangSmith provides tracing and debugging for the RAG pipeline and analysis agent (LangGraph). Useful for development, not required to run the app.
- Sign up at smith.langchain.com.
- Go to Settings → API Keys → Create API Key.
- Copy the key and set it in `.env.local`:

```
LANGCHAIN_API_KEY=lsv2_...
```

- Optionally set `LANGCHAIN_TRACING_V2=true` to enable tracing (the LangChain SDK sends traces to LangSmith when the key is present).
You can leave LANGCHAIN_API_KEY unset for a basic demo.
```bash
npm run db:push        # push schema to the database
npm run init:pinecone  # create the Pinecone index
npm run dev            # start the dev server
```

Open http://localhost:3000 in your browser.
- Push your code to GitHub
- Go to vercel.com/new
- Import your repository
- Add all environment variables from `.env.local`
- Deploy

Or deploy via the CLI:

```bash
npm i -g vercel
vercel login
vercel --prod
```

- Set up Clerk Webhook: In the Clerk Dashboard, add the webhook endpoint `https://your-domain.com/api/webhooks/clerk` with the events `user.created`, `user.updated`, and `user.deleted`
- Verify: Test document upload and analysis
```
User uploads PDF → Parse & Chunk → Assign Categories → Generate Embeddings → Store
                                                                              ↓
User runs analysis ← LLM generates verdict ← Filter by Categories ← Retrieve chunks
```
The app analyzes any financial report, including:
- SEC Filings: 10-K, 10-Q, 8-K (US)
- Annual Reports: From any jurisdiction (India, UK, EU, etc.)
- Quarterly Reports: Any format
- Other Financial Documents: Investor presentations, earnings reports
Each chunk is automatically classified into one or more categories using keyword pattern matching:
| Category | What It Captures |
|---|---|
| `financial-performance` | Revenue, profit, margins, cash flow, balance sheet data |
| `risk-factors` | Business risks, uncertainties, threats, exposures |
| `business-operations` | Products, services, market position, operations |
| `management-governance` | Leadership, board, compensation, governance practices |
| `legal-regulatory` | Legal proceedings, compliance, regulations, patents |
| `strategy-outlook` | Growth plans, acquisitions, R&D, future initiatives |
| `general` | Content that doesn't fit specific categories |
Categories enable pre-filtering during retrieval: when analyzing financial health, the system prioritizes `financial-performance` chunks; for risk assessment, it prioritizes `risk-factors` chunks.
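As a rough sketch, keyword-based classification could look like this (a minimal TypeScript matcher; the pattern lists and function names are illustrative, not the app's actual `lib/rag/metadata/` code):

```typescript
// Hypothetical keyword-pattern classifier. Each category maps to a few
// regexes; a chunk can match several categories, and falls back to
// "general" when nothing matches. Patterns here are examples only.
const CATEGORY_PATTERNS: Record<string, RegExp[]> = {
  "financial-performance": [/revenue/i, /profit/i, /margin/i, /cash flow/i],
  "risk-factors": [/risk/i, /uncertaint/i, /threat/i, /exposure/i],
  "legal-regulatory": [/litigation/i, /regulat/i, /compliance/i, /patent/i],
};

function classifyChunk(text: string): string[] {
  const matched = Object.entries(CATEGORY_PATTERNS)
    .filter(([, patterns]) => patterns.some((p) => p.test(text)))
    .map(([category]) => category);
  return matched.length > 0 ? matched : ["general"];
}
```

Because a chunk keeps every category it matches, a paragraph covering both revenue and litigation stays retrievable under either criterion.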
When you upload a PDF:
- Parse: Extract text from the PDF using `pdf-parse`
- Detect Headings: Find document structure (any format, not 10-K specific)
- Chunk: Split into ~1500-token pieces with heading-aware boundaries
- Classify: Assign categories to each chunk using keyword patterns
- Embed: Convert chunks to 768-dimension vectors using Gemini embeddings
- Store: Save vectors + categories in Pinecone, full data in PostgreSQL
| PostgreSQL | Pinecone |
|---|---|
| Stores structured data (users, documents, analysis results) | Stores vector embeddings |
| Good for complex queries & relationships | Optimized for fast similarity search |
| Source of truth for chunk text and categories | Finds semantically similar content |
Example: When searching, Pinecone finds chunks that mean the same thing as your query (even without exact keyword matches), then filters by category. PostgreSQL stores the full text and metadata.
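A minimal sketch of that division of labor, assuming Pinecone returns chunk ids with similarity scores and PostgreSQL holds the authoritative text (types and names here are illustrative):

```typescript
// Pinecone answers "which chunks are similar?"; PostgreSQL answers
// "what do those chunks actually say?". This joins the two result sets.
interface PineconeHit { id: string; score: number }
interface ChunkRow { id: string; text: string; categories: string[] }

function joinHits(hits: PineconeHit[], rows: ChunkRow[]) {
  const byId = new Map(rows.map((r) => [r.id, r]));
  return hits
    .filter((h) => byId.has(h.id)) // drop hits whose row no longer exists
    .map((h) => ({ ...byId.get(h.id)!, score: h.score }))
    .sort((a, b) => b.score - a.score);
}
```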
When analyzing a document:
- Hybrid Search: Combines vector similarity + keyword matching
- Category Filtering: Pre-filters chunks by relevant categories
- LLM Analysis: Groq Llama 3.3 70B analyzes chunks against your criteria
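A hybrid score along these lines might blend the two signals with fixed weights; the 0.7/0.3 split and the keyword-overlap measure are assumptions, not the app's configured values:

```typescript
// Hypothetical hybrid scoring: weighted sum of vector similarity and a
// simple keyword-overlap ratio over the query terms.
const VECTOR_WEIGHT = 0.7;
const KEYWORD_WEIGHT = 0.3;

function keywordScore(query: string, text: string): number {
  const terms = query.toLowerCase().split(/\W+/).filter((t) => t.length > 2);
  if (terms.length === 0) return 0;
  const lower = text.toLowerCase();
  const hits = terms.filter((t) => lower.includes(t)).length;
  return hits / terms.length; // fraction of query terms present
}

function hybridScore(vectorScore: number, query: string, text: string): number {
  return VECTOR_WEIGHT * vectorScore + KEYWORD_WEIGHT * keywordScore(query, text);
}
```

The keyword term rescues exact matches (tickers, statute names, metric labels) that pure embedding similarity can miss.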
The analysis runs as a 3-step workflow:
Retrieve → Analyze → Synthesize
- Retrieve: Get relevant chunks (filtered by category)
- Analyze: LLM extracts insights per criterion
- Synthesize: Combine into final verdict with confidence score
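Stripped of LangGraph specifics, the three steps can be sketched as plain async functions; the state shape and node signatures below are assumptions, not the app's graph definition:

```typescript
// Hypothetical linear version of the Retrieve → Analyze → Synthesize workflow.
interface AnalysisState {
  criterion: string;
  chunks: string[];
  insights: string[];
  verdict?: { summary: string; confidence: number };
}

async function runAnalysis(
  criterion: string,
  retrieve: (criterion: string) => Promise<string[]>,
  analyze: (chunks: string[]) => Promise<string[]>,
  synthesize: (insights: string[]) => Promise<{ summary: string; confidence: number }>,
): Promise<AnalysisState> {
  const chunks = await retrieve(criterion);   // category-filtered chunks
  const insights = await analyze(chunks);     // per-criterion insights from the LLM
  const verdict = await synthesize(insights); // final verdict + confidence
  return { criterion, chunks, insights, verdict };
}
```

Modeling each step as a node (as LangGraph does) makes it easy to trace, retry, or swap a single stage without touching the others.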
The system evaluates documents against these criteria:
| Criterion | Categories Used |
|---|---|
| Financial Health | financial-performance |
| Risk Assessment | risk-factors, legal-regulatory |
| Growth Potential | strategy-outlook, business-operations |
| Competitive Position | business-operations, strategy-outlook |
| Management Quality | management-governance |
| Regulatory Compliance | legal-regulatory, risk-factors |
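These mappings presumably live in `config/criteria.config.ts`; a hypothetical shape (the field names and ids are assumptions, only the criterion-to-category pairings come from the table above):

```typescript
// Hypothetical criteria config: each criterion lists the chunk categories
// to prioritize during retrieval.
interface Criterion {
  id: string;
  label: string;
  categories: string[];
}

const ANALYSIS_CRITERIA: Criterion[] = [
  { id: "financial-health", label: "Financial Health", categories: ["financial-performance"] },
  { id: "risk-assessment", label: "Risk Assessment", categories: ["risk-factors", "legal-regulatory"] },
  { id: "growth-potential", label: "Growth Potential", categories: ["strategy-outlook", "business-operations"] },
  { id: "competitive-position", label: "Competitive Position", categories: ["business-operations", "strategy-outlook"] },
  { id: "management-quality", label: "Management Quality", categories: ["management-governance"] },
  { id: "regulatory-compliance", label: "Regulatory Compliance", categories: ["legal-regulatory", "risk-factors"] },
];
```

Keeping the mapping in config means a new criterion only needs a new entry here, not changes to the retrieval code.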
```
app/                       # Next.js pages & API routes
├── (auth)/                # Sign in/up pages (Clerk)
├── (dashboard)/           # Protected pages (dashboard, documents, analysis)
└── api/                   # Backend endpoints
lib/
├── agents/                # LangGraph analysis workflow
│   └── nodes/             # Retrieve, analyze, synthesize nodes
├── db/                    # Database schema (Drizzle ORM)
├── rag/
│   ├── chunking/          # Heading-aware document splitting
│   ├── embeddings/        # Gemini embedding generation
│   ├── metadata/          # Category classifier (keyword-based)
│   └── retrieval/         # Hybrid search with category filtering
├── parsers/               # PDF parsing, heading detection
├── services/              # Document processor, retrieval service
└── vectorstore/           # Pinecone operations
components/                # React UI components
config/
├── criteria.config.ts     # Analysis criteria with category mappings
└── rag.config.ts          # Chunking, embedding, retrieval settings
```
| Component | Cost |
|---|---|
| Embeddings (Gemini, one-time per doc) | FREE ✨ |
| Analysis (Groq Llama 3.3 70B) | FREE ✨ |
| Total | $0.00 🎉 |
100% free within generous tier limits (1000s of requests/day)
| Command | Description |
|---|---|
| `npm run dev` | Start development server |
| `npm run build` | Build for production |
| `npm run db:push` | Push schema to database |
| `npm run db:studio` | Open Drizzle Studio (DB viewer) |
| `npm run init:pinecone` | Create Pinecone index |
- Frontend: Next.js 15, React 19, TailwindCSS, shadcn/ui
- Auth: Clerk
- Database: PostgreSQL (Drizzle ORM)
- Vector DB: Pinecone
- AI: LangChain, LangGraph, Groq (Llama 3.3 70B), Google Gemini (embeddings)
- Deployment: Vercel
Dependencies won't install?

```bash
npm install --legacy-peer-deps
```

Database connection error?

- Check that `POSTGRES_URL` is correct
- Ensure your IP is whitelisted if using an external DB

Document stuck processing?

- Check terminal logs for errors
- Verify the API key has credits

Analysis fails?

- Ensure the document finished processing first
- Check API rate limits
MIT