A Retrieval‑Augmented Generation (RAG) system built with the Mastra framework that answers questions about Warren Buffett’s investment philosophy using Berkshire Hathaway shareholder letters. It includes ingestion, chunking, embedding, pgvector storage, a retrieval tool, and a GPT‑4o‑powered agent with persistent memory.
- Mastra Agent (`berkshireAgent`) powered by GPT‑4o
- Document processing with `MDocument` and metadata enrichment (year, source)
- Vector storage via `pgvector` with cosine distance search
- Retrieval tool (`berkshire_tool`) that embeds queries and returns top matches with source/year/score
```
my-mastra-app/
  docker-compose.yml            # Postgres + pgvector
  ingestion/ingest-and-chunk.js
  embedder/embed.ts             # Creates index + upserts embeddings/metadata
  src/mastra/
    index.ts                    # Mastra app wiring (agent, memory, vectors)
    agents/berkshire-agent.ts
    tools/berkshire-tool.ts
  parsed/*.txt                  # Raw shareholder letters (plain text)
  chunks/*.json                 # Chunked documents with metadata
```
- Node.js 20.9+ (ESM)
- Docker (for Postgres/pgvector)
- OpenAI API key
Environment variables (set in your shell):
```
export OPENAI_API_KEY=YOUR_KEY
```
The default Postgres connection string is:
```
postgresql://postgres:postgres@localhost:5433/mastra_rag_db
```
It matches the provided docker-compose.yml. You can change it in src/mastra/index.ts if needed.
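For reference, a minimal `docker-compose.yml` matching this connection string could look like the following (the repo's actual file may differ in image tag or volume names):

```yaml
services:
  postgres:
    image: pgvector/pgvector:pg16   # Postgres with the pgvector extension preinstalled
    ports:
      - "5433:5432"                 # host 5433 -> container 5432, matching the connection string
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: mastra_rag_db
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:
```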
- Start Postgres with pgvector
```
cd my-mastra-app
docker compose up -d
```
- Install dependencies
```
npm install
```
- Ingest and chunk the shareholder letters (optional: the chunks are already generated; for a different corpus, update and re-run the script)
This script uses MDocument to chunk text and enrich metadata. Run it from its directory so relative paths resolve.
```
cd ingestion
node ingest-and-chunk.js
cd ..
```
- Create embeddings and upsert into pgvector
The embedder is written in TypeScript. Use tsx to run it directly:
```
npm i -D tsx
npx tsx embedder/embed.ts
```
This will:
- Create the `searchexamples` index (1536 dimensions)
- Generate OpenAI embeddings for each chunk
- Upsert vectors + metadata into `pgvector`
- Run Mastra locally (optional, for playground/dev)
```
npm run dev
```
- Ingestion and chunking (`ingestion/ingest-and-chunk.js`)
  - Reads `parsed/*.txt`
  - Uses `MDocument.fromText(...).chunk({ strategy: 'recursive', size: 512, overlap: 50 })`
  - Adds metadata: `source`, `year`
  - Writes `chunks/*.json`
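Mastra's `MDocument` handles the splitting internally; as a rough illustration of what `size: 512, overlap: 50` means, here is a standalone windowing sketch (a simplification: the real `recursive` strategy also splits on separators such as paragraphs and sentences rather than at fixed offsets):

```typescript
// Simplified sketch of size/overlap chunking. Each chunk starts
// `size - overlap` characters after the previous one, so adjacent
// chunks share `overlap` characters of context.
function chunkText(text: string, size = 512, overlap = 50): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
    start += size - overlap;
  }
  return chunks;
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from both neighboring chunks.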
- Embedding and storage (`embedder/embed.ts`)
  - Creates pgvector index `searchexamples`
  - Embeds each chunk with `text-embedding-3-small`
  - Upserts vectors + metadata into the `berkshire_intelligence` schema
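Conceptually, the embedder pairs each chunk record with its embedding and shapes them into the id/vector/metadata arrays a vector-store upsert expects. A simplified data-shaping sketch (names are illustrative; the embedding API call is omitted, and this is not the actual `embed.ts`):

```typescript
type Chunk = { text: string; metadata: { source: string; year: number } };

// Shape chunk records + precomputed embeddings into parallel arrays
// suitable for a vector-store upsert. The chunk text is stored as
// metadata so retrieval can return it alongside source and year.
function toUpsertPayload(chunks: Chunk[], embeddings: number[][]) {
  return {
    ids: chunks.map((_, i) => `chunk-${i}`),
    vectors: embeddings,
    metadata: chunks.map((c) => ({ text: c.text, ...c.metadata })),
  };
}
```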
- Retrieval tool (`src/mastra/tools/berkshire-tool.ts`)
  - Embeds the user query, runs similarity search via `<=>` against pgvector
  - Returns `text`, `source`, `year`, `score`
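pgvector's `<=>` operator computes cosine distance, i.e. 1 minus cosine similarity, so smaller values mean closer matches. A standalone sketch of the same math (a hypothetical helper, not pgvector itself):

```typescript
// Cosine distance as computed by pgvector's `<=>` operator:
// 1 - (a · b) / (|a| * |b|). Range [0, 2]; 0 = identical direction.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

A similarity-style `score` can then be derived as `1 - distance` if higher-is-better ranking is preferred.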
- Agent (`src/mastra/agents/berkshire-agent.ts`)
  - GPT‑4o model, wired with the retrieval tool
  - Persistent memory via `Memory` + `PostgresStore` for conversation continuity
- App wiring (`src/mastra/index.ts`)
  - Registers the agent, storage, and vector store (`PgVector`)
- OpenAI: set `OPENAI_API_KEY`
- Postgres/pgvector: the default connection string is hard-coded in `src/mastra/index.ts`. Update it if your DB differs.
- Document processing
  - `node ingestion/ingest-and-chunk.js` creates `chunks/*.json`
  - `npx tsx embedder/embed.ts` creates the index and upserts embeddings
  - Vector search returns relevant rows (validated by the tool)
- Agent & memory
  - Agent responds with grounded answers (tool results used)
  - Conversation context persists across turns (Postgres store)
- Retrieval & citations
  - Tool returns `text`, `source`, `year`, `score`
  - Agent instructions encourage quoting + citation by year/source
Here is a preview of the application in action, showing the agent retrieving information and providing source-based answers.

