Lightweight, end-to-end AI workflow in Node.js/TypeScript implementing:
- Layer 1 – Personalized Chatbot (edgy market-pro tone, collects Name/Email/Income naturally)
- Layer 2 – RAG from a single local file using embeddings + similarity search
- Layer 3 – Data Storage as structured JSONL (session-scoped)
- Layer 4 – Structured Output Delivery via email (SMTP/Nodemailer) and/or webhook (e.g., Google Apps Script/Make/Zapier)
Zero heavy infra. Single command to run. Swap any component (LLM, embeddings, delivery) without touching core logic.
```bash
# 1) Clone + install
npm i

# 2) Configure environment
cp .env.example .env
# Edit .env with your keys (OPENAI_API_KEY, SMTP creds, etc.)

# 3) Prepare knowledge base file
# A sample file is already at ./kb/source.txt (your RAG corpus)
# You can replace it with your own content

# 4) Run dev server
npm run dev
# Server runs at http://localhost:8787
# Open browser and start chatting!
```

- Runtime: Node 20, TypeScript
- Server: Express
- LLM: OpenAI Chat Completions (swappable)
- Embeddings: OpenAI Embeddings (swappable); in-memory vector store
- RAG: cosine similarity search on embedded chunks
- Storage: JSONL (`./data/users.jsonl`) with ISO timestamps + `sessionId`
- Delivery: Nodemailer SMTP (email) and a generic POST webhook
- UI: minimal Vite + vanilla TS chat client (single page)
```
.
├─ src/
│  ├─ server.ts        # Express app, routes, SSE for streaming
│  ├─ prompt.ts        # System + style guardrails (edgy market-pro)
│  ├─ rag.ts           # Index/load kb file, embed, similarity search
│  ├─ storage.ts       # JSONL append + session helpers
│  ├─ deliver.ts       # Email + webhook delivery
│  ├─ types.ts         # Shared interfaces
│  └─ util.ts          # Small helpers
├─ web/
│  ├─ index.html       # Minimal chat UI
│  ├─ main.ts          # Fetch/SSE client, session mgmt
│  └─ vite.config.ts   # Vite dev server config
├─ kb/
│  └─ source.txt       # Single RAG corpus file (replaceable)
├─ data/
│  └─ users.jsonl      # Structured storage (created at runtime)
├─ .env.example        # Environment variables template
├─ package.json
├─ tsconfig.json
└─ README.md
```
Create a `.env` file based on `.env.example`:

```bash
# OpenAI (required)
OPENAI_API_KEY="sk-..."
OPENAI_MODEL="gpt-4o-mini"
EMBEDDINGS_MODEL="text-embedding-3-small"

# SMTP (optional - for email delivery)
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_SECURE=false
SMTP_USER="bot@example.com"
SMTP_PASS="your-app-password"
SMTP_FROM="Insomniac HF Bot <bot@example.com>"
SMTP_TO="recipient@example.com"

# Webhook (optional - for webhook delivery)
WEBHOOK_URL="https://script.google.com/macros/s/.../exec"

# Server
PORT=8787
```

- OpenAI API key: get one from platform.openai.com
- Gmail App Password: Google Account Settings → Security → 2-Step Verification → App Passwords
- Webhook URL: use Google Apps Script, Make.com, Zapier, or any service that accepts a JSON POST
The assistant has a distinct personality — a sharp, no-nonsense hedge fund analyst. It naturally collects three pieces of user data over the conversation:
- Name — "By the way, what should I call you?"
- Email — "Mind dropping your email? I can send you some research notes."
- Income Range — "Ballpark annual income? Helps me tailor recommendations."
The bot never feels like a form — it weaves questions naturally into market discussions.
Guardrails: Every first response includes: "🚨 DYOR (Do Your Own Research) — Nothing here is financial advice."
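The persona and the guardrail both live in the system prompt. A minimal sketch of what `src/prompt.ts` could export (the wording below is illustrative, not the repo's actual prompt; only the `SYSTEM_PROMPT` name comes from the customization notes):

```typescript
// Illustrative sketch of src/prompt.ts -- the shipped prompt text will differ.
export const SYSTEM_PROMPT = `
You are a sharp, no-nonsense hedge fund analyst. Edgy, direct, zero fluff.

Rules:
- Your FIRST reply must open with: "🚨 DYOR (Do Your Own Research) — Nothing here is financial advice."
- Over the conversation, naturally collect the user's name, email, and income range.
- Weave those questions into the market discussion; never read like a form.
- Ground answers in the "desk notes" context when it is provided; synthesize, don't recite.
`.trim();
```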
- Knowledge base: a single file at `./kb/source.txt`
- On startup, the file is chunked (500 chars, 50-char overlap) and embedded with OpenAI's `text-embedding-3-small`
- Every user query triggers a similarity search; the top 3 relevant chunks are injected into the LLM context as "desk notes"
- The assistant synthesizes these notes with its personality; it doesn't just regurgitate them
Try it: Ask "What's your take on energy stocks?" and watch it reference the embedded knowledge base.
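The retrieval math is small enough to sketch end to end. Assuming embeddings arrive as plain number arrays, the 500/50 chunker and cosine ranking could look like this (function names are illustrative, not necessarily those in `src/rag.ts`):

```typescript
// Illustrative sketch of the chunk + rank steps (names are made up).
function chunkText(text: string, size = 500, overlap = 50): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size - overlap) {
    chunks.push(text.slice(i, i + size));
    if (i + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Rank embedded chunks against a query embedding; keep the top k "desk notes".
function topChunks(
  query: number[],
  index: { text: string; embedding: number[] }[],
  k = 3,
): string[] {
  return [...index]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, k)
    .map((c) => c.text);
}
```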
- Every collected field (name, email, income) is tracked in the session
- Once all three are collected, a structured record is saved to `./data/users.jsonl`
- Each line is a JSON object with:
  - `sessionId` (UUID)
  - `timestamp` (ISO 8601)
  - `name`, `email`, `income`
  - `conversationHistory` (first 10 messages for context)

Example record:

```json
{
  "sessionId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "timestamp": "2024-11-06T14:32:18.123Z",
  "name": "Alex",
  "email": "alex@example.com",
  "income": "$100k-$250k",
  "conversationHistory": [...]
}
```

Hit the "Deliver Data" button (enabled once all 3 fields are collected) to send the data via:
- Email (SMTP via Nodemailer)
  - Sends a formatted HTML email with the user data
  - Includes the full JSON payload
- Webhook (generic POST)
  - Sends the JSON payload to any webhook URL
  - Perfect for Google Sheets, Make.com, Zapier, n8n, etc.

Both delivery methods are optional; configure whichever you need in `.env`.
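The webhook path needs nothing beyond Node 20's global `fetch`. A sketch of what the webhook half of `src/deliver.ts` might look like (the payload field names are assumed from the storage record format):

```typescript
// Sketch of a generic webhook delivery; field names are assumptions.
interface UserRecord {
  sessionId: string;
  timestamp: string;
  name: string;
  email: string;
  income: string;
}

const toPayload = (record: UserRecord): string => JSON.stringify(record);

async function deliverWebhook(url: string, record: UserRecord) {
  // Node 20 ships fetch globally, so no HTTP client dependency is needed.
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: toPayload(record),
  });
  return { success: res.ok, status: res.status };
}
```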
Creates a new chat session.

Response:

```json
{
  "sessionId": "uuid-here"
}
```

Send a message and receive a streaming response via Server-Sent Events (SSE).
Request:

```json
{
  "sessionId": "uuid-here",
  "message": "What's your take on tech stocks?"
}
```

Response: an SSE stream of chunks:

```
data: {"content":"Tech"}
data: {"content":" stocks"}
data: {"content":" are"}
...
data: [DONE]
```
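Because this is a POST endpoint, the browser's `EventSource` can't be used directly; the client typically reads the `fetch` response body and parses the `data:` lines itself. A sketch of just the parsing step (illustrative, not the exact code in `web/main.ts`):

```typescript
// Turn raw SSE text into content tokens; stops at the [DONE] sentinel.
function parseSseChunk(raw: string): string[] {
  const tokens: string[] = [];
  for (const line of raw.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const data = line.slice("data: ".length);
    if (data === "[DONE]") break;
    tokens.push(JSON.parse(data).content);
  }
  return tokens;
}
```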
Get session data and collection status.

Response:

```json
{
  "sessionId": "uuid-here",
  "userData": {
    "name": "Alex",
    "email": "alex@example.com",
    "income": "$100k-$250k"
  },
  "messageCount": 8
}
```

Trigger email/webhook delivery of the collected data.
Request:

```json
{
  "sessionId": "uuid-here"
}
```

Response:

```json
{
  "success": true,
  "result": {
    "email": { "success": true, "messageId": "..." },
    "webhook": { "success": true, "status": 200 }
  }
}
```

Suggested demo walkthrough:

- Show the knowledge base
  - Open `kb/source.txt` and explain that it is the single RAG corpus
- Start a conversation
  - Navigate to `http://localhost:8787`
  - Ask: "What should I invest in if I make around $120k a year?"
- Watch the personality
  - Notice the edgy, direct tone
  - Observe the DYOR disclaimer in the first message
- See RAG in action
  - Ask: "Tell me about your energy stock thesis"
  - The assistant pulls from the desk notes and synthesizes
- Natural data collection
  - Over 2-3 messages, it will ask for name, email, and income
  - It never feels like a form; always conversational
- Check data collection
  - Watch the status bar: "Collected: name, email, income"
  - When all 3 are collected: "✅ All data collected! Ready to deliver."
- Deliver the data
  - Click the "Deliver Data" button
  - Show the email received / webhook log / Google Sheet row
- Inspect storage
  - Open `data/users.jsonl` and show the structured record
Edit `src/server.ts`:

```ts
const OPENAI_MODEL = "gpt-4o-mini"; // Change to gpt-4, claude, etc.
```

For non-OpenAI models, replace the `openai.chat.completions.create()` call with your provider's SDK.

Edit `src/rag.ts`:

```ts
const EMBEDDINGS_MODEL = "text-embedding-3-small";
```

Or use a local embedding model (Sentence Transformers, etc.).

Edit `src/prompt.ts`; the `SYSTEM_PROMPT` constant defines the entire persona.

Edit `src/types.ts` to add fields to `UserData`, then update the extraction logic in `src/server.ts` (the `extractUserData` function).

Replace JSONL with SQLite, Postgres, or MongoDB by modifying `src/storage.ts`.
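One way to keep that swap contained to a single file is to hide the backend behind a tiny interface. A sketch (the `UserStore` and `toJsonlLine` names are made up; `src/storage.ts` may be organized differently):

```typescript
// Hypothetical seam for src/storage.ts: implement UserStore once per backend.
import { appendFileSync, mkdirSync } from "node:fs";
import { dirname } from "node:path";

interface UserStore {
  append(record: object): void;
}

// One JSON object per line: append-only safe and human-readable.
function toJsonlLine(record: object): string {
  return JSON.stringify(record) + "\n";
}

class JsonlStore implements UserStore {
  constructor(private path: string) {
    mkdirSync(dirname(path), { recursive: true });
  }
  append(record: object): void {
    appendFileSync(this.path, toJsonlLine(record), "utf8");
  }
}
```

A SQLite or Postgres backend then only re-implements `append`.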
```bash
# Install dependencies
npm install

# Run in development (auto-restart on changes)
npm run dev

# Type check (no compilation)
npm run type-check

# Build for production
npm run build

# Run production build
npm start
```

To deploy:

1. Build the project: `npm run build`
2. Set the environment variables on your hosting platform
3. Run the server: `npm start`

- Railway / Render / Fly.io: one-click deploy from GitHub
- AWS EC2 / DigitalOcean: traditional VM deployment
- Vercel / Netlify: needs adapting for serverless (replace Express with API routes)
- Docker: add a `Dockerfile` (Node 20 base, copy files, run `npm start`)
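The Docker bullet could be fleshed out roughly as follows; this is an unverified sketch that assumes `npm run build` produces whatever output `npm start` expects inside the image:

```dockerfile
# Sketch only: build layout and port are assumptions, not tested against this repo.
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
ENV PORT=8787
EXPOSE 8787
CMD ["npm", "start"]
```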
RAG not initializing:
- Check that `kb/source.txt` exists and has content
- Look for the console log `✅ RAG initialized with X embedded chunks`
- If the file is missing, you'll see `⚠️ Knowledge base file not found`

Email not sending:
- Verify the SMTP credentials in `.env`
- For Gmail, use an App Password, not your regular password
- Check firewall/port access (port 587 for SMTP)

Webhook not delivering:
- Test the webhook URL with curl:
  `curl -X POST <URL> -H "Content-Type: application/json" -d '{"test":"data"}'`
- Check the webhook service logs (Google Apps Script, Make.com, etc.)

Data not being captured:
- The extraction logic uses simple pattern matching
- Check the console logs for `📝 Captured name`, `📧 Captured email`, `💰 Captured income`
- If the patterns don't match, tweak the regexes in `src/server.ts` → `extractUserData()`
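For reference, an illustrative version of that kind of pattern matching (these are not the repo's actual regexes):

```typescript
// Illustrative extraction in the spirit of extractUserData; actual patterns differ.
interface Extracted { name?: string; email?: string; income?: string }

function extractUserData(message: string): Extracted {
  const out: Extracted = {};
  // Anything shaped like user@host.tld
  const email = message.match(/[\w.+-]+@[\w-]+\.[\w.]+/);
  if (email) out.email = email[0];
  // e.g. "$120k", "$100k-$250k"
  const income = message.match(/\$\s?\d[\d,.]*\s?[km]?(?:\s?-\s?\$?\d[\d,.]*\s?[km]?)?/i);
  if (income) out.income = income[0];
  // e.g. "I'm Alex", "call me Alex", "my name is Alex"
  const name = message.match(/\b(?:i'?m|call me|my name is)\s+([A-Z][a-z]+)/i);
  if (name) out.name = name[1];
  return out;
}
```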
- Why in-memory sessions? For demo simplicity. In production, use Redis or a database.
- Why JSONL? It shows a minimal-dependency approach: easy to parse, human-readable, append-only safe.
- Why SSE instead of WebSockets? Simpler for uni-directional streaming; fewer moving parts.
- Why an in-memory vector store? It matches the assessment requirements and is easy to test/demo. Real systems use vector DBs (Pinecone, Weaviate, etc.).
MIT — feel free to use, modify, and build upon this project.
This project demonstrates:
- ✅ LLM integration with streaming responses
- ✅ Personality-driven prompt engineering
- ✅ RAG implementation (embeddings + similarity search)
- ✅ Natural language data extraction
- ✅ Structured data storage
- ✅ Multi-channel delivery (email + webhook)
- ✅ Production-ready TypeScript patterns
- ✅ Clean separation of concerns
Built for the Insomniac Hedge Fund AI Engineer Assessment. 🚀