RagZzy — Retrieval‑Augmented Support Assistant

RagZzy is a retrieval‑augmented support assistant that combines semantic and keyword retrieval, dynamic few‑shot prompting, and a post‑generation validator with an optional single‑revision loop. It is built to demonstrate product thinking, pragmatic engineering, and measurable quality.

VISIT HERE: RagZzy

Why This Is Interesting

Context-aware RAG: hybrid retrieval combining semantic embeddings and BM25‑ish keyword search with contextual and entity boosts.
Dynamic few-shot: automatically selects the most relevant support exemplars from curated seeds to steer tone and structure.
Validator with revision: enforces response quality using a checklist and performs a single self‑revision when needed.
Offline eval CLI: run local evaluations against seeds, report structure adherence and coverage metrics, export JSONL for analysis.

This project showcases: retrieval architectures, prompt engineering, safety/consistency checks, async streaming, and pragmatic tooling for evaluation.

Features

Hybrid Retrieval with Sentence Packing

Semantic retrieval (Gemini embeddings) + MiniSearch keyword search.
Contextual and entity alignment boosts from prior conversation.
Sentence-level packing under a token budget for high signal density.
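
Sentence packing can be pictured with a minimal sketch like the one below. It assumes sentences already carry a relevance score and uses a rough 4-characters-per-token estimate; `estimateTokens` and `packSentences` are hypothetical names for illustration, not the project's functions.

```javascript
// Hypothetical helper: rough token estimate (about 4 characters per token).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Greedily keep the highest-scoring sentences that fit the budget, then
// restore document order so the packed context still reads naturally.
function packSentences(sentences, maxTokens) {
  const ranked = sentences
    .map((s, index) => ({ ...s, index }))
    .sort((a, b) => b.score - a.score);

  const picked = [];
  let used = 0;
  for (const s of ranked) {
    const cost = estimateTokens(s.text);
    if (used + cost > maxTokens) continue; // skip sentences that do not fit
    picked.push(s);
    used += cost;
  }
  return picked.sort((a, b) => a.index - b.index).map((s) => s.text);
}

const packed = packSentences(
  [
    { text: "Reset your password from the login page.", score: 0.9 },
    { text: "Our office is closed on public holidays.", score: 0.1 },
    { text: "A reset link expires after one hour.", score: 0.8 },
  ],
  25
);
console.log(packed);
```

With a budget of 25 estimated tokens, the two high-scoring sentences fit and the low-scoring one is dropped.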

Senior Support Persona

Persona block with principles, tone, and structured format guidance.
Optional enablement via SUPPORT_PERSONA or per-request override.
Consistency without over-constraining domain-specific content.

Dynamic Few‑Shot Selection

Seeds defined in scripts/seedSupportPersona.js (id, user, assistant, tags, difficulty).
Embeds user query and seeds, ranks by cosine similarity, selects top‑K.
Injects only the best examples into the prompt to stay within budget.

Post‑Generation Validator + Single Revision

Checklist-driven validation (sections: Summary, Steps, Validation, Rollback, Notes).
Strict JSON verdict parse with fallback for robustness.
If violations found, performs one revision pass to satisfy the checklist.
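
A strict parse with a fallback might look like the sketch below. The function name `parseVerdict` and the fallback policy (treating an unparseable reply as passing) are assumptions made for illustration; the project may handle malformed verdicts differently.

```javascript
// Hypothetical parser: try strict JSON first, then the first {...} span in
// the reply, and finally fall back to a permissive verdict (an assumption)
// so a malformed validator reply never blocks the response.
function parseVerdict(raw) {
  const candidates = [raw];
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

  for (const text of candidates) {
    try {
      const v = JSON.parse(text);
      if (v && typeof v.ok === "boolean") {
        return {
          ok: v.ok,
          missing: Array.isArray(v.missing) ? v.missing : [],
          notes: typeof v.notes === "string" ? v.notes : "",
        };
      }
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return { ok: true, missing: [], notes: "unparseable verdict" };
}

console.log(parseVerdict('Verdict: {"ok": false, "missing": ["Rollback"]}'));
```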

Offline Eval CLI

Run seeds through the pipeline, compute metrics, and export JSONL.
Metrics include structure adherence, response length in tokens, tag coverage, and retrieval stats.
Configurable flags for K, similarity threshold, persona enablement, and output.
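
As an illustration of the structure adherence metric, the sketch below checks a response for the five checklist sections named earlier. The matching rule (a line that starts with the section name, optionally behind markdown markers) and the name `checkStructure` are assumptions, not the evaluator's actual logic.

```javascript
const REQUIRED_SECTIONS = ["Summary", "Steps", "Validation", "Rollback", "Notes"];

// Hypothetical check: a section counts as present when some line starts with
// its name, optionally preceded by markdown heading or bold markers.
function checkStructure(responseText) {
  const missing = REQUIRED_SECTIONS.filter((name) => {
    const pattern = new RegExp(`^\\s*(#+\\s*|\\*\\*)?${name}\\b`, "im");
    return !pattern.test(responseText);
  });
  return { structureOk: missing.length === 0, missing };
}

const sample = [
  "Summary: restart the sync agent.",
  "Steps:",
  "1. Open Settings.",
  "Validation: the status badge turns green.",
  "Notes: applies to version 2 and later.",
].join("\n");

console.log(checkStructure(sample));
```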

Architecture Overview

API entry: api.chat.module.exports() handles POST with optional streaming.
Query analysis: intent classification, entity extraction, follow‑up detection, light query expansion.
Retrieval: api.chat.performContextAwareRetrieval() runs semantic + keyword + contextual search, then packs sentences under a token budget.
Prompt building: api.chat.buildDynamicPrompt() assembles persona, dynamic few‑shots, conversation history, retrieved context, and instructions.
Generation: Gemini 2.0 Flash (configurable), streaming or non‑streaming.
Validation: api.chat.runValidatorAndMaybeRevise() enforces checklist and revises once if needed.
Caching: embedding caches for knowledge chunks and few‑shot seeds.
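
The caching step can be sketched as a small in-memory memoizer. This is an assumption about the general shape only: `createEmbeddingCache` and `embedFn` are hypothetical names, and the real embedding call is not shown.

```javascript
// Hypothetical in-memory cache keyed by the input text. `embedFn` is any
// async function that returns a vector for a string.
function createEmbeddingCache(embedFn) {
  const cache = new Map();
  return function getEmbedding(text) {
    if (!cache.has(text)) {
      // Store the promise so concurrent callers share one in-flight request;
      // drop it again on failure so a later call can retry.
      const pending = embedFn(text).catch((err) => {
        cache.delete(text);
        throw err;
      });
      cache.set(text, pending);
    }
    return cache.get(text);
  };
}

// Usage with a stand-in embedder that just counts calls.
let embedCalls = 0;
const getEmbedding = createEmbeddingCache(async (text) => {
  embedCalls += 1;
  return [text.length];
});

Promise.all([getEmbedding("refund policy"), getEmbedding("refund policy")]).then(
  (vectors) => console.log(vectors, "calls:", embedCalls)
);
```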

Key files:

api.chat.generateContextAwareResponse(): non‑streaming generation + validator pass.
api.chat.buildDynamicPrompt(): persona + dynamic few‑shot + context assembly.
api.persona.getSupportValidatorChecklist(): checklist provider.
scripts/seedSupportPersona.js: curated support examples used for dynamic few‑shot.
scripts/eval.js: offline evaluator.

Setup

Requirements:

Node.js 18+ recommended.
A Google AI Studio API key (GEMINI_API_KEY).

Install:

npm install

Environment:

Export your Gemini API key:
- macOS/Linux: export GEMINI_API_KEY=YOUR_KEY
- Windows (PowerShell): setx GEMINI_API_KEY "YOUR_KEY" (restart terminal)
Optional: enable persona globally
- export SUPPORT_PERSONA=1

Running Locally

The non‑streaming POST endpoint is api.chat.module.exports(). The project targets serverless adapters, so for local use either wire the handler into your own server or simulate requests from a script.
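
Simulating a request from a script could look like the sketch below. It assumes the handler has the common serverless `(req, res)` shape with `res.status().json()`; the mock objects, `simulatePost`, the stand-in `echoHandler`, and the `message` body field are all hypothetical and only illustrate the approach.

```javascript
// Build a minimal response double that records what the handler writes.
function createMockRes() {
  const res = { statusCode: 200, headers: {}, body: null, chunks: [] };
  res.status = (code) => { res.statusCode = code; return res; };
  res.setHeader = (name, value) => { res.headers[name.toLowerCase()] = value; };
  res.json = (payload) => { res.body = payload; return res; };
  res.write = (chunk) => { res.chunks.push(String(chunk)); return true; };
  res.end = (chunk) => { if (chunk !== undefined) res.chunks.push(String(chunk)); };
  return res;
}

// Invoke a (req, res) handler with a JSON body, as an adapter would.
async function simulatePost(handler, body) {
  const req = {
    method: "POST",
    headers: { "content-type": "application/json" },
    body,
  };
  const res = createMockRes();
  await handler(req, res);
  return res;
}

// Stand-in handler for illustration. In the project this would be the
// export of api/chat.js, assuming it is a (req, res) function.
async function echoHandler(req, res) {
  if (req.method !== "POST") {
    return res.status(405).json({ error: "Method not allowed" });
  }
  return res.status(200).json({ reply: `You said: ${req.body.message}` });
}

simulatePost(echoHandler, { message: "hello" }).then((res) => {
  console.log(res.statusCode, res.body);
});
```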

Frontend example client is in public/script.js (streams SSE tokens).

Dynamic Few‑Shot Configuration

In api/chat.js:

fewshot.enable: boolean (default true)
fewshot.k: number of examples (env FS_K supported)
fewshot.maxTokens: budget for few‑shot block
fewshot.minSimilarity: filter threshold
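
For orientation, a config object with these fields might be built as follows. The field names come from the list above; the default values for `k`, `maxTokens`, and `minSimilarity` and the name `buildFewshotConfig` are illustrative assumptions, not the values in api/chat.js.

```javascript
// Hypothetical config builder. Only `enable: true` is a documented default;
// the other defaults are placeholders chosen for this example.
function buildFewshotConfig(env = process.env) {
  const k = Number.parseInt(env.FS_K ?? "", 10);
  return {
    enable: true,
    k: Number.isInteger(k) && k > 0 ? k : 3, // FS_K overrides when valid
    maxTokens: 800,
    minSimilarity: 0.2,
  };
}

console.log(buildFewshotConfig({ FS_K: "5" }));
```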

Seeds live in scripts/seedSupportPersona.js. The selector:

Embeds the user query and seed concatenation (user + assistant).
Scores by cosine similarity, keeps seeds at or above the similarity threshold, and picks top‑K (if every seed falls below the threshold, it falls back to the unfiltered top‑K).
Trims under token budget.
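
The selector steps above can be sketched as follows, assuming seed embeddings are already computed and using a rough 4-characters-per-token estimate. `cosine` and `selectFewShots` are hypothetical names, and the `embedding` field on each seed is an assumption for this example.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0;
}

// Rank seeds, apply the threshold with a top-K fallback, then trim to budget.
function selectFewShots(queryVec, seeds, { k, minSimilarity, maxTokens }) {
  const scored = seeds
    .map((seed) => ({ seed, score: cosine(queryVec, seed.embedding) }))
    .sort((a, b) => b.score - a.score);
  const above = scored.filter((s) => s.score >= minSimilarity);
  const pool = (above.length ? above : scored).slice(0, k);

  const chosen = [];
  let used = 0;
  for (const { seed } of pool) {
    const cost = Math.ceil((seed.user.length + seed.assistant.length) / 4);
    if (used + cost > maxTokens) break; // stop once the budget is exhausted
    chosen.push(seed);
    used += cost;
  }
  return chosen;
}

const demoSeeds = [
  { id: "a", user: "How do I reset my password?", assistant: "Summary: ...", embedding: [1, 0] },
  { id: "b", user: "Where is my invoice?", assistant: "Summary: ...", embedding: [0, 1] },
  { id: "c", user: "I forgot my login.", assistant: "Summary: ...", embedding: [0.7, 0.7] },
];
const demoPicked = selectFewShots([1, 0], demoSeeds, { k: 2, minSimilarity: 0.5, maxTokens: 1000 });
console.log(demoPicked.map((s) => s.id));
```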

Validator and Single‑Revision

Checklist source:

api.persona.getSupportValidatorChecklist()

Flow:

Model returns JSON verdict { ok, missing, notes }.
If ok=false, a single revision pass runs with persona + checklist to fix omissions while preserving correct content.
Validation meta is attached to the response.
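
The control flow can be summarized in a short sketch. Here `validate` and `revise` stand in for the model calls, and `runValidatorOnce` is a hypothetical name; the point is only that the revision runs at most once and the verdict travels with the response.

```javascript
// Hypothetical single-revision flow: validate, revise at most once, and
// attach validation metadata to the result.
async function runValidatorOnce(draft, { validate, revise }) {
  const verdict = await validate(draft);
  if (verdict.ok) {
    return { text: draft, validation: { revised: false, verdict } };
  }
  // One revision pass only; the revised text is not validated again.
  const revisedText = await revise(draft, verdict.missing);
  return { text: revisedText, validation: { revised: true, verdict } };
}

// Usage with stubbed model calls.
const stubs = {
  validate: async (text) =>
    text.includes("Rollback")
      ? { ok: true, missing: [], notes: "" }
      : { ok: false, missing: ["Rollback"], notes: "no rollback section" },
  revise: async (text, missing) => `${text}\n${missing[0]}: revert the change.`,
};

runValidatorOnce("Summary: update the setting.", stubs).then((result) => {
  console.log(result.validation.revised, result.text);
});
```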

Offline Evaluation

CLI: scripts/eval.js

Prerequisites:

export GEMINI_API_KEY=YOUR_KEY

Quick start:

npm run eval
node scripts/eval.js --limit 10
With persona on: SUPPORT_PERSONA=1 node scripts/eval.js --persona --limit 10

Useful flags:

--limit N: number of seeds
--k K: few‑shot top‑K (or set FS_K)
--minSim 0.2: minimum similarity threshold
--persona: force-enable senior support persona for the run
--no-validate: bypass validator pass
--out eval.jsonl: write per‑seed JSONL

NPM scripts (see package.json):

npm run eval
npm run eval:persona
npm run eval:k5
npm run eval:out

Outputs:

Per‑seed JSON (written as JSONL when --out is used) containing:
- retrieval stats: bestSimilarity, averageScore, chunk scores
- analysis: intent, entities
- response: text, lengthTokens, structureOk, missing sections, tag coverage
- validation: revised or not, verdict payload
Summary:
- total, structureOkRatePct, avgResponseTokens, avgRetrievalBestSimilarity, avgTagCoverage
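
For analysis outside the CLI, the summary fields can be recomputed from the exported JSONL with a sketch like this. The nested record shape (`retrieval.bestSimilarity`, `response.lengthTokens`, `response.structureOk`, `response.tagCoverage`) is an assumption based on the field names listed above; check a real export before relying on it.

```javascript
// Hypothetical summary over JSONL text, one JSON record per line.
function summarize(jsonlText) {
  const rows = jsonlText
    .split("\n")
    .filter((line) => line.trim())
    .map((line) => JSON.parse(line));
  const avg = (pick) =>
    rows.length ? rows.reduce((sum, r) => sum + pick(r), 0) / rows.length : 0;

  return {
    total: rows.length,
    structureOkRatePct: 100 * avg((r) => (r.response.structureOk ? 1 : 0)),
    avgResponseTokens: avg((r) => r.response.lengthTokens),
    avgRetrievalBestSimilarity: avg((r) => r.retrieval.bestSimilarity),
    avgTagCoverage: avg((r) => r.response.tagCoverage),
  };
}

const sampleJsonl = [
  { retrieval: { bestSimilarity: 0.5 }, response: { lengthTokens: 100, structureOk: true, tagCoverage: 1 } },
  { retrieval: { bestSimilarity: 0.75 }, response: { lengthTokens: 200, structureOk: false, tagCoverage: 0.5 } },
]
  .map((row) => JSON.stringify(row))
  .join("\n");

console.log(summarize(sampleJsonl));
```

To run it against a real export, read the file with `fs.readFileSync("eval.jsonl", "utf8")` and pass the contents to `summarize`.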

Deployment Notes

Set GEMINI_API_KEY in your hosting environment.
Optionally set SUPPORT_PERSONA=1 to enable persona globally; clients can override per‑request.
The code handles streaming (SSE) and non‑streaming responses. Ensure your platform passes through SSE headers.
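
For reference, the headers a platform needs to pass through for SSE are the standard ones shown in this sketch. `startSse` and `formatSseEvent` are hypothetical helpers, and the `{ token }` payload shape is an assumption; the project's actual event format may differ.

```javascript
// Send the standard Server-Sent Events headers on a Node response object.
function startSse(res) {
  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache, no-transform",
    Connection: "keep-alive",
  });
}

// Encode one event: a "data:" line followed by a blank line.
function formatSseEvent(data) {
  return `data: ${JSON.stringify(data)}\n\n`;
}

console.log(JSON.stringify(formatSseEvent({ token: "Hello" })));
```

If tokens arrive in one burst at the end instead of incrementally, a proxy or platform layer is probably buffering the response or stripping these headers.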

Design Decisions and Trade‑offs

Gemini for both embeddings and generation: fewer dependencies, consistent embeddings, single‑vendor simplicity.
Hybrid retrieval: semantic (broad recall) plus MiniSearch (exact/rare terms).
Sentence packing: improves information density under prompt budgets.
Single revision only: avoids loops; predictable latency and cost.
Dynamic few‑shot only: removed static few‑shot duplication to save tokens and reduce drift.
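
To make the hybrid retrieval trade-off concrete, here is one way the two result lists could be fused. This is a sketch under assumptions: min-max normalization, the 0.7/0.3 weights, additive boosts, and the names `normalise` and `fuse` are all illustrative, not the project's scoring formula.

```javascript
// Min-max normalise scores to [0, 1] so semantic and keyword scales compare.
function normalise(results) {
  const scores = results.map((r) => r.score);
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  return new Map(
    results.map((r) => [r.id, max > min ? (r.score - min) / (max - min) : 1])
  );
}

// Weighted sum of normalised scores plus optional per-chunk boosts
// (for example, contextual or entity alignment).
function fuse(semantic, keyword, options = {}) {
  const { semanticWeight = 0.7, keywordWeight = 0.3, boosts = {} } = options;
  const sem = normalise(semantic);
  const kw = normalise(keyword);
  const ids = new Set([...sem.keys(), ...kw.keys()]);
  return [...ids]
    .map((id) => ({
      id,
      score:
        semanticWeight * (sem.get(id) ?? 0) +
        keywordWeight * (kw.get(id) ?? 0) +
        (boosts[id] ?? 0),
    }))
    .sort((a, b) => b.score - a.score);
}

const fused = fuse(
  [{ id: "a", score: 0.9 }, { id: "b", score: 0.5 }],
  [{ id: "b", score: 12 }, { id: "c", score: 3 }]
);
console.log(fused.map((r) => r.id));
```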

Roadmap Ideas

Add golden answer assertions to eval (precision/recall with pattern banks).
Broaden safety checks with a full moderation pass and red‑team seeds.
Export more internals for reuse (e.g., prompt builder) to avoid duplication in CLI.
Persist seed/embedding caches across runs (on-disk).
Expand personas and per‑intent response styles.

License

Not licensed for use.
