A modern, streaming RAG chat for Java learners, grounded in Java 24/25 documentation with precise citations. Backend-only (Spring WebFlux + Spring AI + Qdrant). Uses OpenAI API with local embeddings (LM Studio) and Qdrant Cloud for vector storage.
- Complete Documentation Coverage: Successfully ingested 22,756+ documents from Java 24/25 and Spring ecosystem
- Local Embeddings: Integrated with LM Studio using text-embedding-qwen3-embedding-8b model (4096 dimensions)
- Qdrant Cloud Integration: Connected to cloud-hosted vector database with 22,756+ indexed vectors
- Consolidated Pipeline: Single-command fetch and process pipeline with SHA-256 hash-based deduplication
- Smart Deduplication: Prevents redundant processing and re-uploading of documents
- Comprehensive Documentation: Java 24 (10,743 files), Java 25 (10,510 files), Spring AI (218 files)
- Dual-Mode UI (Chat + Guided Learning): Tabbed shell (`/`) loads isolated Chat (`/chat.html`) and the new Guided Learning (`/guided.html`) pages
- Guided Learning (Think Java): Curated lessons powered by the “Think Java — 2nd Edition” PDF with lesson-scoped chat, citations, and enrichment
```bash
# Complete setup: fetch all docs and process to Qdrant
make full-pipeline
```

This single command will:
- Fetch all Java 24/25/EA and Spring documentation (skips existing)
- Process documents with embeddings
- Upload to Qdrant with deduplication
- Start the application on port 8085
```bash
# 1) Set env vars (example - use your real values)
# Create a .env file in repo root (see .env.example for all options):
#
# Authentication - You can use one or both:
#
# GitHub Models (free tier available):
# GITHUB_TOKEN=your_github_personal_access_token
#
# OpenAI API (separate, independent):
# OPENAI_API_KEY=sk-xxx
#
# How the app uses these:
# 1. Spring AI tries GITHUB_TOKEN first, then OPENAI_API_KEY
# 2. On auth failure, fallback tries direct OpenAI or GitHub Models
#
# Optional: Local embeddings (if using LM Studio)
# APP_LOCAL_EMBEDDING_ENABLED=true
# LOCAL_EMBEDDING_SERVER_URL=http://127.0.0.1:8088
# APP_LOCAL_EMBEDDING_MODEL=text-embedding-qwen3-embedding-8b
# APP_LOCAL_EMBEDDING_DIMENSIONS=4096 # Note: 4096 for qwen3-embedding-8b
#
# Optional: Qdrant Cloud (for vector storage)
# QDRANT_HOST=xxx.us-west-1-0.aws.cloud.qdrant.io
# QDRANT_PORT=8086
# QDRANT_SSL=true
# QDRANT_API_KEY=your-qdrant-api-key
# QDRANT_COLLECTION=java-chat
# 2) Fetch documentation (checks for existing)
make fetch-all

# 3) Process and run (auto-processes new docs)
make run
```

Health check: `GET http://localhost:8085/actuator/health`
Embeddings health: `GET http://localhost:8085/api/chat/health/embeddings`
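Both endpoints can be checked from the command line; Spring Boot Actuator's health endpoint reports `{"status":"UP"}` when healthy:

```bash
# Application health (Spring Boot Actuator)
curl -s http://localhost:8085/actuator/health

# Embeddings pipeline health
curl -s http://localhost:8085/api/chat/health/embeddings
```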
The app now provides two complementary modes with a shared learning UX and formatting pipeline:
- Chat (free-form):
  - Location: `/chat.html` (also accessible via the “Chat” tab at `/`)
  - Features: SSE streaming, server-side markdown, inline enrichments (`{{hint}}`, `{{reminder}}`, `{{background}}`, `{{warning}}`, `{{example}}`), citation pills, Prism highlighting, copy/export.
  - APIs used: `/api/chat/stream`, `/api/chat/citations`, `/api/markdown/render`, `/api/chat/enrich` (alias: `/api/enrich`).
- Guided Learning (curated):
  - Location: `/guided.html` (also accessible via the “Guided Learning” tab at `/`)
  - Content scope: “Think Java — 2nd Edition” PDF (mapped to `/pdfs/Think Java - 2nd Edition Book.pdf`)
  - Features: lesson selector (TOC), lesson summary, book-scoped citations, enrichment cards, and an embedded chat scoped to the selected lesson.
  - APIs used: `/api/guided/toc`, `/api/guided/lesson`, `/api/guided/citations`, `/api/guided/enrich`, `/api/guided/stream`.
Frontend structure:
- `static/index.html`: tab shell only (a11y tabs + iframe loader for pages).
- `static/chat.html`: isolated Chat UI (migrated from the original `index.html`).
- `static/guided.html`: Guided Learning UI.
Guided Learning backend:
- `GET /api/guided/toc` → curated lessons (from `src/main/resources/guided/toc.json`).
- `GET /api/guided/lesson?slug=...` → lesson metadata.
- `GET /api/guided/citations?slug=...` → citations restricted to Think Java.
- `GET /api/guided/enrich?slug=...` → hints/background/reminders based on Think Java snippets.
- `POST /api/guided/stream` (SSE) → lesson-scoped chat. Body: `{ "sessionId":"guided:<slug>", "slug":"<slug>", "latest":"question" }`.
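For example, a lesson-scoped question can be streamed from the command line (the slug value is illustrative; use one returned by `/api/guided/toc`):

```bash
# Stream a lesson-scoped answer over SSE (-N disables curl output buffering)
curl -N -X POST "http://localhost:8085/api/guided/stream" \
  -H "Content-Type: application/json" \
  -d '{ "sessionId": "guided:variables", "slug": "variables", "latest": "What is a variable?" }'
```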
Rendering quality is consistent across both modes: server-side markdown via `MarkdownService` preserves enrichment markers, which the client rehydrates into styled blocks; spacing for paragraphs, lists, and code follows the same rules; Prism handles code highlighting.
Common workflows are scripted via Makefile:
```bash
# Discover commands
make help
# Build / Test
make build
make test
# Run packaged jar
make run
# Live dev (Spring DevTools hot reload)
make dev
# Local Qdrant via Docker Compose (optional)
make compose-up # start
make compose-logs # tail logs
make compose-ps # list services
make compose-down # stop
# Convenience API helpers
make health
make ingest # ingest first 1000 docs
make citations    # sample citations query
```

All config is env-driven. See `src/main/resources/application.properties` for defaults. Key vars:
- `GITHUB_TOKEN`: GitHub personal access token for GitHub Models
- `OPENAI_API_KEY`: OpenAI API key (separate, independent service)
- `OPENAI_MODEL`: Model name, default `gpt-5.2` (used by all endpoints)
- `OPENAI_TEMPERATURE`: default `0.7`
- `OPENAI_BASE_URL`: Spring AI base URL (default: `https://models.github.ai/inference`)
  - CRITICAL: Must be `https://models.github.ai/inference` for GitHub Models
  - DO NOT USE: `models.inference.ai.azure.com` (this is a hallucinated URL)
  - DO NOT USE: Any `azure.com` domain (we don't have Azure instances)
How APIs are used:
- Spring AI (primary): Uses `OPENAI_BASE_URL` with `GITHUB_TOKEN` (preferred) or `OPENAI_API_KEY`
- Direct fallbacks (on 401 auth errors):
  - If `OPENAI_API_KEY` exists: Direct OpenAI API at `https://api.openai.com`
  - If only `GITHUB_TOKEN` exists: GitHub Models at `https://models.github.ai/inference` (CORRECT endpoint)
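To sanity-check a key outside the app, the direct OpenAI API can be queried via its standard `/v1/models` route (output truncated for brevity):

```bash
# Quick credential check against the direct OpenAI API
curl -s https://api.openai.com/v1/models \
  -H "Authorization: Bearer $OPENAI_API_KEY" | head -c 300
```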
- `APP_LOCAL_EMBEDDING_ENABLED`: `true` to use local embeddings server
- `LOCAL_EMBEDDING_SERVER_URL`: URL of your local embeddings server (default: `http://127.0.0.1:8088`)
- `APP_LOCAL_EMBEDDING_DIMENSIONS`: `4096` (actual dimensions for the qwen3-embedding-8b model)
- Recommended model: `text-embedding-qwen3-embedding-8b` (4096 dimensions)
- Note: LM Studio may show tokenizer warnings, which are harmless
- `QDRANT_HOST`: Cloud host (e.g., `xxx.us-west-1-0.aws.cloud.qdrant.io`) or `localhost` for Docker
- `QDRANT_PORT`: `8086` for gRPC (mapped from Docker's 6334)
- `QDRANT_API_KEY`: Your Qdrant Cloud API key (empty for local)
- `QDRANT_SSL`: `true` for cloud, `false` for local
- `QDRANT_COLLECTION`: default `java-chat`
- `DOCS_ROOT_URL`: default `https://docs.oracle.com/en/java/javase/24/`
- `DOCS_SNAPSHOT_DIR`: default `data/snapshots` (raw HTML)
- `DOCS_PARSED_DIR`: default `data/parsed` (chunk text)
- `DOCS_INDEX_DIR`: default `data/index` (ingest hash markers)
- Containers: point `DOCS_*` to a writable path (e.g., `/app/data/...`) and ensure the directories exist, as in the sketch below.
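In a container, that can look like the following sketch (paths are illustrative):

```bash
# Illustrative container setup: create writable doc dirs and point DOCS_* at them
mkdir -p /app/data/snapshots /app/data/parsed /app/data/index
export DOCS_SNAPSHOT_DIR=/app/data/snapshots
export DOCS_PARSED_DIR=/app/data/parsed
export DOCS_INDEX_DIR=/app/data/index
```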
We provide a unified pipeline that handles all documentation fetching and processing with intelligent deduplication:
```bash
# Complete pipeline: fetch all docs and process to Qdrant
make full-pipeline

# Or run steps separately:
make fetch-all     # Fetch all documentation (checks for existing)
make process-all   # Process and upload to Qdrant (deduplicates)
```

The pipeline automatically fetches and processes:
- Java 24 API: Complete Javadocs from docs.oracle.com (10,743 files ✅)
- Java 25 API: Complete Javadocs from docs.oracle.com (10,510 files ✅)
- Spring Boot: Full reference and API documentation (10,379 files)
- Spring Framework: Core Spring docs (13,342 files)
- Spring AI: AI/ML integration docs (218 files ✅)
Current Status: Successfully indexed 22,756+ documents in Qdrant Cloud with automatic SHA-256 deduplication
```bash
# Fetch ALL documentation with deduplication checking
./scripts/fetch_all_docs.sh
# Features:
# - Checks for existing documentation before fetching
# - Downloads only missing documentation
# - Creates metadata file with statistics
# - Logs all operations for debugging

# Individual fetchers if you need specific docs
./scripts/fetch_java_complete.sh     # Java 24/25 Javadocs
./scripts/fetch_spring_complete.sh   # Spring ecosystem only

# Process all documentation with deduplication
./scripts/process_all_to_qdrant.sh
# Features:
# - SHA-256 hash-based deduplication
# - Tracks processed files in hash database
# - Prevents redundant embedding generation
# - Prevents duplicate uploads to Qdrant
# - Shows real-time progress
# - Generates processing statistics
```

Resumable Processing: The script is designed to handle interruptions gracefully:
- If the connection is lost or the process is killed, simply re-run the script
- It will automatically skip all previously indexed documents (via hash markers in `data/index/`)
- Progress is preserved in Qdrant; vectors are never lost
- Each successful chunk creates a persistent marker file
How Resume Works:
- Hash Markers: Each successfully indexed chunk creates a file in `data/index/` named with its SHA-256 hash
- On Restart: The system checks for existing hash files before processing any chunk
- Skip Logic: If `data/index/{hash}` exists, the chunk is skipped (already in Qdrant)
- Atomic Operations: Markers are only created AFTER successful Qdrant insertion
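A minimal sketch of that marker pattern (illustrative shell, not the actual script; `upload_chunk` is a hypothetical helper):

```bash
# Illustrative resume logic: skip chunks whose marker exists; write the
# marker only AFTER a successful Qdrant insertion.
hash="<sha-256 of the chunk>"   # derivation shown in the Deduplication section
if [ -f "data/index/${hash}" ]; then
  echo "skip: chunk already indexed"
else
  upload_chunk "$chunk" && touch "data/index/${hash}"   # hypothetical helper
fi
```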
Monitoring Progress:
```bash
# Check current vector count in Qdrant
source .env && curl -s -H "api-key: $QDRANT_API_KEY" \
  "https://$QDRANT_HOST/collections/$QDRANT_COLLECTION" | \
  grep -o '"points_count":[0-9]*' | cut -d: -f2

# Count processed chunks (hash markers)
ls data/index/ | wc -l
```

To monitor real-time progress, create `monitor_progress.sh`:

```bash
#!/bin/bash
source .env
while true; do
  count=$(curl -s -H "api-key: $QDRANT_API_KEY" \
    "https://$QDRANT_HOST/collections/$QDRANT_COLLECTION" | \
    grep -o '"points_count":[0-9]*' | cut -d: -f2)
  echo -ne "\r[$(date +%H:%M:%S)] Vectors in Qdrant: $count"
  sleep 5
done
```

Performance Notes:
- Local embeddings (LM Studio) process ~35-40 vectors/minute
- Full indexing of 60,000 documents takes ~24-30 hours
- The script has NO timeout - it will run until completion
- Safe to run multiple times - deduplication prevents any redundant work
```bash
# The application automatically processes docs on startup
make run   # Starts app and processes any new documents

# Or trigger manual ingestion via API
curl -X POST "http://localhost:8085/api/ingest/local?path=data/docs&maxFiles=10000"
```

- Content Hashing: Each document chunk gets a SHA-256 hash based on `url + chunkIndex + content`
- Hash Database: Processed files are tracked in `data/.processed_hashes.db`
- Vector Store Check: Before uploading, checks if hash already exists in Qdrant
- Skip Redundant Work: Prevents:
  - Re-downloading existing documentation
  - Re-processing already embedded documents
  - Duplicate vectors in Qdrant
- Smart chunking: ~900 tokens with 150 token overlap for context preservation
- Metadata enrichment: URL, title, package name, chunk index for precise citations
- Idempotent operations: Safe to run multiple times without side effects
- Automatic retries: Handles network failures gracefully
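For illustration, a chunk's dedup key can be reproduced from the shell (values are placeholders; the app computes this hash in Java during ingestion):

```bash
# Illustrative: derive a chunk's SHA-256 dedup key from url + chunkIndex + content
url="https://docs.oracle.com/en/java/javase/24/docs/api/java.base/java/lang/Record.html"
chunk_index=0
content="Example chunk text..."
printf '%s%s%s' "$url" "$chunk_index" "$content" | shasum -a 256 | cut -d' ' -f1
```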
- `POST /api/chat/stream` (SSE)
  - Body: `{ "sessionId": "s1", "latest": "How do I use records?" }`
  - Streams text tokens; on completion, stores the assistant response in session memory.
- `GET /api/chat/citations?q=your+query`
  - Returns top citations (URL, title, snippet) for the query.
- `GET /api/chat/export/last?sessionId=s1`
  - Returns the last assistant message (markdown).
- `GET /api/chat/export/session?sessionId=s1`
  - Returns the full session conversation as markdown.
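For example, from the command line:

```bash
# Stream an answer over SSE (-N disables curl output buffering)
curl -N -X POST "http://localhost:8085/api/chat/stream" \
  -H "Content-Type: application/json" \
  -d '{ "sessionId": "s1", "latest": "How do I use records?" }'

# Fetch citations for a query
curl -s "http://localhost:8085/api/chat/citations?q=sealed+classes"
```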
- Chunking: ~900 tokens with 150 overlap (CL100K_BASE tokenizer via JTokkit).
- Vector search: Qdrant similarity. Next steps: enable hybrid (BM25 + vector) and MMR diversity.
- Re-ranker: planned BGE reranker (DJL) or LLM rerank for top-k. Citations pinned to top-3 by score.
Responses are grounded with citations and “background tooltips”:
- Citation metadata: `package/module`, `JDK version`, `resource/framework + version`, `URL`, `title`.
- Background: tooltips with bigger-picture context, hints, and reminders to aid understanding.
Data structures (server):
- Citation: `{ url, title, anchor, snippet }` (see `com.williamcallahan.javachat.model.Citation`).
- TODO: `Enrichment` payload with fields: `packageName`, `jdkVersion`, `resource`, `resourceVersion`, `hints[]`, `reminders[]`, `background[]`.
- Guided: `GuidedLesson` `{ slug, title, summary, keywords[] }` + TOC from `src/main/resources/guided/toc.json`.
UI (server-rendered static placeholder):
- Return JSON with `citations` and `enrichment`. The client should render:
  - Compact “source pills” with domain icon, title, and external-link affordance (open in new tab).
  - Hover tooltips for background context (multi-paragraph allowed, markdown-safe).
  - Clear, modern layout (Shadcn-inspired). Future: SPA frontend if needed.
Modes & objectives:
- Chat: fast, accurate answers with layered insights and citations.
- Guided: structured progression through core topics with the same learning affordances, plus lesson-focused chat to deepen understanding.
- OpenAI Java SDK (standardized): All streaming and non-streaming chat uses `OpenAIStreamingService`
  - ✅ Official SDK streaming, no manual SSE parsing
  - ✅ Prompt truncation for GPT-5 context window (~400K tokens, 128K max output) handled centrally
  - ✅ Clean, reliable streaming and consolidated error handling
- Removed `ResilientApiClient` and all manual SSE parsing
- Controllers (`ChatController`, `GuidedLearningController`) stream via SDK only
- `OpenAIStreamingService`: streaming + `complete()` helper
- `ChatService`: builds prompts (RAG-aware); may stream via SDK for internal flows
- `EnrichmentService`/`RerankerService`: use SDK `complete()` for JSON/ordering
- Session memory management for context preservation
- Local LM Studio: `text-embedding-qwen3-embedding-8b` (4096 dimensions)
  - Running on Apple Silicon for fast, private embeddings
  - No external API calls for document processing
  - Server running at `http://127.0.0.1:8088` (configurable)
- Fallback: OpenAI `text-embedding-3-small` if local server unavailable
- Status: ✅ Healthy and operational
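To smoke-test the local server directly (LM Studio exposes an OpenAI-compatible API; the `/v1/embeddings` path below is the standard route):

```bash
# Expect a JSON response containing a 4096-element embedding vector
curl -s http://127.0.0.1:8088/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{ "model": "text-embedding-qwen3-embedding-8b", "input": "java records" }'
```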
- Qdrant Cloud: High-performance HNSW vector search
  - Collection: `java-chat` with 22,756+ vectors
  - Dimensions: 4096 (matching local embedding model)
  - Connected via gRPC on port 8086 (mapped from container's 6334) with SSL
- Smart Retrieval:
- Top-K similarity search with configurable K (default: 12)
- MMR (Maximum Marginal Relevance) for result diversity
- TF-IDF reranking for relevance optimization
- Citation System: Top 3 sources with snippets and metadata
- Re-ingesting docs: rerun `/api/ingest?maxPages=...` after a docs update (see example below).
- Qdrant housekeeping: snapshot/backup via Qdrant Cloud; set collection to HNSW + MMR/hybrid as needed.
- Env changes: restart app to pick up new model names or hosts.
- Logs/metrics: Spring Boot Actuator endpoints enabled for health/info/metrics.
- Observability TODO: add tracing and custom metrics (query time, tokens, hit rates).
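The exact re-ingest invocation isn't spelled out above; assuming it follows the same POST convention as `/api/ingest/local`, it would look like this (the `maxPages` value is illustrative):

```bash
# Hypothetical re-ingest call, mirroring the /api/ingest/local convention above
curl -X POST "http://localhost:8085/api/ingest?maxPages=1000"
```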
- Error `Invalid host or port` or `Expected closing bracket for IPv6 address`:
  - Ensure `QDRANT_HOST` has no `https://` prefix; it must be the hostname only.
  - Ensure `QDRANT_PORT` is the gRPC port for your setup (`8086` in this project's examples; raw Qdrant default is `6334`) and `QDRANT_SSL=true`.
  - Makefile forces IPv4 (`-Djava.net.preferIPv4Stack=true`) to avoid macOS IPv6 resolver quirks.
- Dimension mismatch errors:
  - Ensure `APP_LOCAL_EMBEDDING_DIMENSIONS=4096` matches your embedding model
  - Delete and recreate the Qdrant collection if dimensions change, e.g. via the REST calls below
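A sketch of the recreate step using Qdrant's REST API (destructive; the `Cosine` distance is an assumption, so match whatever the collection was created with):

```bash
# Drop and recreate the collection with 4096-dim vectors (destructive!)
source .env
curl -X DELETE -H "api-key: $QDRANT_API_KEY" \
  "https://$QDRANT_HOST/collections/$QDRANT_COLLECTION"
curl -X PUT -H "api-key: $QDRANT_API_KEY" -H "Content-Type: application/json" \
  "https://$QDRANT_HOST/collections/$QDRANT_COLLECTION" \
  -d '{ "vectors": { "size": 4096, "distance": "Cosine" } }'   # distance assumed
```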
- LM Studio tokenizer warnings:
  - "[WARNING] At least one last token in strings embedded is not SEP" is harmless
- GitHub Models API: ~15 requests/minute free tier. Set both `GITHUB_TOKEN` and `OPENAI_API_KEY` for automatic fallback.
- Built-in retry: 5 attempts with exponential backoff (2s → 30s max). Configurable via `AI_RETRY_*` env vars.
- Fallback behavior: On 429 errors, automatically switches to OpenAI API if `OPENAI_API_KEY` is available.
- Hybrid retrieval (BM25 + vector), MMR, and re-ranker integration.
- Enrichment payload + endpoint for tooltips/hints/reminders with package/JDK metadata.
- Content hashing + upsert-by-hash for dedup and change detection.
- Minimal SPA with modern source pills, tooltips, and copy actions.
- Persist user chats + embeddings (future, configurable).
- Slash-commands (/search, /explain, /example) with semantic routing.
- Per-session rate limiting.
- DigitalOcean Spaces S3 offload for snapshots & parsed text.
- Docker Compose app service + optional local embedding model.
The Java Chat application is fully optimized for mobile devices with comprehensive responsive design and mobile-specific safety measures.
- Full-width chat containers on mobile with comfortable margins
- 16px minimum font size on all inputs to prevent iOS Safari zoom
- Enhanced touch targets (44px minimum) for all interactive elements
- Touch-optimized scrolling with momentum scrolling support
- Safe area insets for devices with notches (iPhone X+)
- Zoom prevention on double-tap for chat areas
- Horizontal scroll prevention with proper text wrapping
- Improved focus visibility for keyboard navigation
- Reduced motion support for accessibility preferences
- Mobile: ≤768px - Full mobile optimization
- Tablet: 769px-1024px - Intermediate responsive layout
- Desktop: >1024px - Full desktop experience
- Viewport Configuration: Prevents unwanted zooming and ensures proper scaling
- Text Size Adjustment: Prevents browser text inflation on mobile
- Touch Action Optimization: Improves touch responsiveness and prevents conflicts
- Performance Optimizations: CSS containment and will-change for smooth animations
- Accessibility: Respects `prefers-reduced-motion` for users with motion sensitivity
- ✅ iOS Safari (iPhone/iPad)
- ✅ Chrome Mobile (Android)
- ✅ Samsung Internet
- ✅ Firefox Mobile
- ✅ Edge Mobile
- Font sizes < 16px on inputs - Causes iOS Safari to zoom
- Touch targets < 44px - Poor accessibility and usability
- Fixed positioning without safe-area-insets - Content hidden by notches
- Horizontal overflow - Breaks mobile UX
- user-scalable=yes without maximum-scale - Allows accidental zoom
- Missing touch-action: manipulation - Slower tap response (300ms delay)
- Viewport units (vh/vw) without fallbacks - Inconsistent on mobile browsers
- Hover-only interactions - Inaccessible on touch devices
- Small click areas - Difficult to tap accurately
- Ignoring prefers-reduced-motion - Accessibility violation
- Spring Boot 3.5.5 (WebFlux, Actuator)
- Spring AI 1.0.1 (OpenAI client, VectorStore Qdrant)
- Qdrant (HNSW vector DB); `docker-compose.yml` includes a local dev service
- Mobile-First CSS: Responsive design with mobile-specific optimizations
Docker Compose (Qdrant only, optional fallback when you outgrow the free Qdrant Cloud plan or for offline dev):
```bash
docker compose up -d
# Then set QDRANT_HOST=localhost QDRANT_PORT=8086
```