
Conversation

@WilliamAGH
Owner

This pull request improves both the frontend and backend of the guided learning feature: chat sessions are now isolated per lesson, LLM responses receive richer lesson-specific guidance, and the codebase is further modularized. The most important changes are grouped below:

Frontend: Guided Chat Session Isolation

  • Each lesson now has its own chat session ID in LearnView.svelte, ensuring conversations do not bleed across lessons and improving backend context separation. (frontend/src/lib/components/LearnView.svelte)

Backend: Guidance and Prompt Improvements

  • The guided learning system prompt is now dynamically constructed to include the current lesson's context (title, summary, keywords) and explicit topic-handling rules, keeping LLM responses focused and pedagogically sound. (GuidedLearningService.java)
  • Guidance construction is modularized, combining lesson context with global guided learning instructions from SystemPromptConfig. (GuidedLearningService.java)

Backend: Search Quality Abstraction

  • Introduced the SearchQualityLevel enum to centralize and standardize how the quality of search results is described to the LLM, replacing scattered boolean checks with a self-describing, extensible approach. (SearchQualityLevel.java, ChatService.java)

Backend: PDF Citation Enhancement Refactor

  • PDF citation enhancement logic (for adding page anchors to citations) is now delegated to a new PdfCitationEnhancer support class, simplifying GuidedLearningService and improving maintainability. (GuidedLearningService.java)

These changes collectively make guided learning chat more robust, context-aware, and maintainable.

…g behavior

ChatService contained 30+ lines of sequential boolean checks to categorize search
result quality and generate LLM context messages. This procedural logic violated
Tell-Don't-Ask by interrogating document properties instead of letting a domain
concept describe itself.

The new enum encapsulates both categorization logic (determine) and message
formatting (formatMessage), eliminating the boolean chain and making quality
levels explicit domain vocabulary. The static describeQuality convenience method
maintains the existing API contract while delegating to the new polymorphic design.

Also removes deprecated streamResponse(String, double) method from OpenAIStreamingService
as all callers now use the StructuredPrompt-based overload.

- Add SearchQualityLevel enum with NONE, KEYWORD_SEARCH, HIGH_QUALITY, MIXED_QUALITY
- Each level owns its message template and formatting logic
- Static determine() replaces procedural if-else chain in ChatService
- describeQuality() provides drop-in replacement for existing callers
- Remove deprecated OpenAIStreamingService.streamResponse(String, double)
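
A minimal sketch of the resulting enum shape, using plain strings rather than the framework document type; the message templates and the keyword-detection heuristic here are illustrative assumptions, not the exact code:

import java.util.List;
import java.util.Locale;

public enum SearchQualityLevel {
    NONE("No relevant documentation was retrieved for this query."),
    KEYWORD_SEARCH("Found %d results via keyword search; relevance may vary."),
    HIGH_QUALITY("Found %d highly relevant documentation excerpts."),
    MIXED_QUALITY("Found %d results, %d of which have substantial content.");

    private static final int SUBSTANTIAL_CONTENT_LENGTH = 100; // assumed threshold, per the tests below

    private final String template;

    SearchQualityLevel(String template) {
        this.template = template;
    }

    // Each level owns its message formatting; callers no longer branch on booleans.
    public String formatMessage(int totalCount, int highQualityCount) {
        return switch (this) {
            case NONE -> template;
            case MIXED_QUALITY -> String.format(template, totalCount, highQualityCount);
            default -> String.format(template, totalCount);
        };
    }

    // Replaces the sequential boolean checks that previously lived in ChatService.
    public static SearchQualityLevel determine(List<String> texts, List<String> sourceUrls) {
        if (texts == null || texts.isEmpty()) {
            return NONE;
        }
        boolean keywordSearch = sourceUrls != null && sourceUrls.stream()
                .anyMatch(url -> url != null && url.toLowerCase(Locale.ROOT).contains("keyword"));
        if (keywordSearch) {
            return KEYWORD_SEARCH;
        }
        long substantial = texts.stream()
                .filter(text -> text != null && text.length() > SUBSTANTIAL_CONTENT_LENGTH)
                .count();
        return substantial == texts.size() ? HIGH_QUALITY : MIXED_QUALITY;
    }
}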
…-aware guidance

GuidedLearningService had grown to include 80+ lines of PDF pagination logic
(page counting, chunk enumeration, anchor calculation) unrelated to lesson
orchestration, and it also lacked awareness of the current lesson context when
generating LLM guidance.

This commit addresses both issues:

1. Extraction: The new PdfCitationEnhancer component owns the entire citation
   enhancement workflow (loading PDF for page count, counting chunks, parsing
   metadata, estimating pages). This removes LocalStoreService from
   GuidedLearningService and improves testability.

2. Lesson Focus: New buildLessonGuidance() and buildLessonContextDescription()
   methods construct LLM prompts that include the lesson title, summary, and
   keywords. The guidance template now includes topic handling rules to redirect
   greetings and off-topic questions back to the current lesson.

- Extract PdfCitationEnhancer to support/ with proper Javadocs
- Add THINK_JAVA_GUIDANCE_TEMPLATE with %s placeholder for lesson context
- Inject SystemPromptConfig for guided learning mode instructions
- Update streamLessonChat and buildLessonPrompt to use lesson-aware guidance
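
A rough sketch of the lesson-context construction; the Lesson accessor names, label wording, and the SystemPromptConfig accessor are assumptions for illustration:

private static final String THINK_JAVA_GUIDANCE_TEMPLATE = """
        Stay focused on the learner's current lesson. If the learner greets you or
        drifts off topic, acknowledge briefly and steer back to the lesson context below.

        %s
        """;

private String buildLessonContextDescription(Lesson lesson) {
    if (lesson == null) {
        return "No specific lesson selected. Provide general Java learning assistance.";
    }
    StringBuilder context = new StringBuilder("Current Lesson: ").append(lesson.getTitle());
    if (lesson.getSummary() != null && !lesson.getSummary().isBlank()) {
        context.append("\n\nLesson Summary: ").append(lesson.getSummary());
    }
    if (lesson.getKeywords() != null && !lesson.getKeywords().isEmpty()) {
        context.append("\n\nKey concepts to cover: ").append(String.join(", ", lesson.getKeywords()));
    }
    return context.toString();
}

private String buildLessonGuidance(Lesson lesson) {
    String lessonGuidance = String.format(THINK_JAVA_GUIDANCE_TEMPLATE, buildLessonContextDescription(lesson));
    // Combine with the global guided-learning instructions injected via SystemPromptConfig
    // (accessor name is hypothetical).
    return systemPromptConfig.getGuidedLearningInstructions() + "\n\n" + lessonGuidance;
}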
…ion bleeding

A single session ID was shared across all lessons in the frontend, causing
conversation history from one lesson to bleed into another. For example, asking
about loops after studying variables would reference unrelated variable
discussion from the shared history.

The frontend now maintains a Map<lessonSlug, sessionId> so each lesson gets an
isolated backend conversation. The GuidedStreamRequest accessor methods now
return Optional<String> instead of defaulting to empty strings, forcing callers
to handle missing values explicitly.

Frontend:
- Replace single sessionId with sessionIdsByLesson Map
- Add getSessionIdForLesson() to create/retrieve per-lesson sessions
- Use lesson-scoped session ID in streamGuidedChat calls

Backend:
- Change GuidedStreamRequest.userQuery() to return Optional<String>
- Change GuidedStreamRequest.lessonSlug() to return Optional<String>
- Filter out blank values in Optional accessors
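
A sketch of the Optional-returning accessor behavior on the backend side; the real request type may carry additional fields, and the shape shown here is illustrative:

import java.util.Optional;

public class GuidedStreamRequest {
    private final String userQuery;
    private final String lessonSlug;

    public GuidedStreamRequest(String userQuery, String lessonSlug) {
        this.userQuery = userQuery;
        this.lessonSlug = lessonSlug;
    }

    // Present only when non-null and non-blank; callers must handle the missing case explicitly.
    public Optional<String> userQuery() {
        return nonBlank(userQuery);
    }

    public Optional<String> lessonSlug() {
        return nonBlank(lessonSlug);
    }

    private static Optional<String> nonBlank(String value) {
        return Optional.ofNullable(value).filter(v -> !v.isBlank());
    }
}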
…tional API

Adds test coverage for the SearchQualityLevel enum extracted in commit 6095599,
verifying all quality level determinations and message formatting. Also updates
GuidedLearningController to use orElseThrow() on the new Optional<String> return
types from GuidedStreamRequest, failing fast with clear error messages when
required fields are missing.

Tests cover:
- NONE returned for null/empty document lists
- KEYWORD_SEARCH detected from URL metadata patterns
- HIGH_QUALITY when all documents have substantial content (>100 chars)
- MIXED_QUALITY when some documents have short content
- formatMessage() produces correct strings for each level
- describeQuality() convenience method integration

Controller:
- Use orElseThrow() on userQuery() with "User query is required" message
- Use orElseThrow() on lessonSlug() with "Lesson slug is required" message
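
In the controller this looks roughly like the following (surrounding mapping and streaming code omitted):

String userQuery = request.userQuery()
        .orElseThrow(() -> new IllegalArgumentException("User query is required"));
String lessonSlug = request.lessonSlug()
        .orElseThrow(() -> new IllegalArgumentException("Lesson slug is required"));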
@WilliamAGH WilliamAGH self-assigned this Jan 25, 2026
Copilot AI review requested due to automatic review settings January 25, 2026 20:57
@coderabbitai

coderabbitai bot commented Jan 25, 2026

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

📝 Walkthrough

Summary by CodeRabbit

  • New Features
    • Per-lesson guided chat sessions with stable per-message streaming and streamed, per-message citations (improved citation anchors).
  • UI / Refactor
    • Redesigned desktop and mobile chat panels, message bubbles, and citation panels with improved responsiveness and scroll behavior.
  • Documentation
    • Added comprehensive docs: architecture, API, configuration, getting-started, ingestion, and developer guide.
  • Chores
    • Toolchain bumped to Java 25; CI workflow and code-formatting tooling added.
  • Tests
    • New frontend and backend tests covering streaming, citations, and guided lesson flows.


Walkthrough

Per-lesson guided chat and citation streaming implemented: frontend now uses lesson-scoped sessions and per-message IDs with streaming/citation attachment; backend adds guided prompt outcomes, PDF page-anchor enrichment, retrieval-quality classification, SSE abort/validation handling, tests, docs, and toolchain/tooling updates.

Changes

Cohort / File(s) Summary
Frontend: Learn UI & Chat Panels
frontend/src/lib/components/LearnView.svelte, frontend/src/lib/components/GuidedLessonChatPanel.svelte, frontend/src/lib/components/MobileChatDrawer.svelte, frontend/src/lib/components/ChatView.svelte, frontend/src/lib/components/MessageBubble.svelte
Move to lesson-scoped sessions, per-message messageId, replace inline desktop chat with GuidedLessonChatPanel, wire mobile drawer and message/citation rendering, and adjust streaming guards.
Frontend: Streaming, SSE & IDs
frontend/src/lib/services/sse.ts, frontend/src/lib/services/chat.ts, frontend/src/lib/utils/chatMessageId.ts, frontend/src/lib/composables/createStreamingState.svelte.ts, frontend/src/lib/services/stream-types.ts
Add AbortSignal-aware SSE, propagate signals, remove global streamingContent, add stable createChatMessageId, add clearChatSession, and tighten streaming state interface.
Frontend: Message list & renderer
frontend/src/lib/components/StreamingMessagesList.svelte, frontend/src/lib/components/StreamingMessagesList.svelte (tests), frontend/src/lib/components/MessageBubble.test.ts, frontend/src/lib/components/ChatView.test.ts
Key messages by messageId, add per-message isStreaming flag, update renderer API to include isStreaming, and update tests for DOM stability and cursor lifecycle.
Frontend: Guided citations & components
frontend/src/lib/components/LessonCitations.svelte, frontend/src/lib/components/CitationPanel.svelte, frontend/src/lib/services/guided.ts, frontend/src/lib/components/GuidedLessonChatPanel.svelte
New citation flow: fetchGuidedLessonCitations, onCitations streaming callback, LessonCitations/CitationPanel components, and attach citations to assistant messages.
Frontend: Tests & test infra
frontend/src/lib/components/LearnView.test.ts, frontend/src/lib/components/ChatView.test.ts, frontend/src/test/setup.ts, frontend/vitest.config.ts
New/updated tests covering guided streaming, citations, cancellation, DOM stability; add test polyfills and resolver conditions.
Frontend: Utilities & API surface
frontend/src/lib/services/chat.ts, frontend/src/lib/services/guided.ts, frontend/src/lib/utils/chatMessageId.ts
Add ChatMessage.messageId, StreamChatOptions.signal, clearChatSession client API, guided client exposes citation fetch and onCitations callback.
Backend: Guided learning & controller
src/main/java/.../GuidedLearningService.java, src/main/java/.../GuidedLearningController.java, src/main/java/.../GuidedStreamRequest.java
Replace LocalStoreService with SystemPromptConfig + PdfCitationEnhancer; return GuidedChatPromptOutcome; structured prompt streaming with SSE citation events; request validation and availability checks.
Backend: PDF citation enhancer & adapters
src/main/java/.../support/PdfCitationEnhancer.java, src/main/java/.../support/DocumentContentAdapter.java
New PdfCitationEnhancer to estimate PDF page anchors from chunk metadata and DocumentContentAdapter to adapt Documents → RetrievedContent.
Backend: Retrieval quality & domain
src/main/java/.../domain/SearchQualityLevel.java, src/main/java/.../domain/RetrievedContent.java
New SearchQualityLevel enum to classify retrieval quality and produce prompt messages; RetrievedContent domain interface added.
Backend: Chat & streaming service tweaks
src/main/java/.../service/ChatService.java, src/main/java/.../service/OpenAIStreamingService.java
Delegate search-quality messaging to SearchQualityLevel.describeQuality, remove deprecated streaming overload, and tighten availability/logging.
Backend: Markdown immutability & prompt helper
src/main/java/.../domain/markdown/MarkdownStructuredOutcome.java, src/main/java/.../ProcessedMarkdown.java, src/main/java/.../config/SystemPromptConfig.java
Make markdown lists defensive/immutable on construct and add SystemPromptConfig.buildFullPrompt helper.
Build, CI & tooling
build.gradle.kts, Dockerfile, .tool-versions, .github/workflows/build.yml, Makefile, spotbugs-exclude.xml
Bump Java toolchain to 25, add Spotless, update Docker base images and CI workflow, add Makefile format target and pre-commit hooks, adjust SpotBugs excludes.
Docs & guides
README.md, docs/*
Large documentation reorg: new getting-started, api, architecture, configuration, ingestion, development docs and condensed root README.
Scripts & Qdrant changes
scripts/*, docker-compose-qdrant.yml
Move docs-source config to properties, switch Qdrant checks to REST with optional API key/SSL, minor image path fix.
Widespread formatting & tests
many Java and test files (see diff)
Extensive stylistic reflows, record/constructor compacting, added defensive copies, many test updates; mostly formatting across the codebase.

Sequence Diagram(s)

sequenceDiagram
    participant User as User (Browser)
    participant FE as Frontend (LearnView / GuidedPanel)
    participant FEcli as FE Stream Client
    participant BE as Backend (GuidedLearningController)
    participant GLS as GuidedLearningService
    participant LLM as OpenAI Streaming
    participant Store as Document Store

    User->>FE: select lesson, type & send query
    FE->>FEcli: create lessonSessionId & user messageId
    FEcli->>BE: POST /api/guided/stream (sessionId, slug, messageId)
    BE->>GLS: buildStructuredGuidedPromptWithContext(history, slug, query)
    GLS->>Store: load grounding documents
    GLS->>GLS: enhance citations (PdfCitationEnhancer)
    GLS-->>BE: GuidedChatPromptOutcome (prompt + docs)
    BE->>LLM: start streaming structured prompt
    LLM-->>BE: token chunk events
    BE-->>FEcli: SSE data events (chunks)
    FEcli->>FE: append chunks into assistant message (messageId)
    GLS-->>BE: citation payload
    BE-->>FEcli: SSE citation event
    FEcli->>FE: attach citations to the same assistant message
    LLM-->>BE: [DONE]
    FE->>User: render completed assistant message with citations
sequenceDiagram
    participant Ingest as Ingestion pipeline
    participant Store as LocalStoreService
    participant Enhance as PdfCitationEnhancer
    participant GLS as GuidedLearningService
    participant BE as Backend (GuidedLearningController)

    Ingest->>Store: save PDF chunks with metadata (chunkIndex)
    Store-->>Enhance: list chunk files → totalChunks
    Enhance->>Enhance: compute page estimate from chunkIndex/totalChunks
    Enhance-->>GLS: citations with '#page' anchors
    GLS-->>BE: include page-anchored citations in GuidedChatPromptOutcome
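
The page-estimation step above can be pictured as linear interpolation from a chunk's position; the exact formula and rounding used in PdfCitationEnhancer may differ, so treat this as an assumption:

// Derive an approximate PDF page from a chunk's position in the document.
static int estimatePage(int chunkIndex, int totalChunks, int totalPages) {
    if (totalChunks <= 0 || totalPages <= 0) {
        return 1;
    }
    int estimated = (int) Math.round(((double) chunkIndex / totalChunks) * totalPages) + 1;
    return Math.max(1, Math.min(totalPages, estimated));
}

// Example: chunk 42 of 120 in a 300-page PDF lands near page 106,
// which becomes a "#page=106" anchor on the citation URL.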

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Poem

✨ A lesson opens with a lesson-scoped key,
Streams of tokens paint the answer you see.
Citations find pages, anchors hold true,
Frontend and backend dance in Java 25 hue.
Small steps, big learning — one chunk at a time.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 54.62%, which is insufficient; the required threshold is 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): The title 'Add per-lesson chat isolation, result classification, dynamic prompts' directly and clearly summarizes the three main changes introduced in the changeset.
  • Description check (✅ Passed): The description provides substantive detail about frontend session isolation, backend dynamic prompts, search quality abstraction, and PDF citation refactoring, all core changes present in the changeset.



@coderabbitai coderabbitai bot changed the title from "merge" to "Add per-lesson chat isolation, result classification, dynamic prompts" Jan 25, 2026
@coderabbitai coderabbitai bot added the enhancement and refactor labels Jan 25, 2026

Copilot AI left a comment


Pull request overview

This PR enhances the guided learning experience by isolating chat sessions per lesson, enriching LLM guidance with lesson context and standardized search quality signals, and modularizing PDF citation handling.

Changes:

  • Introduces SearchQualityLevel and associated tests to centralize how search quality is categorized and described to the LLM, and wires it into ChatService.
  • Refactors guided learning backend to build lesson-aware guidance prompts via SystemPromptConfig, delegate PDF citation page anchoring to PdfCitationEnhancer, and tighten validation of guided streaming requests.
  • Updates the guided learning frontend to use per-lesson session IDs when streaming chat, ensuring conversation isolation across lessons.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

Summary per file:

  • src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java: Adds an enum that classifies search result sets and produces consistent, self-describing quality messages consumed by the chat pipeline.
  • src/test/java/com/williamcallahan/javachat/domain/SearchQualityLevelTest.java: Verifies SearchQualityLevel.determine, formatMessage, and describeQuality behavior across empty, keyword, high-quality, and mixed-quality scenarios.
  • src/main/java/com/williamcallahan/javachat/web/GuidedStreamRequest.java: Changes userQuery and lessonSlug accessors to Optional, encouraging explicit handling of missing/blank values for guided streaming requests.
  • src/main/java/com/williamcallahan/javachat/web/GuidedLearningController.java: Uses the new Optional accessors to enforce required user query and lesson slug for guided chat, and continues to build structured prompts for streaming.
  • src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java: New component that computes and caches Think Java PDF page counts, estimates page numbers from chunk indices, and appends #page=N anchors and anchors fields to PDF citations.
  • src/main/java/com/williamcallahan/javachat/service/OpenAIStreamingService.java: Removes the deprecated string-based streaming API, leaving a single structured-prompt-based streamResponse entry point used by controllers and services.
  • src/main/java/com/williamcallahan/javachat/service/GuidedLearningService.java: Wires in SystemPromptConfig and PdfCitationEnhancer, uses lesson metadata to build targeted guidance strings, and delegates PDF citation enhancement out of the service.
  • src/main/java/com/williamcallahan/javachat/service/ChatService.java: Delegates search quality description to SearchQualityLevel.describeQuality, simplifying logic and keeping messaging consistent with system prompt behavior.
  • frontend/src/lib/components/LearnView.svelte: Replaces a single guided session ID with a per-lesson session ID map and passes the appropriate ID into streamGuidedChat, preventing cross-lesson conversation bleed.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
frontend/src/lib/components/LearnView.svelte (1)

213-227: Use the captured slug when starting the stream.

Because doScrollToBottom() is awaited, selectedLesson can change before the request fires, pairing an old session ID with a new slug. Using streamLessonSlug keeps session isolation intact. Tiny tip: captured values are your safest anchors in async flows.

🔧 Suggested fix
-      await streamGuidedChat(lessonSessionId, selectedLesson.slug, userQuery, {
+      await streamGuidedChat(lessonSessionId, streamLessonSlug, userQuery, {
🤖 Fix all issues with AI agents
In `@src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java`:
- Around line 1-5: SearchQualityLevel currently depends on
org.springframework.ai.document.Document, duplicates the "100" magic literal and
repeats the same high-quality filtering logic; remove the framework import and
refactor by introducing a domain-facing interface (e.g., RetrievedContent with
String getText() and boolean isHighQuality()) and use that in SearchQualityLevel
instead of Document, extract the numeric literal into a named constant (e.g.,
HIGH_QUALITY_LENGTH_THRESHOLD = 100) and replace both occurrences with it, and
move the duplicated filter/count logic into a single helper method (e.g.,
countHighQuality(List<RetrievedContent>) or RetrievedContent.isHighQuality())
inside the enum or a small domain util so both code paths call the same method;
map Document to RetrievedContent in an adapter layer outside the domain.

In `@src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java`:
- Around line 88-95: Reword the Javadoc opener for the PdfCitationEnhancer
method that returns the Think Java PDF page count (the method currently
documented with "Gets the total page count for the Think Java PDF.") to use a
direct verb or noun phrase instead of "Gets the…"; update the first sentence to
something like "Return the total page count for the Think Java PDF." or "Total
page count of the Think Java PDF." while keeping the rest of the Javadoc and
tags unchanged.
- Around line 45-55: Update the PdfCitationEnhancer Javadoc for the public
method enhanceWithPageAnchors to explicitly document that it may throw an
UncheckedIOException when PDF loading or chunk listing fails; add an `@throws`
UncheckedIOException line describing that the exception is raised if underlying
I/O operations (e.g., opening the PDF, reading chunks, or listing chunk
metadata) fail during page-anchor estimation so callers are aware to handle
runtime I/O errors.
- Around line 56-59: In PdfCitationEnhancer.enhanceWithPageAnchors, don't
silently return the original citations when docs.size() != citations.size();
instead surface the mismatch by failing fast — either throw an
IllegalArgumentException with a clear message including both sizes, or at
minimum log a warning with those sizes and then throw; update the method to
validate the contract up front so callers immediately see the mismatch rather
than getting back unmodified data.

In
`@src/main/java/com/williamcallahan/javachat/web/GuidedLearningController.java`:
- Around line 187-190: GuidedLearningController currently throws
IllegalArgumentException for missing fields but lacks an `@ExceptionHandler`,
causing 500 responses; add a method annotated with
`@ExceptionHandler`(IllegalArgumentException.class) in GuidedLearningController
that delegates to the shared super.handleValidationException(...) exactly like
IngestionController does so IllegalArgumentException is mapped to a 400
response; reference the controller class name GuidedLearningController and the
existing superclass method handleValidationException to locate where to add the
handler.
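
A minimal standalone version of that handler could look like the following inside GuidedLearningController; the comment above prefers delegating to the shared super.handleValidationException(...), whose exact signature is not shown here, so this inline response body is an assumption:

import java.util.Map;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ExceptionHandler;

// Maps validation failures (e.g., missing user query or lesson slug) to a 400 instead of a 500.
@ExceptionHandler(IllegalArgumentException.class)
public ResponseEntity<Map<String, String>> handleInvalidGuidedRequest(IllegalArgumentException exception) {
    return ResponseEntity.status(HttpStatus.BAD_REQUEST).body(Map.of("error", exception.getMessage()));
}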
🧹 Nitpick comments (6)
src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java (1)

68-83: Extract PDF-related literals into constants.

Strings like .pdf, #page=, and the anchor key are repeated and should be named constants for clarity and consistency. As per coding guidelines, avoid magic literals.

♻️ Example refactor
+    private static final String PDF_EXTENSION = ".pdf";
+    private static final String PAGE_ANCHOR_PREFIX = "#page=";
+    private static final String PAGE_ANCHOR_VALUE_PREFIX = "page=";
@@
-            if (url == null || !url.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
+            if (url == null || !url.toLowerCase(Locale.ROOT).endsWith(PDF_EXTENSION)) {
                 continue;
             }
@@
-                String withAnchor = url.contains("#page=") ? url : url + "#page=" + page;
+                String withAnchor = url.contains(PAGE_ANCHOR_PREFIX) ? url : url + PAGE_ANCHOR_PREFIX + page;
                 citation.setUrl(withAnchor);
-                citation.setAnchor("page=" + page);
+                citation.setAnchor(PAGE_ANCHOR_VALUE_PREFIX + page);
src/main/java/com/williamcallahan/javachat/service/GuidedLearningService.java (2)

71-77: Constructor parameter list is getting long.

Now at 7 parameters; consider a parameter object or grouping related dependencies to keep injection tidy. Tiny tip: fewer constructor params makes wiring and tests happier. As per coding guidelines, avoid >4 positional parameters.


323-339: Extract lesson-label strings into named constants.

Inline labels like “Lesson Summary:” and the default “No specific lesson selected…” are magic literals; constants improve consistency and reuse. Tiny tip: names turn strings into domain vocabulary. As per coding guidelines, avoid magic literals.

♻️ Example extraction
+    private static final String NO_LESSON_CONTEXT_MESSAGE =
+            "No specific lesson selected. Provide general Java learning assistance.";
+    private static final String LESSON_SUMMARY_LABEL = "Lesson Summary: ";
+    private static final String KEY_CONCEPTS_LABEL = "Key concepts to cover: ";
@@
-            return "No specific lesson selected. Provide general Java learning assistance.";
+            return NO_LESSON_CONTEXT_MESSAGE;
@@
-            contextBuilder.append("\n\nLesson Summary: ").append(lesson.getSummary());
+            contextBuilder.append("\n\n").append(LESSON_SUMMARY_LABEL).append(lesson.getSummary());
@@
-            contextBuilder.append("\n\nKey concepts to cover: ")
+            contextBuilder.append("\n\n").append(KEY_CONCEPTS_LABEL)
                     .append(String.join(", ", lesson.getKeywords()));
src/main/java/com/williamcallahan/javachat/web/GuidedStreamRequest.java (1)

28-45: Tighten accessor docs and consider value types.

Style-wise, drop “Returns the…” in the opener; also consider introducing a validated LessonSlug/UserQuery value type so these accessors return Optional<LessonSlug> etc. Tiny tip: small domain types are great bug repellant. As per coding guidelines, avoid filler Javadoc phrases and raw primitives at API boundaries.

✍️ Suggested Javadoc wording
-     * Returns the user query when present and non-blank.
+     * Provides the user query when present and non-blank.
@@
-     * Returns the lesson slug when present and non-blank.
+     * Provides the lesson slug when present and non-blank.
src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java (2)

77-83: Extract magic literal 100 to a named constant.

The threshold 100 appears here and again in describeQuality(). A named constant like SUBSTANTIAL_CONTENT_LENGTH_THRESHOLD makes the intent crystal clear and ensures consistency if you ever need to tune it. Plus, future-you will thank present-you when debugging! 📚

✨ Suggested improvement
 public enum SearchQualityLevel {
     // ... enum values ...
+
+    /**
+     * Minimum content length (in characters) to consider a document substantial.
+     */
+    private static final int SUBSTANTIAL_CONTENT_THRESHOLD = 100;

     // In determine():
-                    return content != null && content.length() > 100;
+                    return content != null && content.length() > SUBSTANTIAL_CONTENT_THRESHOLD;

     // In describeQuality():
-                        .filter(doc -> doc.getText() != null && doc.getText().length() > 100)
+                        .filter(doc -> doc.getText() != null && doc.getText().length() > SUBSTANTIAL_CONTENT_THRESHOLD)

98-108: Duplicate high-quality filtering logic could be consolidated.

The same doc.getText() != null && doc.getText().length() > 100 predicate appears in both determine() and describeQuality(). Consider extracting a small helper predicate—it keeps things DRY and makes future threshold adjustments safer. Think of it as giving that logic a cozy home of its own! 🏠

✨ Suggested consolidation
+    private static boolean isSubstantialContent(Document doc) {
+        String content = doc.getText();
+        return content != null && content.length() > SUBSTANTIAL_CONTENT_THRESHOLD;
+    }
+
     public static SearchQualityLevel determine(List<Document> docs) {
         // ...
         long highQualityCount = docs.stream()
-                .filter(doc -> {
-                    String content = doc.getText();
-                    return content != null && content.length() > 100;
-                })
+                .filter(SearchQualityLevel::isSubstantialContent)
                 .count();
         // ...
     }

     public static String describeQuality(List<Document> docs) {
         SearchQualityLevel level = determine(docs);
         int totalCount = docs != null ? docs.size() : 0;
         long highQualityCount = docs != null
-                ? docs.stream()
-                        .filter(doc -> doc.getText() != null && doc.getText().length() > 100)
-                        .count()
+                ? docs.stream().filter(SearchQualityLevel::isSubstantialContent).count()
                 : 0;
         return level.formatMessage(totalCount, (int) highQualityCount);
     }

The original README (474 lines) contained everything from quick-start to API
reference, configuration, architecture, mobile notes, and troubleshooting.
This made the README hard to navigate and maintain, and made onboarding contributors harder. The
refactoring extracts each concern into its own file under docs/ while leaving
a concise README as the project landing page.

New documentation structure:
- docs/README.md: table of contents and entry point
- docs/getting-started.md: prerequisites, quick-start, common commands
- docs/configuration.md: env vars for LLM, embeddings, Qdrant, RAG tuning
- docs/api.md: HTTP endpoints (SSE streaming, guided learning, ingestion)
- docs/ingestion.md: fetch/process/dedupe pipeline for RAG indexing
- docs/architecture.md: high-level components and request flow

Root README now: project summary, feature highlights, quick-start snippet,
and a pointer to docs/README.md for comprehensive documentation.
@railway-app railway-app bot temporarily deployed to insightful-intuition / production January 25, 2026 21:20 Inactive

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 94f27592ed



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI agents
In `@docs/configuration.md`:
- Around line 18-19: Update the sentence "If both keys are present, the service
prefers OpenAI for streaming and can fall back to GitHub Models based on
rate-limit/backoff state." to remove any implication of silent provider
switching: document that falling back requires an explicit opt-in configuration
(e.g., a named config flag like ENABLE_PROVIDER_FALLBACK or
prefer_openai_with_fallback) and otherwise the service surfaces a visible error
when the preferred provider fails; also add a short note describing the
user-visible error path and how to enable fallback so readers know how to
control model attribution and debugging.

In `@README.md`:
- Around line 7-12: Update the README's "Documentation ingestion pipeline (fetch
→ chunk → embed → dedupe → index)" bullet to mention that chunking uses
JTokkit's CL100K_BASE tokenizer (GPT‑3.5/4 style) and briefly note how that
affects token counts and overlap behavior; specifically, add a short sentence
after the pipeline bullet explaining "chunks are created using the CL100K_BASE
tokenizer from JTokkit (GPT‑3.5/4 tokenization) so token counts and overlap
should be estimated with that tokenizer" and optionally a one-line tip about
estimating tokens for typical text to predict chunk sizes and overlaps.

In `@src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java`:
- Around line 64-86: The code uses a global thinkJavaPages (from
getThinkJavaPdfPages()) for every PDF which can mis-anchor non-Think‑Java PDFs;
restrict anchor logic to only the Think Java PDF by checking the URL before
applying page anchors (e.g., compare url to the known Think Java PDF URL or
detect a Think‑Java identifier) and only then call estimatePage and set
citation.setUrl / citation.setAnchor; alternatively compute per-URL page counts
instead of using thinkJavaPages if you need anchors for other PDFs. Ensure this
check is added around the block that uses thinkJavaPages, leaving
parseChunkIndex, countChunksForUrl and estimatePage untouched.
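
One way to scope the anchor logic as suggested; the marker string is a hypothetical identifier, not the actual URL check used in the repo:

private static final String PDF_EXTENSION = ".pdf";
private static final String THINK_JAVA_URL_MARKER = "thinkjava"; // hypothetical identifier

private static boolean isThinkJavaPdf(String url) {
    if (url == null) {
        return false;
    }
    String normalized = url.toLowerCase(java.util.Locale.ROOT);
    return normalized.endsWith(PDF_EXTENSION) && normalized.contains(THINK_JAVA_URL_MARKER);
}

// Only Think Java URLs would then use the cached thinkJavaPages count:
// if (isThinkJavaPdf(url)) { int page = estimatePage(chunkIndex, totalChunks, thinkJavaPages); ... }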

@railway-app railway-app bot temporarily deployed to insightful-intuition / production January 25, 2026 21:27 Inactive
…degradation

The application was catching IOException during TOC loading and returning an
empty list. This caused the guided learning feature to silently fail (showing
empty lessons) if the resource was missing or corrupt, masking the root cause.
Throwing a runtime exception ensures the application fails fast during startup
or first access, alerting developers to the configuration issue immediately.

- Throw IllegalStateException instead of logging and returning an empty list
- Wrap the original IOException for debugging context
- Address the Clean Code "Silent Degradation" (BLK2) violation
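
The resulting loading path looks roughly like this (the resource name and parsing helper are illustrative):

private List<Lesson> loadTableOfContents() {
    try (InputStream input = getClass().getResourceAsStream("/guided/toc.json")) {
        if (input == null) {
            throw new IllegalStateException("Guided learning TOC resource is missing from the classpath");
        }
        return parseLessons(input);
    } catch (IOException e) {
        // Previously: log and return an empty list, which silently hid the broken configuration.
        throw new IllegalStateException("Failed to load guided learning TOC", e);
    }
}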
Recent refactorings introduced minor formatting inconsistencies (line wrapping,
whitespace). These changes align the codebase with the project's Spotless/Palantir
formatting rules.

- Fix line wrapping in SearchQualityLevel and EmbeddingCacheService
- Fix conditional formatting in test files
- Normalize whitespace in extracted constants
@railway-app railway-app bot temporarily deployed to insightful-intuition / production January 26, 2026 00:32 Inactive

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@docs/development.md`:
- Around line 221-222: Update the typo in the References section by replacing
the incorrect URL string "https://mise.jdnow.dev/" (attached to the link text
"[mise version manager]") with the correct official site
"https://mise.jdx.dev/"; locate the link entry for "mise version manager" in
docs/development.md and change only the hostname to jdx.dev so the markdown link
becomes the valid canonical URL.

In `@docs/getting-started.md`:
- Line 24: Update the incorrect mise URL in docs/getting-started.md: replace the
referenced URL "https://mise.jdnow.dev/" with the correct official URL
"https://mise.jdx.dev/". Ensure any occurrences of the old URL (search for
"https://mise.jdnow.dev/") are updated to the new one and save the file.

In `@Makefile`:
- Around line 118-126: The dev-backend target only adds APP_ARGS for
GITHUB_TOKEN but not for OPENAI_API_KEY; add an elif branch after the existing
if [ -n "$$GITHUB_TOKEN" ] that checks [ -n "$$OPENAI_API_KEY" ] and appends the
corresponding spring.ai.openai.* flags to APP_ARGS (at minimum
--spring.ai.openai.api-key from $$OPENAI_API_KEY and the base-url and model
flags using the OPENAI_* env vars or sensible defaults). Update the APP_ARGS
construction in the dev-backend target (the APP_ARGS variable) to mirror the
run/dev logic so OpenAI-only users are supported.

In `@src/main/java/com/williamcallahan/javachat/service/GuidedTOCProvider.java`:
- Around line 50-54: The finally block sets tocLoaded = true even on failure,
causing subsequent calls to silently return cached/empty results; update
GuidedTOCProvider so tocLoaded is only set when the TOC load succeeds (e.g.,
move the tocLoaded = true assignment into the successful path inside the try
after the cache is populated or after the code that processes the InputStream
completes) and leave it false when an IOException is thrown so callers will
retry loading instead of getting a silent empty result.
♻️ Duplicate comments (3)
src/test/java/com/williamcallahan/javachat/ExtractorQualityTest.java (2)

113-117: Path-separator portability fix confirmed! ✅

The dual check for both forward slashes (/api/) and backslashes (\\api\\) ensures this utility works correctly on both Unix-like systems and Windows. Fun fact: Java's Path.toString() uses the platform's native separator, so this cross-platform check is the right call.


146-151: Division-by-zero guard in place! 🛡️

The oldText.length() > 0 check prevents Infinity or NaN from sneaking into your logs when encountering empty documents. Clean defensive coding.

Makefile (1)

57-65: The run target still needs OPENAI_API_KEY handling in APP_ARGS.

Hey there! 👋 Quick heads up—while the validation at lines 48-51 accepts either GITHUB_TOKEN or OPENAI_API_KEY, the APP_ARGS block here only populates credentials when GITHUB_TOKEN is set.

If a user configures only OPENAI_API_KEY, they'll pass the gate but the app won't receive any API key—leading to a confusing runtime failure. The dev target (lines 92-95) already has the fix pattern in place!

🐛 Proposed fix
   APP_ARGS=(--server.port=$$SERVER_PORT); \
   if [ -n "$$GITHUB_TOKEN" ]; then \
     APP_ARGS+=( \
       --spring.ai.openai.api-key="$$GITHUB_TOKEN" \
       --spring.ai.openai.base-url="$${GITHUB_MODELS_BASE_URL:-https://models.github.ai/inference}" \
       --spring.ai.openai.chat.options.model="$${GITHUB_MODELS_CHAT_MODEL:-gpt-5}" \
       --spring.ai.openai.embedding.options.model="$${GITHUB_MODELS_EMBED_MODEL:-text-embedding-3-small}" \
     ); \
+  elif [ -n "$$OPENAI_API_KEY" ]; then \
+    APP_ARGS+=( \
+      --spring.ai.openai.api-key="$$OPENAI_API_KEY" \
+    ); \
   fi; \
🧹 Nitpick comments (4)
src/test/java/com/williamcallahan/javachat/ExtractorQualityTest.java (1)

119-128: Consider extracting noise patterns to a constant (optional nitpick).

For a manual utility runner, inline patterns are totally fine. But if you ever expand this list or want to reuse it elsewhere, pulling it into a private static final String[] NOISE_PATTERNS at the class level could keep things neat. Low priority—just a thought for future-you! 🧹

src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java (1)

110-122: Minor optimization opportunity: countHighQuality is called twice.

When describeQuality calls determine(), the high-quality count is computed internally, then computed again on line 119. For typical search result sizes this is negligible, but if you're feeling tidy, you could refactor to avoid the duplicate work.

♻️ Optional refactor to compute count once

One approach is to have determine return a small result object (or use a private helper struct):

+    private record QualityAssessment(SearchQualityLevel level, int totalCount, int highQualityCount) {}
+
+    private static QualityAssessment assess(List<? extends RetrievedContent> contents) {
+        if (contents == null || contents.isEmpty()) {
+            return new QualityAssessment(NONE, 0, 0);
+        }
+        int total = contents.size();
+        int highQuality = (int) countHighQuality(contents);
+        
+        boolean likelyKeywordSearch = contents.stream().anyMatch(content -> content.getSourceUrl()
+                .filter(url -> url.contains(KEYWORD_MARKER_LOCAL_SEARCH) || url.contains(KEYWORD_MARKER_KEYWORD))
+                .isPresent());
+        
+        if (likelyKeywordSearch) {
+            return new QualityAssessment(KEYWORD_SEARCH, total, highQuality);
+        }
+        if (highQuality == total) {
+            return new QualityAssessment(HIGH_QUALITY, total, highQuality);
+        }
+        return new QualityAssessment(MIXED_QUALITY, total, highQuality);
+    }
+
     public static String describeQuality(List<? extends RetrievedContent> contents) {
-        SearchQualityLevel level = determine(contents);
-        int totalCount = contents != null ? contents.size() : 0;
-        long highQualityCount = countHighQuality(contents);
-
-        return level.formatMessage(totalCount, (int) highQualityCount);
+        QualityAssessment assessment = assess(contents);
+        return assessment.level().formatMessage(assessment.totalCount(), assessment.highQualityCount());
     }

This keeps determine() unchanged for external callers, while describeQuality uses the richer internal helper.

Makefile (1)

24-24: Optional: Consider adding an all target.

The static analysis tool notes a missing all target—this is a common Makefile convention where all serves as the default build target. Since you have help as the first target, users already get a sensible default.

If you'd like to follow the convention, you could add:

all: build  ## Default target: build the project

This is purely a "nice to have" for Makefile purists! 📚

scripts/monitor_progress.sh (1)

22-22: Consider adding a timeout and using jq for more robust JSON parsing.

The current grep/cut approach works, but it's a bit fragile if the JSON structure changes. Also, without a timeout, curl could hang if Qdrant becomes unresponsive. Fun fact: monitor_indexing.sh already uses jq for this! 🧩

💡 Optional improvement
-    count=$(curl -s "${auth[@]}" "${base_url}/collections/${QDRANT_COLLECTION}" | grep -o '"points_count":[0-9]*' | cut -d: -f2)
+    count=$(curl -s --max-time 5 "${auth[@]}" "${base_url}/collections/${QDRANT_COLLECTION}" | jq -r '.result.points_count // 0' 2>/dev/null)

This adds a 5-second timeout and uses jq for cleaner JSON extraction (with a fallback to 0 if parsing fails).

@railway-app railway-app bot temporarily deployed to insightful-intuition / production January 26, 2026 04:14 Inactive

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/development.md`:
- Around line 209-212: The java version check pipes nothing because `java
-version` writes to stderr; update the command shown (the `java -version | grep
-o '"[^"]*"'` example) to capture stderr by redirecting it to stdout, e.g. `java
-version 2>&1 | grep -o '"[^"]*"'`, so the grep receives the version string;
replace the existing example command accordingly.
♻️ Duplicate comments (1)
Makefile (1)

57-65: Pass OPENAI_API_KEY into APP_ARGS in run.
Tiny gotcha: validation allows OPENAI_API_KEY, but APP_ARGS only populates when GITHUB_TOKEN is set—OpenAI‑only users won’t get credentials at runtime.

🐛 Proposed fix
   APP_ARGS=(--server.port=$$SERVER_PORT); \
   if [ -n "$$GITHUB_TOKEN" ]; then \
     APP_ARGS+=( \
       --spring.ai.openai.api-key="$$GITHUB_TOKEN" \
       --spring.ai.openai.base-url="$${GITHUB_MODELS_BASE_URL:-https://models.github.ai/inference}" \
       --spring.ai.openai.chat.options.model="$${GITHUB_MODELS_CHAT_MODEL:-gpt-5}" \
       --spring.ai.openai.embedding.options.model="$${GITHUB_MODELS_EMBED_MODEL:-text-embedding-3-small}" \
     ); \
+  elif [ -n "$$OPENAI_API_KEY" ]; then \
+    APP_ARGS+=( \
+      --spring.ai.openai.api-key="$$OPENAI_API_KEY" \
+    ); \
   fi; \
🧹 Nitpick comments (1)
Makefile (1)

24-27: Optional: add an all target to satisfy checkmake.
Small tooling nicety—checkmake expects an all phony target. Adding a lightweight alias keeps linters happy without changing behavior.

✨ Suggested tweak
-.PHONY: help clean build test lint format run dev dev-backend compose-up compose-down compose-logs compose-ps health ingest citations fetch-all process-all full-pipeline frontend-install frontend-build
+.PHONY: all help clean build test lint format run dev dev-backend compose-up compose-down compose-logs compose-ps health ingest citations fetch-all process-all full-pipeline frontend-install frontend-build
+
+all: help ## Default target (alias)

make run validates OPENAI_API_KEY but never forwards it to Spring AI,
so OpenAI-only users start without credentials. This adds a dedicated branch
to append the key when GitHub Models isn’t configured, matching dev targets.

- Add OPENAI_API_KEY fallback branch to APP_ARGS in run
- Keep GitHub Models config as the preferred path when set
checkmake expects an all target and flags the Makefile without one. This
adds a lightweight all alias for the help target without changing behavior.

- Add all target as default alias for help
- Include all in .PHONY
The troubleshooting example pipes java -version to grep, but the command
writes to stderr. Redirecting stderr ensures the version string is captured.

- Redirect stderr to stdout in the version check example
String-contains checks are brittle and couple tests to raw HTML formatting.
Parsing the HTML and using quoted attribute selectors makes metadata checks
resilient while correctly matching Open Graph values that include colons.

- Parse HTML with Jsoup from MockMvc response
- Use quoted attribute selectors for og:* meta tags
- Centralize response loading in a helper
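
The helper described above can be sketched as follows (the helper name matches the call sites in the review thread; the failure message wording is illustrative). Quoting the attribute value is what lets the selector match og:* names that contain a colon:

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertNotNull;

import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

private static void assertMetaContent(Document html, String attribute, String name, String expected) {
    // Quotes around the attribute value are required for names like "og:title".
    Element meta = html.selectFirst("meta[" + attribute + "=\"" + name + "\"]");
    assertNotNull(meta, "Missing <meta " + attribute + "=\"" + name + "\">");
    assertEquals(expected, meta.attr("content"));
}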
The CI workflow still used the v4 upload-artifact action, which is older and
missing the latest maintenance updates. This aligns all artifact uploads on v6
without changing workflow behavior.

- Upgrade upload-artifact steps for test, SpotBugs, and PMD reports
@railway-app railway-app bot temporarily deployed to insightful-intuition / production January 26, 2026 05:54 Inactive
SeoControllerTest depends on classpath:/static/index.html, which is not tracked
and is missing in CI, causing the controller to return 500. Provide a minimal
test resource so MVC tests have a stable HTML template.

- Add minimal static index.html under src/test/resources/static
@railway-app railway-app bot temporarily deployed to insightful-intuition / production January 26, 2026 06:05 Inactive
Local hook setup wasn’t captured in the repo, which makes it easy to skip
formatting and frontend checks before commits. This adds a prek config and a
Makefile helper, plus docs so developers can install hooks consistently.

- Add local prek hooks for Spotless and frontend checks
- Add `make hooks` helper to install hooks
- Document hook setup in development guide
This is a formatting-only change to normalize line wrapping in the SEO controller
test so the assertions are on single lines where they fit.

- Collapse multi-line assertion wrapping for readability
We want formatting to be automatic on commit, not just checked. This switches
the Java hook to Spotless apply and documents the behavior so developers expect
local file updates.

- Change prek hook to run spotlessApply
- Clarify in dev docs that the hook auto-formats
@railway-app railway-app bot temporarily deployed to insightful-intuition / production January 26, 2026 06:54 Inactive

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@docs/development.md`:
- Around line 48-50: Update the example to use the Gradle wrapper instead of the
system Gradle: replace the shell command reference "gradle build" with
"./gradlew build" (leave "java -version" unchanged) so the docs recommend the
deterministic wrapper-based build command.
🧹 Nitpick comments (4)
.pre-commit-config.yaml (1)

1-13: Consider adding file filters to avoid running hooks unnecessarily 🏃‍♂️

Right now, both hooks will run on every commit—even if you're just updating README.md. Adding files or types filters can save precious seconds and keep your workflow snappy!

Here's a little tweak that tells pre-commit "only bother with these hooks when relevant files change":

✨ Suggested improvement with file filters
 repos:
   - repo: local
     hooks:
       - id: spotless-java
         name: Spotless Java (Gradle, auto-format)
         entry: ./gradlew spotlessApply --no-daemon
         language: system
         pass_filenames: false
+        files: \.java$
       - id: frontend-check
         name: Frontend check (Vite/Svelte)
         entry: bash -lc "cd frontend && npm run check"
         language: system
         pass_filenames: false
+        files: ^frontend/

Fun fact: The files field uses regex patterns, so \.java$ matches any file ending in .java, and ^frontend/ matches anything under the frontend/ directory. Pre-commit is smart enough to skip the hook entirely if no matching files are staged! 🎯

src/test/java/com/williamcallahan/javachat/web/SeoControllerTest.java (1)

36-43: Nice refactor to DOM-based assertions! 🎯

This is a lovely improvement over string-contains checks — Jsoup parsing makes your assertions more robust and readable. One small tidbit: this test inlines the parsing logic while serves_chat_with_specific_metadata and serves_guided_with_specific_metadata use loadSeoDocument(). You could simplify this test to use the helper too, which would also make it consistent with its siblings.

♻️ Optional: Use the helper for consistency
     @Test
     void serves_root_with_seo_metadata() throws Exception {
-        MvcResult mvcOutcome = mvc.perform(get("/"))
-                .andExpect(status().isOk())
-                .andExpect(content().contentTypeCompatibleWith(MediaType.TEXT_HTML))
-                .andReturn();
-        Document htmlDocument = Jsoup.parse(mvcOutcome.getResponse().getContentAsString());
+        Document htmlDocument = loadSeoDocument("/");

         assertEquals("Java Chat - AI-Powered Java Learning With Citations", htmlDocument.title());
         assertMetaContent(htmlDocument, "property", "og:url", "https://example.com");
     }
Makefile (2)

48-49: Consider a friendly existence check for prek.

The hooks target assumes prek is already installed. If someone runs make hooks without it, they'll get a "command not found" error. A quick check could save a bit of head-scratching! 🧩

✨ Proposed improvement
 hooks: ## Install git hooks via prek
-	prek install --install-hooks
+	@command -v prek >/dev/null 2>&1 || { echo "Error: 'prek' not found. Install it first: https://github.com/caarlos0/prek" >&2; exit 1; }
+	prek install --install-hooks

51-78: A note on target body length (checkmake warning).

The static analyzer flags run and dev for exceeding its 5-line body limit. While extracting this into a shell script could shrink the targets, the current inline approach keeps everything visible in one place—useful for understanding the full startup flow at a glance. Totally fine to leave as-is unless you're aiming for strict checkmake compliance.

- Fix docs to reference ./gradlew instead of system gradle
- Add prek existence check with install link in make hooks