Add per-lesson chat isolation, result classification, dynamic prompts #9
base: main
Conversation
…g behavior

ChatService contained 30+ lines of sequential boolean checks to categorize search result quality and generate LLM context messages. This procedural logic violated Tell-Don't-Ask by interrogating document properties instead of letting a domain concept describe itself. The new enum encapsulates both categorization logic (determine) and message formatting (formatMessage), eliminating the boolean chain and making quality levels explicit domain vocabulary. The static describeQuality convenience method maintains the existing API contract while delegating to the new polymorphic design. Also removes the deprecated streamResponse(String, double) method from OpenAIStreamingService, as all callers now use the StructuredPrompt-based overload.

- Add SearchQualityLevel enum with NONE, KEYWORD_SEARCH, HIGH_QUALITY, MIXED_QUALITY
- Each level owns its message template and formatting logic
- Static determine() replaces procedural if-else chain in ChatService
- describeQuality() provides drop-in replacement for existing callers
- Remove deprecated OpenAIStreamingService.streamResponse(String, double)
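The design described above can be sketched in miniature. This is not the PR's actual code: the Doc record, the keyword-URL marker, and the message wording are assumptions standing in for the real Spring AI Document type and templates; only the enum name, its constants, and the determine/formatMessage/describeQuality methods come from the commit message.

```java
import java.util.List;

// Minimal sketch of the SearchQualityLevel design. The nested Doc record is a
// hypothetical stand-in for the framework document type; the real enum's
// message templates and keyword detection may differ.
enum SearchQualityLevel {
    NONE, KEYWORD_SEARCH, HIGH_QUALITY, MIXED_QUALITY;

    // Threshold for "substantial" content (assumed from the >100-char tests).
    static final int SUBSTANTIAL_CONTENT_THRESHOLD = 100;

    record Doc(String url, String text) {}

    // Replaces the procedural if-else chain: one place decides the level.
    static SearchQualityLevel determine(List<Doc> docs) {
        if (docs == null || docs.isEmpty()) return NONE;
        boolean keyword = docs.stream()
                .anyMatch(d -> d.url() != null && d.url().contains("keyword"));
        if (keyword) return KEYWORD_SEARCH;
        long high = docs.stream()
                .filter(d -> d.text() != null && d.text().length() > SUBSTANTIAL_CONTENT_THRESHOLD)
                .count();
        return high == docs.size() ? HIGH_QUALITY : MIXED_QUALITY;
    }

    // Each level owns its own message template.
    String formatMessage(int total, int high) {
        return switch (this) {
            case NONE -> "No relevant documents found.";
            case KEYWORD_SEARCH -> "Results came from keyword search (" + total + " documents).";
            case HIGH_QUALITY -> "All " + total + " documents have substantial content.";
            case MIXED_QUALITY -> high + " of " + total + " documents have substantial content.";
        };
    }

    // Drop-in convenience for existing callers.
    static String describeQuality(List<Doc> docs) {
        SearchQualityLevel level = determine(docs);
        int total = docs == null ? 0 : docs.size();
        long high = docs == null ? 0 : docs.stream()
                .filter(d -> d.text() != null && d.text().length() > SUBSTANTIAL_CONTENT_THRESHOLD)
                .count();
        return level.formatMessage(total, (int) high);
    }
}
```

The point of the enum is that callers ask the domain concept to describe itself instead of interrogating document properties inline.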
…-aware guidance

GuidedLearningService had grown to include 80+ lines of PDF pagination logic (page counting, chunk enumeration, anchor calculation) unrelated to lesson orchestration, and it lacked awareness of the current lesson context when generating LLM guidance. This commit addresses both issues:

1. Extraction: The new PdfCitationEnhancer component owns the entire citation enhancement workflow (loading the PDF for page count, counting chunks, parsing metadata, estimating pages). This removes LocalStoreService from GuidedLearningService and improves testability.
2. Lesson Focus: New buildLessonGuidance() and buildLessonContextDescription() methods construct LLM prompts that include the lesson title, summary, and keywords. The guidance template now includes topic handling rules to redirect greetings and off-topic questions back to the current lesson.

- Extract PdfCitationEnhancer to support/ with proper Javadocs
- Add THINK_JAVA_GUIDANCE_TEMPLATE with %s placeholder for lesson context
- Inject SystemPromptConfig for guided learning mode instructions
- Update streamLessonChat and buildLessonPrompt to use lesson-aware guidance
…ion bleeding

A single session ID was shared across all lessons in the frontend, causing conversation history from one lesson to bleed into another. For example, asking about loops after studying variables would reference unrelated variable discussion from the shared history. The frontend now maintains a Map<lessonSlug, sessionId> so each lesson gets an isolated backend conversation. The GuidedStreamRequest accessor methods now return Optional<String> instead of defaulting to empty strings, forcing callers to handle missing values explicitly.

Frontend:
- Replace single sessionId with sessionIdsByLesson Map
- Add getSessionIdForLesson() to create/retrieve per-lesson sessions
- Use lesson-scoped session ID in streamGuidedChat calls

Backend:
- Change GuidedStreamRequest.userQuery() to return Optional<String>
- Change GuidedStreamRequest.lessonSlug() to return Optional<String>
- Filter out blank values in Optional accessors
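Both halves of the change above follow a common shape. The sketch below models them in plain Java (the real session map lives in Svelte on the frontend); the SessionRegistry class name and nonBlank helper are illustrative, not from the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Sketch of per-lesson session isolation plus the Optional accessor pattern.
// SessionRegistry is a hypothetical name; the frontend keeps an equivalent
// Map<lessonSlug, sessionId> in component state.
class SessionRegistry {
    private final Map<String, String> sessionIdsByLesson = new HashMap<>();

    // Create or retrieve the session ID scoped to one lesson, so history
    // from "variables" never bleeds into "loops".
    String getSessionIdForLesson(String lessonSlug) {
        return sessionIdsByLesson.computeIfAbsent(lessonSlug, slug -> UUID.randomUUID().toString());
    }

    // Mirrors the GuidedStreamRequest accessors: blank values become
    // Optional.empty(), forcing callers to handle the missing case explicitly
    // instead of silently receiving "".
    static Optional<String> nonBlank(String value) {
        return Optional.ofNullable(value).filter(v -> !v.isBlank());
    }
}
```

computeIfAbsent gives the create-or-retrieve semantics in one call, which is why repeated sends within a lesson reuse the same backend conversation.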
…tional API

Adds test coverage for the SearchQualityLevel enum extracted in commit 6095599, verifying all quality level determinations and message formatting. Also updates GuidedLearningController to use orElseThrow() on the new Optional<String> return types from GuidedStreamRequest, failing fast with clear error messages when required fields are missing.

Tests cover:
- NONE returned for null/empty document lists
- KEYWORD_SEARCH detected from URL metadata patterns
- HIGH_QUALITY when all documents have substantial content (>100 chars)
- MIXED_QUALITY when some documents have short content
- formatMessage() produces correct strings for each level
- describeQuality() convenience method integration

Controller:
- Use orElseThrow() on userQuery() with "User query is required" message
- Use orElseThrow() on lessonSlug() with "Lesson slug is required" message
Note: CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in its review comments. This may lead to a less comprehensive review.

📝 Walkthrough (Summary by CodeRabbit)
Per-lesson guided chat and citation streaming implemented: the frontend now uses lesson-scoped sessions and per-message IDs with streaming/citation attachment; the backend adds guided prompt outcomes, PDF page-anchor enrichment, retrieval-quality classification, SSE abort/validation handling, tests, docs, and toolchain/tooling updates.

Changes
Sequence Diagram(s)

sequenceDiagram
participant User as User (Browser)
participant FE as Frontend (LearnView / GuidedPanel)
participant FEcli as FE Stream Client
participant BE as Backend (GuidedLearningController)
participant GLS as GuidedLearningService
participant LLM as OpenAI Streaming
participant Store as Document Store
User->>FE: select lesson, type & send query
FE->>FEcli: create lessonSessionId & user messageId
FEcli->>BE: POST /api/guided/stream (sessionId, slug, messageId)
BE->>GLS: buildStructuredGuidedPromptWithContext(history, slug, query)
GLS->>Store: load grounding documents
GLS->>GLS: enhance citations (PdfCitationEnhancer)
GLS-->>BE: GuidedChatPromptOutcome (prompt + docs)
BE->>LLM: start streaming structured prompt
LLM-->>BE: token chunk events
BE-->>FEcli: SSE data events (chunks)
FEcli->>FE: append chunks into assistant message (messageId)
GLS-->>BE: citation payload
BE-->>FEcli: SSE citation event
FEcli->>FE: attach citations to the same assistant message
LLM-->>BE: [DONE]
FE->>User: render completed assistant message with citations
sequenceDiagram
participant Ingest as Ingestion pipeline
participant Store as LocalStoreService
participant Enhance as PdfCitationEnhancer
participant GLS as GuidedLearningService
Ingest->>Store: save PDF chunks with metadata (chunkIndex)
Store-->>Enhance: list chunk files → totalChunks
Enhance->>Enhance: compute page estimate from chunkIndex/totalChunks
Enhance-->>GLS: citations with '#page' anchors
GLS-->>BE: include page-anchored citations in GuidedChatPromptOutcome
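The "compute page estimate from chunkIndex/totalChunks" step in the diagram above can be sketched as follows. The linear-interpolation formula and the estimatePage/withPageAnchor names are assumptions based on the diagram; the real PdfCitationEnhancer may round or clamp differently.

```java
// Sketch of page-anchor estimation: given a chunk's position among all chunks
// of a PDF, interpolate a page number and append a #page=N anchor to the URL.
class PageEstimator {
    // Assumed formula: linear interpolation clamped to [1, totalPages].
    static int estimatePage(int chunkIndex, int totalChunks, int totalPages) {
        if (totalChunks <= 0 || totalPages <= 0) return 1;
        int page = (int) Math.round((chunkIndex / (double) totalChunks) * totalPages) + 1;
        return Math.min(Math.max(page, 1), totalPages);
    }

    // Only append the anchor if the URL does not already carry one.
    static String withPageAnchor(String url, int page) {
        return url.contains("#page=") ? url : url + "#page=" + page;
    }
}
```

Browsers and most PDF viewers honor the #page=N fragment, which is what makes a citation like thinkjava.pdf#page=42 jump to the right place.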
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Possibly related PRs
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Pull request overview
This PR enhances the guided learning experience by isolating chat sessions per lesson, enriching LLM guidance with lesson context and standardized search quality signals, and modularizing PDF citation handling.
Changes:
- Introduces SearchQualityLevel and associated tests to centralize how search quality is categorized and described to the LLM, and wires it into ChatService.
- Refactors the guided learning backend to build lesson-aware guidance prompts via SystemPromptConfig, delegate PDF citation page anchoring to PdfCitationEnhancer, and tighten validation of guided streaming requests.
- Updates the guided learning frontend to use per-lesson session IDs when streaming chat, ensuring conversation isolation across lessons.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java | Adds an enum that classifies search result sets and produces consistent, self-describing quality messages consumed by the chat pipeline. |
| src/test/java/com/williamcallahan/javachat/domain/SearchQualityLevelTest.java | Verifies SearchQualityLevel.determine, formatMessage, and describeQuality behavior across empty, keyword, high-quality, and mixed-quality scenarios. |
| src/main/java/com/williamcallahan/javachat/web/GuidedStreamRequest.java | Changes userQuery and lessonSlug accessors to Optional, encouraging explicit handling of missing/blank values for guided streaming requests. |
| src/main/java/com/williamcallahan/javachat/web/GuidedLearningController.java | Uses the new Optional accessors to enforce required user query and lesson slug for guided chat, and continues to build structured prompts for streaming. |
| src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java | New component that computes and caches Think Java PDF page counts, estimates page numbers from chunk indices, and appends #page=N anchors and anchors fields to PDF citations. |
| src/main/java/com/williamcallahan/javachat/service/OpenAIStreamingService.java | Removes the deprecated string-based streaming API, leaving a single structured-prompt-based streamResponse entry point used by controllers and services. |
| src/main/java/com/williamcallahan/javachat/service/GuidedLearningService.java | Wires in SystemPromptConfig and PdfCitationEnhancer, uses lesson metadata to build targeted guidance strings, and delegates PDF citation enhancement out of the service. |
| src/main/java/com/williamcallahan/javachat/service/ChatService.java | Delegates search quality description to SearchQualityLevel.describeQuality, simplifying logic and keeping messaging consistent with system prompt behavior. |
| frontend/src/lib/components/LearnView.svelte | Replaces a single guided session ID with a per-lesson session ID map and passes the appropriate ID into streamGuidedChat, preventing cross-lesson conversation bleed. |
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
frontend/src/lib/components/LearnView.svelte (1)
213-227: Use the captured slug when starting the stream. Because doScrollToBottom() is awaited, selectedLesson can change before the request fires, pairing an old session ID with a new slug. Using streamLessonSlug keeps session isolation intact. Tiny tip: captured values are your safest anchors in async flows.

🔧 Suggested fix
- await streamGuidedChat(lessonSessionId, selectedLesson.slug, userQuery, {
+ await streamGuidedChat(lessonSessionId, streamLessonSlug, userQuery, {
🤖 Fix all issues with AI agents
In `@src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java`:
- Around line 1-5: SearchQualityLevel currently depends on
org.springframework.ai.document.Document, duplicates the "100" magic literal and
repeats the same high-quality filtering logic; remove the framework import and
refactor by introducing a domain-facing interface (e.g., RetrievedContent with
String getText() and boolean isHighQuality()) and use that in SearchQualityLevel
instead of Document, extract the numeric literal into a named constant (e.g.,
HIGH_QUALITY_LENGTH_THRESHOLD = 100) and replace both occurrences with it, and
move the duplicated filter/count logic into a single helper method (e.g.,
countHighQuality(List<RetrievedContent>) or RetrievedContent.isHighQuality())
inside the enum or a small domain util so both code paths call the same method;
map Document to RetrievedContent in an adapter layer outside the domain.
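The decoupling the instruction above asks for can be sketched like this. The RetrievedContent interface, its method names, and the helper are illustrative renderings of the instruction's suggestion; the real adapter would wrap org.springframework.ai.document.Document outside the domain package.

```java
import java.util.List;
import java.util.Optional;

// Domain-facing abstraction so SearchQualityLevel no longer imports a
// framework type. Both code paths count "high quality" through one helper,
// eliminating the duplicated predicate and the magic 100.
interface RetrievedContent {
    int HIGH_QUALITY_LENGTH_THRESHOLD = 100;

    String getText();
    Optional<String> getSourceUrl();

    // Single definition of "substantial content" shared by all callers.
    default boolean isHighQuality() {
        String text = getText();
        return text != null && text.length() > HIGH_QUALITY_LENGTH_THRESHOLD;
    }

    // Shared counting logic for determine() and describeQuality().
    static long countHighQuality(List<? extends RetrievedContent> contents) {
        return contents == null ? 0 : contents.stream().filter(RetrievedContent::isHighQuality).count();
    }
}
```

An adapter in the web or service layer would then map each framework Document onto this interface, keeping the domain package framework-free.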
In `@src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java`:
- Around line 88-95: Reword the Javadoc opener for the PdfCitationEnhancer
method that returns the Think Java PDF page count (the method currently
documented with "Gets the total page count for the Think Java PDF.") to use a
direct verb or noun phrase instead of "Gets the…"; update the first sentence to
something like "Return the total page count for the Think Java PDF." or "Total
page count of the Think Java PDF." while keeping the rest of the Javadoc and
tags unchanged.
- Around line 45-55: Update the PdfCitationEnhancer Javadoc for the public
method enhanceWithPageAnchors to explicitly document that it may throw an
UncheckedIOException when PDF loading or chunk listing fails; add an `@throws`
UncheckedIOException line describing that the exception is raised if underlying
I/O operations (e.g., opening the PDF, reading chunks, or listing chunk
metadata) fail during page-anchor estimation so callers are aware to handle
runtime I/O errors.
- Around line 56-59: In PdfCitationEnhancer.enhanceWithPageAnchors, don't
silently return the original citations when docs.size() != citations.size();
instead surface the mismatch by failing fast — either throw an
IllegalArgumentException with a clear message including both sizes, or at
minimum log a warning with those sizes and then throw; update the method to
validate the contract up front so callers immediately see the mismatch rather
than getting back unmodified data.
In
`@src/main/java/com/williamcallahan/javachat/web/GuidedLearningController.java`:
- Around line 187-190: GuidedLearningController currently throws
IllegalArgumentException for missing fields but lacks an `@ExceptionHandler`,
causing 500 responses; add a method annotated with
`@ExceptionHandler`(IllegalArgumentException.class) in GuidedLearningController
that delegates to the shared super.handleValidationException(...) exactly like
IngestionController does so IllegalArgumentException is mapped to a 400
response; reference the controller class name GuidedLearningController and the
existing superclass method handleValidationException to locate where to add the
handler.
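The mapping the instruction above describes can be modeled without Spring, to make the status-code behavior concrete. In the real controller this is an @ExceptionHandler(IllegalArgumentException.class) method delegating to the shared handleValidationException; here the dispatch, the ValidationDispatch name, and the Response record are hypothetical stand-ins.

```java
// Framework-free sketch: IllegalArgumentException from validation should
// surface as a 400 (client error), not fall through to a 500.
class ValidationDispatch {
    record Response(int status, String body) {}

    static Response handle(Runnable controllerAction) {
        try {
            controllerAction.run();
            return new Response(200, "ok");
        } catch (IllegalArgumentException e) {
            // The @ExceptionHandler path: the client sent a bad request.
            return new Response(400, e.getMessage());
        } catch (RuntimeException e) {
            // Anything unhandled still becomes a server error.
            return new Response(500, "internal error");
        }
    }
}
```

Without that dedicated catch, Spring's default handling treats the orElseThrow() failure as an unhandled exception, which is why the controller currently returns 500 for a missing user query.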
🧹 Nitpick comments (6)
src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java (1)
68-83: Extract PDF-related literals into constants. Strings like #page= and the anchor key are repeated and should be named constants for clarity and consistency. As per coding guidelines, avoid magic literals.

♻️ Example refactor

+ private static final String PDF_EXTENSION = ".pdf";
+ private static final String PAGE_ANCHOR_PREFIX = "#page=";
+ private static final String PAGE_ANCHOR_VALUE_PREFIX = "page=";
@@
- if (url == null || !url.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
+ if (url == null || !url.toLowerCase(Locale.ROOT).endsWith(PDF_EXTENSION)) {
      continue;
  }
@@
- String withAnchor = url.contains("#page=") ? url : url + "#page=" + page;
+ String withAnchor = url.contains(PAGE_ANCHOR_PREFIX) ? url : url + PAGE_ANCHOR_PREFIX + page;
  citation.setUrl(withAnchor);
- citation.setAnchor("page=" + page);
+ citation.setAnchor(PAGE_ANCHOR_VALUE_PREFIX + page);

src/main/java/com/williamcallahan/javachat/service/GuidedLearningService.java (2)
71-77: Constructor parameter list is getting long. Now at 7 parameters; consider a parameter object or grouping related dependencies to keep injection tidy. Tiny tip: fewer constructor params makes wiring and tests happier. As per coding guidelines, avoid more than 4 positional parameters.
323-339: Extract lesson-label strings into named constants. Inline labels like "Lesson Summary:" and the default "No specific lesson selected…" are magic literals; constants improve consistency and reuse. Tiny tip: names turn strings into domain vocabulary. As per coding guidelines, avoid magic literals.
♻️ Example extraction
+ private static final String NO_LESSON_CONTEXT_MESSAGE =
+     "No specific lesson selected. Provide general Java learning assistance.";
+ private static final String LESSON_SUMMARY_LABEL = "Lesson Summary: ";
+ private static final String KEY_CONCEPTS_LABEL = "Key concepts to cover: ";
@@
- return "No specific lesson selected. Provide general Java learning assistance.";
+ return NO_LESSON_CONTEXT_MESSAGE;
@@
- contextBuilder.append("\n\nLesson Summary: ").append(lesson.getSummary());
+ contextBuilder.append("\n\n").append(LESSON_SUMMARY_LABEL).append(lesson.getSummary());
@@
- contextBuilder.append("\n\nKey concepts to cover: ")
+ contextBuilder.append("\n\n").append(KEY_CONCEPTS_LABEL)
      .append(String.join(", ", lesson.getKeywords()));

src/main/java/com/williamcallahan/javachat/web/GuidedStreamRequest.java (1)
28-45: Tighten accessor docs and consider value types. Style-wise, drop "Returns the…" in the opener; also consider introducing validated LessonSlug/UserQuery value types so these accessors return Optional<LessonSlug> etc. Tiny tip: small domain types are great bug repellant. As per coding guidelines, avoid filler Javadoc phrases and raw primitives at API boundaries.

✍️ Suggested Javadoc wording

- * Returns the user query when present and non-blank.
+ * Provides the user query when present and non-blank.
@@
- * Returns the lesson slug when present and non-blank.
+ * Provides the lesson slug when present and non-blank.

src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java (2)
77-83: Extract magic literal 100 to a named constant. The threshold 100 appears here and again in describeQuality(). A named constant like SUBSTANTIAL_CONTENT_LENGTH_THRESHOLD makes the intent crystal clear and ensures consistency if you ever need to tune it. Plus, future-you will thank present-you when debugging! 📚

✨ Suggested improvement

  public enum SearchQualityLevel {
+     /**
+      * Minimum content length (in characters) to consider a document substantial.
+      */
+     private static final int SUBSTANTIAL_CONTENT_THRESHOLD = 100;
      // ... enum values ...

      // In determine():
-     return content != null && content.length() > 100;
+     return content != null && content.length() > SUBSTANTIAL_CONTENT_THRESHOLD;

      // In describeQuality():
-     .filter(doc -> doc.getText() != null && doc.getText().length() > 100)
+     .filter(doc -> doc.getText() != null && doc.getText().length() > SUBSTANTIAL_CONTENT_THRESHOLD)
98-108: Duplicate high-quality filtering logic could be consolidated. The same doc.getText() != null && doc.getText().length() > 100 predicate appears in both determine() and describeQuality(). Consider extracting a small helper predicate; it keeps things DRY and makes future threshold adjustments safer. Think of it as giving that logic a cozy home of its own! 🏠

✨ Suggested consolidation

+ private static boolean isSubstantialContent(Document doc) {
+     String content = doc.getText();
+     return content != null && content.length() > SUBSTANTIAL_CONTENT_THRESHOLD;
+ }
+
  public static SearchQualityLevel determine(List<Document> docs) {
      // ...
      long highQualityCount = docs.stream()
-         .filter(doc -> {
-             String content = doc.getText();
-             return content != null && content.length() > 100;
-         })
+         .filter(SearchQualityLevel::isSubstantialContent)
          .count();
      // ...
  }

  public static String describeQuality(List<Document> docs) {
      SearchQualityLevel level = determine(docs);
      int totalCount = docs != null ? docs.size() : 0;
      long highQualityCount = docs != null
-         ? docs.stream()
-             .filter(doc -> doc.getText() != null && doc.getText().length() > 100)
-             .count()
+         ? docs.stream().filter(SearchQualityLevel::isSubstantialContent).count()
          : 0;
      return level.formatMessage(totalCount, (int) highQualityCount);
  }
The original README (474 lines) contained everything from quick-start to API reference, configuration, architecture, mobile notes, and troubleshooting. This made it hard to navigate, maintain, and onboard contributors. The refactoring extracts each concern into its own file under docs/ while leaving a concise README as the project landing page.

New documentation structure:
- docs/README.md: table of contents and entry point
- docs/getting-started.md: prerequisites, quick-start, common commands
- docs/configuration.md: env vars for LLM, embeddings, Qdrant, RAG tuning
- docs/api.md: HTTP endpoints (SSE streaming, guided learning, ingestion)
- docs/ingestion.md: fetch/process/dedupe pipeline for RAG indexing
- docs/architecture.md: high-level components and request flow

Root README now: project summary, feature highlights, quick-start snippet, and a pointer to docs/README.md for comprehensive documentation.
…licate high-quality counting
…tch, reword 'Gets the' opener
…n 400 instead of 500
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 94f27592ed
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
Actionable comments posted: 3
🤖 Fix all issues with AI agents
In `@docs/configuration.md`:
- Around line 18-19: Update the sentence "If both keys are present, the service
prefers OpenAI for streaming and can fall back to GitHub Models based on
rate-limit/backoff state." to remove any implication of silent provider
switching: document that falling back requires an explicit opt-in configuration
(e.g., a named config flag like ENABLE_PROVIDER_FALLBACK or
prefer_openai_with_fallback) and otherwise the service surfaces a visible error
when the preferred provider fails; also add a short note describing the
user-visible error path and how to enable fallback so readers know how to
control model attribution and debugging.
In `@README.md`:
- Around line 7-12: Update the README's "Documentation ingestion pipeline (fetch
→ chunk → embed → dedupe → index)" bullet to mention that chunking uses
JTokkit's CL100K_BASE tokenizer (GPT‑3.5/4 style) and briefly note how that
affects token counts and overlap behavior; specifically, add a short sentence
after the pipeline bullet explaining "chunks are created using the CL100K_BASE
tokenizer from JTokkit (GPT‑3.5/4 tokenization) so token counts and overlap
should be estimated with that tokenizer" and optionally a one-line tip about
estimating tokens for typical text to predict chunk sizes and overlaps.
In `@src/main/java/com/williamcallahan/javachat/support/PdfCitationEnhancer.java`:
- Around line 64-86: The code uses a global thinkJavaPages (from
getThinkJavaPdfPages()) for every PDF which can mis-anchor non-Think‑Java PDFs;
restrict anchor logic to only the Think Java PDF by checking the URL before
applying page anchors (e.g., compare url to the known Think Java PDF URL or
detect a Think‑Java identifier) and only then call estimatePage and set
citation.setUrl / citation.setAnchor; alternatively compute per-URL page counts
instead of using thinkJavaPages if you need anchors for other PDFs. Ensure this
check is added around the block that uses thinkJavaPages, leaving
parseChunkIndex, countChunksForUrl and estimatePage untouched.
…GuidedLearningController
… of instantiating RestTemplate
…degradation

The application was catching IOException during TOC loading and returning an empty list. This caused the guided learning feature to silently fail (showing empty lessons) if the resource was missing or corrupt, masking the root cause. Throwing a runtime exception ensures the application fails fast during startup or first access, alerting developers to the configuration issue immediately.

- Throw IllegalStateException instead of logging and returning empty list
- Wraps the original IOException for debugging context
- Addresses Clean Code "Silent Degradation" (BLK2) violation
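The fail-fast pattern this commit describes looks roughly like the sketch below. The TocLoader class, parse method, and ThrowingSupplier interface are hypothetical scaffolding; only the catch-and-wrap behavior comes from the commit message.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.List;

// Sketch of fail-fast TOC loading: instead of catching IOException and
// returning an empty list (silent degradation), wrap the failure in an
// IllegalStateException so a missing or corrupt resource surfaces immediately.
class TocLoader {
    interface ThrowingSupplier<T> {
        T get() throws IOException;
    }

    static List<String> loadLessons(ThrowingSupplier<InputStream> resource) {
        try (InputStream in = resource.get()) {
            return parse(in);
        } catch (IOException e) {
            // Fail fast, keeping the root cause attached for debugging.
            throw new IllegalStateException("Failed to load guided-learning TOC", e);
        }
    }

    // Placeholder parser: one lesson entry per line.
    static List<String> parse(InputStream in) throws IOException {
        return List.of(new String(in.readAllBytes()).split("\n"));
    }
}
```

Because the cause is wrapped rather than swallowed, the stack trace still points at the original I/O failure.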
Recent refactorings introduced minor formatting inconsistencies (line wrapping, whitespace). These changes align the codebase with the project's Spotless/Palantir formatting rules.

- Fix line wrapping in SearchQualityLevel and EmbeddingCacheService
- Fix conditional formatting in test files
- Normalize whitespace in extracted constants
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In `@docs/development.md`:
- Around line 221-222: Update the typo in the References section by replacing
the incorrect URL string "https://mise.jdnow.dev/" (attached to the link text
"[mise version manager]") with the correct official site
"https://mise.jdx.dev/"; locate the link entry for "mise version manager" in
docs/development.md and change only the hostname to jdx.dev so the markdown link
becomes the valid canonical URL.
In `@docs/getting-started.md`:
- Line 24: Update the incorrect mise URL in docs/getting-started.md: replace the
referenced URL "https://mise.jdnow.dev/" with the correct official URL
"https://mise.jdx.dev/". Ensure any occurrences of the old URL (search for
"https://mise.jdnow.dev/") are updated to the new one and save the file.
In `@Makefile`:
- Around line 118-126: The dev-backend target only adds APP_ARGS for
GITHUB_TOKEN but not for OPENAI_API_KEY; add an elif branch after the existing
if [ -n "$$GITHUB_TOKEN" ] that checks [ -n "$$OPENAI_API_KEY" ] and appends the
corresponding spring.ai.openai.* flags to APP_ARGS (at minimum
--spring.ai.openai.api-key from $$OPENAI_API_KEY and the base-url and model
flags using the OPENAI_* env vars or sensible defaults). Update the APP_ARGS
construction in the dev-backend target (the APP_ARGS variable) to mirror the
run/dev logic so OpenAI-only users are supported.
In `@src/main/java/com/williamcallahan/javachat/service/GuidedTOCProvider.java`:
- Around line 50-54: The finally block sets tocLoaded = true even on failure,
causing subsequent calls to silently return cached/empty results; update
GuidedTOCProvider so tocLoaded is only set when the TOC load succeeds (e.g.,
move the tocLoaded = true assignment into the successful path inside the try
after the cache is populated or after the code that processes the InputStream
completes) and leave it false when an IOException is thrown so callers will
retry loading instead of getting a silent empty result.
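The flag-on-success pattern the instruction above calls for can be sketched as follows. The GuidedTocCache name and the Callable-based loader are illustrative; only the tocLoaded semantics come from the instruction.

```java
import java.util.List;
import java.util.concurrent.Callable;

// Sketch: mark the cache as loaded only after the read succeeds, so a
// transient IOException leaves tocLoaded false and the next call retries
// instead of permanently serving an empty cache.
class GuidedTocCache {
    private boolean tocLoaded = false;
    private List<String> cache = List.of();

    List<String> getToc(Callable<List<String>> loader) {
        if (!tocLoaded) {
            try {
                cache = loader.call();
                tocLoaded = true; // set only on the success path
            } catch (Exception e) {
                // Leave tocLoaded false; the next call retries the load.
            }
        }
        return cache;
    }
}
```

Contrast this with a finally block setting the flag: there, one failed load would freeze the empty result forever.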
♻️ Duplicate comments (3)
src/test/java/com/williamcallahan/javachat/ExtractorQualityTest.java (2)
113-117: Path-separator portability fix confirmed! ✅ The dual check for both forward slashes (/api/) and backslashes (\\api\\) ensures this utility works correctly on both Unix-like systems and Windows. Fun fact: Java's Path.toString() uses the platform's native separator, so this cross-platform check is the right call.
146-151: Division-by-zero guard in place! 🛡️ The oldText.length() > 0 check prevents Infinity or NaN from sneaking into your logs when encountering empty documents. Clean defensive coding.

Makefile (1)
57-65: The run target still needs OPENAI_API_KEY handling in APP_ARGS. Hey there! 👋 Quick heads up: while the validation at lines 48-51 accepts either GITHUB_TOKEN or OPENAI_API_KEY, the APP_ARGS block here only populates credentials when GITHUB_TOKEN is set. If a user configures only OPENAI_API_KEY, they'll pass the gate but the app won't receive any API key, leading to a confusing runtime failure. The dev target (lines 92-95) already has the fix pattern in place!

🐛 Proposed fix
  APP_ARGS=(--server.port=$$SERVER_PORT); \
  if [ -n "$$GITHUB_TOKEN" ]; then \
    APP_ARGS+=( \
      --spring.ai.openai.api-key="$$GITHUB_TOKEN" \
      --spring.ai.openai.base-url="$${GITHUB_MODELS_BASE_URL:-https://models.github.ai/inference}" \
      --spring.ai.openai.chat.options.model="$${GITHUB_MODELS_CHAT_MODEL:-gpt-5}" \
      --spring.ai.openai.embedding.options.model="$${GITHUB_MODELS_EMBED_MODEL:-text-embedding-3-small}" \
    ); \
+ elif [ -n "$$OPENAI_API_KEY" ]; then \
+   APP_ARGS+=( \
+     --spring.ai.openai.api-key="$$OPENAI_API_KEY" \
+   ); \
  fi; \
🧹 Nitpick comments (4)
src/test/java/com/williamcallahan/javachat/ExtractorQualityTest.java (1)
119-128: Consider extracting noise patterns to a constant (optional nitpick). For a manual utility runner, inline patterns are totally fine. But if you ever expand this list or want to reuse it elsewhere, pulling it into a private static final String[] NOISE_PATTERNS at the class level could keep things neat. Low priority, just a thought for future-you! 🧹

src/main/java/com/williamcallahan/javachat/domain/SearchQualityLevel.java (1)
110-122: Minor optimization opportunity: countHighQuality is called twice. When describeQuality calls determine(), the high-quality count is computed internally, then computed again on line 119. For typical search result sizes this is negligible, but if you're feeling tidy, you could refactor to avoid the duplicate work.

♻️ Optional refactor to compute count once
One approach is to have determine return a small result object (or use a private helper struct):

+ private record QualityAssessment(SearchQualityLevel level, int totalCount, int highQualityCount) {}
+
+ private static QualityAssessment assess(List<? extends RetrievedContent> contents) {
+     if (contents == null || contents.isEmpty()) {
+         return new QualityAssessment(NONE, 0, 0);
+     }
+     int total = contents.size();
+     int highQuality = (int) countHighQuality(contents);
+
+     boolean likelyKeywordSearch = contents.stream().anyMatch(content -> content.getSourceUrl()
+         .filter(url -> url.contains(KEYWORD_MARKER_LOCAL_SEARCH) || url.contains(KEYWORD_MARKER_KEYWORD))
+         .isPresent());
+
+     if (likelyKeywordSearch) {
+         return new QualityAssessment(KEYWORD_SEARCH, total, highQuality);
+     }
+     if (highQuality == total) {
+         return new QualityAssessment(HIGH_QUALITY, total, highQuality);
+     }
+     return new QualityAssessment(MIXED_QUALITY, total, highQuality);
+ }
+
  public static String describeQuality(List<? extends RetrievedContent> contents) {
-     SearchQualityLevel level = determine(contents);
-     int totalCount = contents != null ? contents.size() : 0;
-     long highQualityCount = countHighQuality(contents);
-
-     return level.formatMessage(totalCount, (int) highQualityCount);
+     QualityAssessment assessment = assess(contents);
+     return assessment.level().formatMessage(assessment.totalCount(), assessment.highQualityCount());
  }

This keeps determine() unchanged for external callers, while describeQuality uses the richer internal helper.

Makefile (1)
24-24: Optional: Consider adding an all target. The static analysis tool notes a missing all target; this is a common Makefile convention where all serves as the default build target. Since you have help as the first target, users already get a sensible default.

If you'd like to follow the convention, you could add:
all: build ## Default target: build the projectThis is purely a "nice to have" for Makefile purists! 📚
scripts/monitor_progress.sh (1)
22-22: Consider adding a timeout and using `jq` for more robust JSON parsing.

The current grep/cut approach works, but it's a bit fragile if the JSON structure changes. Also, without a timeout, curl could hang if Qdrant becomes unresponsive. Fun fact: `monitor_indexing.sh` already uses `jq` for this! 🧩

💡 Optional improvement

```diff
- count=$(curl -s "${auth[@]}" "${base_url}/collections/${QDRANT_COLLECTION}" | grep -o '"points_count":[0-9]*' | cut -d: -f2)
+ count=$(curl -s --max-time 5 "${auth[@]}" "${base_url}/collections/${QDRANT_COLLECTION}" | jq -r '.result.points_count // 0' 2>/dev/null)
```

This adds a 5-second timeout and uses `jq` for cleaner JSON extraction (with a fallback to `0` if parsing fails).
src/main/java/com/williamcallahan/javachat/service/GuidedTOCProvider.java
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/development.md`:
- Around line 209-212: The java version check pipes nothing because `java -version` writes to stderr; update the command shown (the `java -version | grep -o '"[^"]*"'` example) to capture stderr by redirecting it to stdout, e.g. `java -version 2>&1 | grep -o '"[^"]*"'`, so the grep receives the version string; replace the existing example command accordingly.
♻️ Duplicate comments (1)
Makefile (1)
57-65: Pass `OPENAI_API_KEY` into `APP_ARGS` in `run`.

Tiny gotcha: validation allows `OPENAI_API_KEY`, but `APP_ARGS` only populates when `GITHUB_TOKEN` is set—OpenAI-only users won't get credentials at runtime.

🐛 Proposed fix

```diff
  APP_ARGS=(--server.port=$$SERVER_PORT); \
  if [ -n "$$GITHUB_TOKEN" ]; then \
    APP_ARGS+=( \
      --spring.ai.openai.api-key="$$GITHUB_TOKEN" \
      --spring.ai.openai.base-url="$${GITHUB_MODELS_BASE_URL:-https://models.github.ai/inference}" \
      --spring.ai.openai.chat.options.model="$${GITHUB_MODELS_CHAT_MODEL:-gpt-5}" \
      --spring.ai.openai.embedding.options.model="$${GITHUB_MODELS_EMBED_MODEL:-text-embedding-3-small}" \
    ); \
+ elif [ -n "$$OPENAI_API_KEY" ]; then \
+   APP_ARGS+=( \
+     --spring.ai.openai.api-key="$$OPENAI_API_KEY" \
+   ); \
  fi; \
```
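Outside of make, the same branch logic can be sketched as a plain bash function — a hypothetical `build_app_args` helper that reuses the env var names from the Makefile — to show why the `elif` matters for OpenAI-only setups:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the credential-selection logic from the `run`
# target, lifted out of make. Env var names match the Makefile.
build_app_args() {
  local -a args=(--server.port="${SERVER_PORT:-8080}")
  if [ -n "${GITHUB_TOKEN:-}" ]; then
    # Preferred path: GitHub Models credentials.
    args+=(--spring.ai.openai.api-key="$GITHUB_TOKEN")
  elif [ -n "${OPENAI_API_KEY:-}" ]; then
    # Fallback: forward the OpenAI key so OpenAI-only users still get
    # credentials at runtime. Without this branch, only the port arg
    # would be emitted.
    args+=(--spring.ai.openai.api-key="$OPENAI_API_KEY")
  fi
  printf '%s\n' "${args[@]}"
}

# OpenAI-only environment: the key is now forwarded alongside the port.
GITHUB_TOKEN='' OPENAI_API_KEY='sk-test' build_app_args
```

The GitHub Models path stays first, so setting both variables still prefers `GITHUB_TOKEN`, matching the Makefile's intent.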
🧹 Nitpick comments (1)
Makefile (1)
24-27: Optional: add an `all` target to satisfy checkmake.

Small tooling nicety—checkmake expects an `all` phony target. Adding a lightweight alias keeps linters happy without changing behavior.

✨ Suggested tweak

```diff
-.PHONY: help clean build test lint format run dev dev-backend compose-up compose-down compose-logs compose-ps health ingest citations fetch-all process-all full-pipeline frontend-install frontend-build
+.PHONY: all help clean build test lint format run dev dev-backend compose-up compose-down compose-logs compose-ps health ingest citations fetch-all process-all full-pipeline frontend-install frontend-build
+
+all: help ## Default target (alias)
```
`make run` validates `OPENAI_API_KEY` but never forwards it to Spring AI, so OpenAI-only users start without credentials. This adds a dedicated branch to append the key when GitHub Models isn’t configured, matching dev targets. - Add OPENAI_API_KEY fallback branch to APP_ARGS in run - Keep GitHub Models config as the preferred path when set
checkmake expects an `all` target and flags the Makefile without one. This adds a lightweight `all` alias for `help` without changing behavior. - Add all target as default alias for help - Include all in .PHONY
The troubleshooting example pipes java -version to grep, but the command writes to stderr. Redirecting stderr ensures the version string is captured. - Redirect stderr to stdout in the version check example
String contains checks are brittle and couple tests to raw HTML formatting. Parsing the HTML and using quoted attribute selectors makes metadata checks resilient while correctly matching Open Graph values that include colons. - Parse HTML with Jsoup from MockMvc response - Use quoted attribute selectors for og:* meta tags - Centralize response loading in a helper
The CI workflow still used the v4 upload-artifact action, which is older and missing the latest maintenance updates. This aligns all artifact uploads on v6 without changing workflow behavior. - Upgrade upload-artifact steps for test, SpotBugs, and PMD reports
SeoControllerTest depends on classpath:/static/index.html, which is not tracked and is missing in CI, causing the controller to return 500. Provide a minimal test resource so MVC tests have a stable HTML template. - Add minimal static index.html under src/test/resources/static
Local hook setup wasn’t captured in the repo, which makes it easy to skip formatting and frontend checks before commits. This adds a prek config and a Makefile helper, plus docs so developers can install hooks consistently. - Add local prek hooks for Spotless and frontend checks - Add `make hooks` helper to install hooks - Document hook setup in development guide
This is a formatting-only change to normalize line wrapping in the SEO controller test so the assertions are on single lines where they fit. - Collapse multi-line assertion wrapping for readability
We want formatting to be automatic on commit, not just checked. This switches the Java hook to Spotless apply and documents the behavior so developers expect local file updates. - Change prek hook to run spotlessApply - Clarify in dev docs that the hook auto-formats
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@docs/development.md`:
- Around line 48-50: Update the example to use the Gradle wrapper instead of the system Gradle: replace the shell command reference "gradle build" with "./gradlew build" (leave "java -version" unchanged) so the docs recommend the deterministic wrapper-based build command.
🧹 Nitpick comments (4)
.pre-commit-config.yaml (1)
1-13: Consider adding file filters to avoid running hooks unnecessarily 🏃‍♂️

Right now, both hooks will run on every commit—even if you're just updating `README.md`. Adding `files` or `types` filters can save precious seconds and keep your workflow snappy! Here's a little tweak that tells pre-commit "only bother with these hooks when relevant files change":

✨ Suggested improvement with file filters

```diff
 repos:
   - repo: local
     hooks:
       - id: spotless-java
         name: Spotless Java (Gradle, auto-format)
         entry: ./gradlew spotlessApply --no-daemon
         language: system
         pass_filenames: false
+        files: \.java$
       - id: frontend-check
         name: Frontend check (Vite/Svelte)
         entry: bash -lc "cd frontend && npm run check"
         language: system
         pass_filenames: false
+        files: ^frontend/
```

Fun fact: The `files` field uses regex patterns, so `\.java$` matches any file ending in `.java`, and `^frontend/` matches anything under the `frontend/` directory. Pre-commit is smart enough to skip the hook entirely if no matching files are staged! 🎯

src/test/java/com/williamcallahan/javachat/web/SeoControllerTest.java (1)
36-43: Nice refactor to DOM-based assertions! 🎯

This is a lovely improvement over string-contains checks — Jsoup parsing makes your assertions more robust and readable. One small tidbit: this test inlines the parsing logic while `serves_chat_with_specific_metadata` and `serves_guided_with_specific_metadata` use `loadSeoDocument()`. You could simplify this test to use the helper too, which would also make it consistent with its siblings.

♻️ Optional: Use the helper for consistency

```diff
 @Test
 void serves_root_with_seo_metadata() throws Exception {
-    MvcResult mvcOutcome = mvc.perform(get("/"))
-        .andExpect(status().isOk())
-        .andExpect(content().contentTypeCompatibleWith(MediaType.TEXT_HTML))
-        .andReturn();
-    Document htmlDocument = Jsoup.parse(mvcOutcome.getResponse().getContentAsString());
+    Document htmlDocument = loadSeoDocument("/");
     assertEquals("Java Chat - AI-Powered Java Learning With Citations", htmlDocument.title());
     assertMetaContent(htmlDocument, "property", "og:url", "https://example.com");
 }
```

Makefile (2)
48-49: Consider a friendly existence check for `prek`.

The `hooks` target assumes `prek` is already installed. If someone runs `make hooks` without it, they'll get a "command not found" error. A quick check could save a bit of head-scratching! 🧩

✨ Proposed improvement

```diff
 hooks: ## Install git hooks via prek
-	prek install --install-hooks
+	@command -v prek >/dev/null 2>&1 || { echo "Error: 'prek' not found. Install it first: https://github.com/caarlos0/prek" >&2; exit 1; }
+	prek install --install-hooks
```
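As a standalone illustration, the guard in that diff boils down to `command -v` plus an early exit; a sketch using a hypothetical `require` helper (the error text is illustrative):

```shell
#!/bin/sh
# Hypothetical `require` helper demonstrating the `command -v` guard
# pattern from the suggested Makefile tweak.
require() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "Error: '$1' not found. Install it first." >&2
    return 1
  }
}

require sh && echo "sh: found"
# A tool that does not exist triggers the friendly error instead of the
# shell's raw "command not found".
require definitely-not-installed 2>/dev/null || echo "handled missing tool"
```

The same shape works in a Makefile recipe; the `@` prefix there just suppresses echoing the command itself.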
51-78: A note on target body length (checkmake warning).

The static analyzer flags `run` and `dev` for exceeding its 5-line body limit. While extracting this into a shell script could shrink the targets, the current inline approach keeps everything visible in one place—useful for understanding the full startup flow at a glance. Totally fine to leave as-is unless you're aiming for strict checkmake compliance.
- Fix docs to reference ./gradlew instead of system gradle - Add prek existence check with install link in make hooks
This pull request introduces significant improvements to both the frontend and backend of the guided learning feature, focusing on better session isolation per lesson, enhanced contextual guidance for LLM responses, and codebase modularization. The most important changes are grouped below:
Frontend: Guided Chat Session Isolation
- Chat sessions are now isolated per lesson in `LearnView.svelte`, ensuring conversations do not bleed across lessons and improving backend context separation. (frontend/src/lib/components/LearnView.svelte) [1] [2] [3]

Backend: Guidance and Prompt Improvements
- Lesson-aware guidance now builds LLM prompts that include the lesson title, summary, and keywords, with topic-handling rules that steer greetings and off-topic questions back to the current lesson. (GuidedLearningService.java) [1] [2] [3] [4]
- Guided learning mode instructions are now supplied via the injected `SystemPromptConfig`. (GuidedLearningService.java) [1] [2] [3]

Backend: Search Quality Abstraction
- Introduced the `SearchQualityLevel` enum to centralize and standardize how the quality of search results is described to the LLM, replacing scattered boolean checks with a self-describing, extensible approach. (SearchQualityLevel.java, ChatService.java) [1] [2] [3]

Backend: PDF Citation Enhancement Refactor
- Extracted PDF citation enhancement into the new `PdfCitationEnhancer` support class, simplifying `GuidedLearningService` and improving maintainability. (GuidedLearningService.java) [1] [2] [3]

These changes collectively make guided learning chat more robust, context-aware, and maintainable.