Skip to content

feat: Phase 8 polish -- feedback endpoint, prompt tuning, and action bar#135

Merged
malston merged 15 commits intomainfrom
gsd/phase-08-polish
Mar 4, 2026
Merged

feat: Phase 8 polish -- feedback endpoint, prompt tuning, and action bar#135
malston merged 15 commits intomainfrom
gsd/phase-08-polish

Conversation

@malston
Copy link
Owner

@malston malston commented Mar 3, 2026

Summary

  • Feedback endpoint (POST /api/v1/chat/feedback): Auth-guarded endpoint that logs thumbs up/down ratings via structured slog for analytics without persisting state. Includes rate limiting and input validation.
  • Procurement-oriented prompt tuning: Expanded system prompt with urgency tier thresholds (70/80/90%) and budget justification framing (deployment failure risk, SLA exposure, developer velocity impact).
  • Action bar UI: Hover-reveal action bar on assistant messages with copy-to-clipboard (strips Markdown to plain text) and thumbs up/down feedback buttons. Always visible on mobile viewports.

Test plan

  • Backend: 11 new feedback handler tests (auth, validation, slog capture, truncation)
  • Backend: 3 new prompt tests (urgency tiers, budget justification, no calendar references)
  • Frontend: 24 new stripMarkdown utility tests
  • Frontend: 16 new action bar / feedback interaction tests
  • Full test suite green: all backend packages pass, 433 frontend tests pass
  • Manual verification: hover-reveal, copy plain text, feedback toggle, mobile layout

malston added 8 commits March 3, 2026 14:44
- Table-driven TestChatFeedback covering auth, validation, slog logging
- TestChatFeedback_TruncatesLongQuestion verifies server-side truncation
- Custom captureLogHandler for slog assertion
…and slog logging

- ChatFeedback handler validates auth, rating (up/down/none), message_index
- Server-side truncation of question to 100 chars (defense in depth)
- Logs feedback via slog.Info with username, rating, message_index, question
- Route registered at POST /api/v1/chat/feedback with "write" rate limit
…ripMarkdown

- stripMarkdown utility: headers, bold, italic, links, images, code, blockquotes, lists, tables, edge cases
- Action bar: visibility on assistant vs user messages, streaming exclusion, hover classes
- Copy button: strips markdown, writes to clipboard, checkmark feedback
- Feedback buttons: thumbs up/down toggle, aria-pressed, sendFeedback integration
- ChatMessages feedback integration: toggle logic, truncated question, sendFeedback calls
…tion

- TestSystemPromptContainsUrgencyTiers checks for 70/80/90% thresholds, expedite, burst capacity
- TestSystemPromptContainsBudgetJustification checks for deployment failure, SLA, developer velocity
- TestSystemPromptNoCalendarReferences ensures no Q1-Q4 or fiscal year references
… justification

- Added urgency tier guidance mapped to 70/80/90% utilization thresholds
- Added budget justification section: deployment failure risk, SLA exposure, developer velocity impact
- Uses relative timing throughout, no calendar-specific references
- System prompt stays under 10,000 character budget
- All existing and new tests pass
…ant messages

- stripMarkdown utility: regex-based Markdown-to-plain-text conversion
- sendFeedback API function: fire-and-forget POST to /api/v1/chat/feedback
- CopyButton: strips markdown, writes to clipboard, checkmark icon for 2s
- FeedbackButtons: thumbs up/down toggle with green/red active states
- Action bar: hover-reveal on desktop (md:group-hover), always visible on mobile
- Feedback state managed in ChatMessages, keyed by message index
- Resets feedback state when conversation is cleared
- SUMMARY.md with task commits, decisions, self-check
- STATE.md updated: phase 8 in progress, plan 1 of 2 complete
- ROADMAP.md updated with phase 08 progress
- REQUIREMENTS.md: POL-02 and POL-03 marked complete
- 08-02-SUMMARY.md created with task commits, decisions, and self-check
- STATE.md updated to 100% complete, all 17 plans done
- ROADMAP.md phase 8 plans marked complete
- REQUIREMENTS.md POL-01 marked complete
Copilot AI review requested due to automatic review settings March 3, 2026 22:22
@claude
Copy link

claude bot commented Mar 3, 2026

PR Review: Phase 8 Polish -- Feedback Endpoint, Prompt Tuning, and Action Bar

Overall this is a well-structured, well-tested PR. The feature decomposition is clean, the test coverage is solid, and the fire-and-forget feedback design is a sensible call for non-critical telemetry. A few items worth addressing before merge:


Issues

1. CopyButton: setTimeout leaks on unmount (medium)

CopyButton.handleCopy schedules setCopied(false) via setTimeout, but there is no cleanup if the component unmounts during the 2-second window. In React strict mode this will log a warning; in production it silently skips the update, but it is still a resource concern.

// frontend/src/components/chat/ChatMessage.jsx
const handleCopy = useCallback(async () => {
  try {
    await navigator.clipboard.writeText(stripMarkdown(content));
    setCopied(true);
    setTimeout(() => setCopied(false), 2000); // ← no cleanup
  } catch (err) { ... }
}, [content]);

Fix: track the timer in a useRef and clear it in a useEffect cleanup.

const timerRef = useRef(null);
useEffect(() => () => clearTimeout(timerRef.current), []);

const handleCopy = useCallback(async () => {
  try {
    await navigator.clipboard.writeText(stripMarkdown(content));
    setCopied(true);
    timerRef.current = setTimeout(() => setCopied(false), 2000);
  } catch (err) { ... }
}, [content]);

2. Prompt encourages cost estimates the LLM cannot verify (medium)

The new Budget Justification section instructs:

"N hosts at an estimated $X-Y per host" is more actionable than "N hosts"

The system has no pricing data. Asking the LLM to frame procurement in dollar terms will cause it to hallucinate prices — directly contradicting DOM-03 ("acknowledge data gaps rather than hallucinate"). Consider rephrasing to encourage the operator to fill in their own unit costs:

Frame procurement quantities so the operator can apply their own unit costs:
"N additional hosts" gives procurement a concrete quantity to price against their vendor contracts.
Avoid estimating dollar values — the system does not have pricing data.

3. sendFeedback swallows non-2xx HTTP errors silently (low)

fetch only rejects on network failures. A 429 or 401 from the feedback endpoint will be silently discarded. This is largely intentional (fire-and-forget), but a rate-limited user would never know their feedback was dropped. Consider at minimum a console.warn on non-OK responses:

const res = await fetch(...);
if (!res.ok) {
  console.warn(`Feedback submission returned ${res.status}`);
}

4. feedback route uses the generic "write" rate limit bucket (low)

// backend/handlers/routes.go
{Method: http.MethodPost, Path: "/api/v1/chat/feedback", Handler: h.ChatFeedback, RateLimit: "write"},

The "write" bucket presumably governs infrastructure state changes. Feedback is much lower-risk and could tolerate a more permissive limit, or conversely deserves its own limit to prevent log flooding via the feedback endpoint. A dedicated "feedback" rate limit key would make the intent explicit and allow independent tuning.


5. Ordered vs unordered list stripping is inconsistent (low)

stripMarkdown replaces unordered list markers with "- " (preserving visual structure) but strips ordered list markers entirely with no prefix:

.replace(/^\s*[-*+]\s+/gm, "- ")   // keeps "- " prefix
.replace(/^\s*\d+\.\s+/gm, "")     // strips marker, no prefix

For clipboard content this probably does not matter much, but it means 1. First\n2. Second becomes First\nSecond while - Item 1\n- Item 2 becomes - Item 1\n- Item 2. Probably fine for the use case, but worth a quick decision.


Positive Observations

  • Test coverage is excellent. captureLogHandler is a clever approach to assert on slog output without coupling tests to a specific log format. The 11 feedback tests and 24 stripMarkdown tests cover the edge cases thoroughly.
  • validRatings as a map for O(1) lookup is the right call.
  • Server-side truncation at 100 chars as defense-in-depth alongside the client-side truncation is a good belt-and-suspenders approach.
  • showActionBar = isAssistant && message.content && !isStreaming correctly prevents the action bar from appearing on streaming messages.
  • feedbackState reset on messages.length === 0 is the right place to hook the reset; correlating it to conversation clear is clean.
  • aria-pressed on feedback buttons — good accessibility practice.
  • Urgency tier tests (TestSystemPromptContainsUrgencyTiers, TestSystemPromptNoCalendarReferences) are a nice pattern for prompt regression testing.

Summary

Issues 1 and 2 are worth fixing before merge. Issues 3–5 are informational. Happy to approve once the setTimeout cleanup is addressed and the prompt cost-estimation language is reconsidered.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds “Phase 8” polish to the AI Advisor by introducing a stateless feedback signal path, tightening the procurement-oriented system prompt, and improving assistant-message UX with an action bar for copy + thumbs feedback.

Changes:

  • Backend: new auth-guarded POST /api/v1/chat/feedback endpoint that validates input and logs structured feedback via slog.
  • Backend: expanded procurement framing in the system prompt (urgency tiers + budget justification language) with new prompt-content tests.
  • Frontend: assistant message action bar (copy-to-clipboard using Markdown→text stripping + thumbs up/down toggle) and sendFeedback client helper, with new unit tests.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
frontend/src/utils/stripMarkdown.js Adds Markdown-to-plain-text utility used for clipboard copy.
frontend/src/utils/stripMarkdown.test.js Adds unit tests covering many Markdown stripping cases.
frontend/src/services/chatApi.js Adds fire-and-forget sendFeedback() POST to feedback endpoint.
frontend/src/components/chat/ChatMessage.jsx Adds assistant action bar UI (copy + feedback buttons).
frontend/src/components/chat/ChatMessages.jsx Tracks per-message feedback state and triggers sendFeedback() on toggles.
frontend/src/components/chat/ChatPanel.test.jsx Extends component tests for action bar visibility, copy behavior, and feedback interactions.
backend/handlers/feedback.go Implements ChatFeedback handler with validation + structured logging.
backend/handlers/feedback_test.go Adds handler tests for auth/validation/logging/truncation behaviors.
backend/handlers/routes.go Registers the new /api/v1/chat/feedback route with rate limit tier write.
backend/services/ai/prompt.go Extends procurement framing with urgency tiers and budget justification guidance.
backend/services/ai/prompt_test.go Adds tests to enforce new procurement prompt requirements and avoid calendar references.
.planning/phases/08-polish/08-01-SUMMARY.md Documents plan 08-01 completion (feedback endpoint + prompt tuning).
.planning/phases/08-polish/08-02-SUMMARY.md Documents plan 08-02 completion (action bar + stripMarkdown + sendFeedback).
.planning/STATE.md Records project milestone completion state.
.planning/ROADMAP.md Adds roadmap/phase documentation snapshot.
.planning/REQUIREMENTS.md Adds requirements documentation snapshot.
Comments suppressed due to low confidence (2)

backend/services/ai/prompt_test.go:137

  • The comment says “At least one of these business-impact terms must be present”, but the assertion requires all 3 terms (found < 3 fails). Update the comment to match the test’s actual requirement (or relax the assertion if “at least one” is intended).
	lower := strings.ToLower(systemPrompt)
	// At least one of these business-impact terms must be present
	businessTerms := []string{
		"deployment failure",
		"sla",
		"developer velocity",
	}
	found := 0
	for _, term := range businessTerms {
		if strings.Contains(lower, term) {
			found++
		}
	}
	if found < 3 {
		t.Errorf("system prompt should contain all budget justification terms (deployment failure, SLA, developer velocity), found %d of 3", found)

.planning/phases/08-polish/08-02-SUMMARY.md:10

  • This summary claims the feedback endpoint provides “rating persistence”, but the implementation is explicitly stateless (logs via slog without storing). Consider rewording to avoid implying persistence/storage.
requires:
  - phase: 08-polish-01
    provides: POST /api/v1/chat/feedback endpoint for thumbs up/down rating persistence
  - phase: 05-frontend-chat

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

malston added 2 commits March 3, 2026 15:39
- Fix setTimeout leak on unmount in CopyButton (useRef + cleanup)
- Remove unused `within` import from ChatPanel.test.jsx
- Use rune-based truncation for UTF-8 safety in feedback handler
- Remove hallucination-inducing price estimate language from prompt
- Fix misleading test comment (says "at least one" but requires all 3)
- Log non-2xx responses in sendFeedback for debuggability
- Remove unused withFeedbackTestAuth (staticcheck U1000), alias
  feedbackTestClaims to shared testClaims
- Replace useEffect setState with render-time state adjustment
  (react-hooks/set-state-in-effect)
- Add clipboard failure feedback with red icon and "Copy failed" label
- Fix clearTimeout on rapid copy clicks to prevent premature reset
- Move handleFeedback toggle logic inside functional updater to remove
  feedbackState dependency and reduce React.memo re-renders
- Capture parse error message in parseSSEEvent bare catch
- Add sendFeedback unit tests for CSRF, credentials, and fire-and-forget
@malston malston requested a review from Copilot March 3, 2026 22:54
@malston
Copy link
Owner Author

malston commented Mar 3, 2026

/claude-review

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +266 to 276
{showActionBar && (
<div
data-testid="action-bar"
className="absolute top-2 right-2 md:opacity-0 md:group-hover:opacity-100 transition-opacity flex items-center gap-0.5 bg-slate-800 border border-slate-600 rounded-md px-1 py-0.5"
>
<CopyButton content={message.content} />
<FeedbackButtons
rating={feedbackRating}
onRate={(rating) => onFeedback?.(rating)}
/>
</div>
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The action bar is hidden on desktop using md:opacity-0, but it remains in the tab order and accessible tree. Keyboard users can tab to invisible buttons, and screen readers may announce controls that aren't visibly available. Consider also toggling interactivity/visibility (e.g., md:pointer-events-none when hidden + md:group-hover:pointer-events-auto, and/or md:group-focus-within:opacity-100) or conditionally rendering the action bar only when visible/focused.

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added md:pointer-events-none when hidden, md:group-hover:pointer-events-auto and md:group-focus-within:opacity-100 + pointer-events-auto so keyboard users can tab into the action bar when it becomes visible via focus.

// Blockquotes
.replace(/^>\s?/gm, "")
// Unordered list markers
.replace(/^\s*[-*+]\s+/gm, "- ")
Copy link

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stripMarkdown claims to “strip unordered list markers”, but the regex replaces them with a literal "- " (so list markers still appear in the copied plain text). Either remove the marker entirely (replace with "") or update the doc/tests to reflect that bullets are intentionally preserved/normalized.

Suggested change
.replace(/^\s*[-*+]\s+/gm, "- ")
.replace(/^\s*[-*+]\s+/gm, "")

Copilot uses AI. Check for mistakes.
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Unordered list markers now stripped entirely for consistency with ordered list handling.

@claude
Copy link

claude bot commented Mar 3, 2026

Code Review: Phase 8 Polish — Feedback Endpoint, Prompt Tuning, Action Bar

Overall this is a well-structured PR with solid test coverage. The design is appropriate for a fire-and-forget analytics signal: no state persistence, auth-guarded, rate-limited, and input-validated. A few issues worth addressing below.


Bug: Side effect inside React state updater (ChatMessages.jsx)

The handleFeedback callback calls sendFeedback(...) inside the functional form of setFeedbackState. State updater functions must be pure — React 18 Strict Mode intentionally invokes them twice in development to catch exactly this class of bug, which would result in duplicate network requests per user action.

// Anti-pattern: network call inside a state updater
setFeedbackState(prev => {
  sendFeedback(...);            // <-- side effect here
  return { ...prev, [index]: newRating };
});

The fix is to compute newRating (and decide whether to call sendFeedback) before entering the updater, then call sendFeedback outside of it:

const handleFeedback = useCallback((messageIndex, rating) => {
  const prev = feedbackStateRef.current;           // or derive newRating from current state
  const current = prev[messageIndex];
  const newRating = current === rating ? "none" : rating;

  // Side effect outside the updater
  const precedingUser = [...messages].reverse().find(
    (m, i) => messages.length - 1 - i < messageIndex && m.role === "user"
  );
  const truncatedQuestion = precedingUser?.content?.slice(0, 100) ?? "";
  sendFeedback({ messageIndex, rating: newRating, truncatedQuestion });

  setFeedbackState(prev =>
    newRating === "none"
      ? Object.fromEntries(Object.entries(prev).filter(([k]) => k !== String(messageIndex)))
      : { ...prev, [messageIndex]: newRating }
  );
}, [messages]);

Minor: stripMarkdown.js — italic regex can mangle snake_case identifiers

.replace(/_(.+?)_/g, "$1")

This strips underscores from identifiers like diego_cell_count, producing diego_cell_countdiegocellcount (inner underscores consumed by successive matches). For a copy-to-clipboard utility the impact is low, but it could make technical output (e.g., metric names, config keys) harder to read.

Consider restricting the italic pattern to match only when the underscores are surrounded by word boundaries or whitespace, e.g. /(^|\s)_(\S[^_\n]*\S)_(\s|$)/gm. Alternatively, accept this as a known limitation and add a comment.


Minor: Ordered list numbers stripped entirely

.replace(/^\s*\d+\.\s+/gm, "")

This turns 1. First item\n2. Second item into First item\nSecond item. When a user copies a numbered list and pastes it into an email or ticket, the ordering context is lost. Consider replacing with the number: "$1. " → keep the captured digit group. (This is a UX call, not a correctness bug.)


Minor: Prompt test coverage for calendar references

TestSystemPromptNoCalendarReferences checks for q1/q2/q3/q4 and "fiscal year", but not for month names (january, february, …) or informal references like "end of year", "next quarter". If the system prompt is ever extended, those references could slip through. Consider broadening the list or using a regex.


Observation: "write" rate limit for feedback

The feedback endpoint uses RateLimit: "write" (default 10 req/min, same as the chat endpoint). At 10 per minute a user clicking thumbs up/down on several messages in a session could realistically hit the limit. A dedicated "feedback" tier at, say, 30/min would be less disruptive. Low priority, but worth a follow-up ticket.


Nits / positives

  • []rune truncation in the backend is correct for multibyte input — good catch.
  • useRef for the copy timeout cleanup avoids stale-closure bugs — well done.
  • The "adjust state during render" pattern for resetting feedback on session clear is the correct React approach (better than useEffect); a one-line comment explaining why would help future readers.
  • aria-pressed and dynamic aria-label on feedback buttons are solid a11y choices.
  • The prompt additions are well-framed: urgency tiers are concrete, budget justification avoids dollar estimates, and the calendar-reference prohibition is explicit.

Summary:

Severity Issue
Bug sendFeedback called inside React state updater — can double-fire in Strict Mode
Minor _italic_ regex mangles snake_case identifiers in stripMarkdown
Minor Ordered list numbers stripped rather than preserved
Minor Calendar-reference test coverage is incomplete
Nit "write" rate limit may be too tight for feedback; consider a dedicated tier

The bug is worth fixing before merge; the rest can be addressed as follow-up issues or accepted as-is with a comment.

malston added 5 commits March 3, 2026 16:12
- Move sendFeedback out of React state updater to prevent duplicate
  POSTs in StrictMode (updaters must be pure)
- Preserve code block content in stripMarkdown by extracting to
  placeholders before running transformation pipeline
- Restrict italic underscore regex to word boundaries to preserve
  snake_case identifiers like diego_cell_count
- Strip unordered list markers entirely for consistency with ordered
- Fix action bar a11y: hidden buttons no longer keyboard-focusable
  (pointer-events-none when hidden, restored on hover/focus-within)
The react-hooks/refs lint rule disallows setting ref.current during
render. Move the ref sync into a useEffect keyed on feedbackState.
The ref-based approach introduced a stale-value race condition when
synced via useEffect (ref wouldn't update until after commit). Instead,
read feedbackState directly in handleFeedback and add it to the
useCallback dependency array. Feedback toggles are infrequent so the
re-memoization cost is negligible.
Documentation guidelines prohibit em dashes in comments.
Planning artifacts are development aids, not deployable code.
Already in .gitignore; this removes them from the git index.
@malston malston merged commit 3c306a4 into main Mar 4, 2026
3 checks passed
@malston malston deleted the gsd/phase-08-polish branch March 4, 2026 01:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants