feat: Phase 8 polish -- feedback endpoint, prompt tuning, and action bar by malston · Pull Request #135 · malston/diego-capacity-analyzer

malston · 2026-03-03T22:22:57Z

Summary

Feedback endpoint (POST /api/v1/chat/feedback): Auth-guarded endpoint that logs thumbs up/down ratings via structured slog for analytics without persisting state. Includes rate limiting and input validation.
Procurement-oriented prompt tuning: Expanded system prompt with urgency tier thresholds (70/80/90%) and budget justification framing (deployment failure risk, SLA exposure, developer velocity impact).
Action bar UI: Hover-reveal action bar on assistant messages with copy-to-clipboard (strips Markdown to plain text) and thumbs up/down feedback buttons. Always visible on mobile viewports.

Test plan

Backend: 11 new feedback handler tests (auth, validation, slog capture, truncation)
Backend: 3 new prompt tests (urgency tiers, budget justification, no calendar references)
Frontend: 24 new stripMarkdown utility tests
Frontend: 16 new action bar / feedback interaction tests
Full test suite green: all backend packages pass, 433 frontend tests pass
Manual verification: hover-reveal, copy plain text, feedback toggle, mobile layout

- Table-driven TestChatFeedback covering auth, validation, slog logging - TestChatFeedback_TruncatesLongQuestion verifies server-side truncation - Custom captureLogHandler for slog assertion

…and slog logging - ChatFeedback handler validates auth, rating (up/down/none), message_index - Server-side truncation of question to 100 chars (defense in depth) - Logs feedback via slog.Info with username, rating, message_index, question - Route registered at POST /api/v1/chat/feedback with "write" rate limit

…ripMarkdown - stripMarkdown utility: headers, bold, italic, links, images, code, blockquotes, lists, tables, edge cases - Action bar: visibility on assistant vs user messages, streaming exclusion, hover classes - Copy button: strips markdown, writes to clipboard, checkmark feedback - Feedback buttons: thumbs up/down toggle, aria-pressed, sendFeedback integration - ChatMessages feedback integration: toggle logic, truncated question, sendFeedback calls

…tion - TestSystemPromptContainsUrgencyTiers checks for 70/80/90% thresholds, expedite, burst capacity - TestSystemPromptContainsBudgetJustification checks for deployment failure, SLA, developer velocity - TestSystemPromptNoCalendarReferences ensures no Q1-Q4 or fiscal year references

… justification - Added urgency tier guidance mapped to 70/80/90% utilization thresholds - Added budget justification section: deployment failure risk, SLA exposure, developer velocity impact - Uses relative timing throughout, no calendar-specific references - System prompt stays under 10,000 character budget - All existing and new tests pass

…ant messages - stripMarkdown utility: regex-based Markdown-to-plain-text conversion - sendFeedback API function: fire-and-forget POST to /api/v1/chat/feedback - CopyButton: strips markdown, writes to clipboard, checkmark icon for 2s - FeedbackButtons: thumbs up/down toggle with green/red active states - Action bar: hover-reveal on desktop (md:group-hover), always visible on mobile - Feedback state managed in ChatMessages, keyed by message index - Resets feedback state when conversation is cleared

- SUMMARY.md with task commits, decisions, self-check - STATE.md updated: phase 8 in progress, plan 1 of 2 complete - ROADMAP.md updated with phase 08 progress - REQUIREMENTS.md: POL-02 and POL-03 marked complete

- 08-02-SUMMARY.md created with task commits, decisions, and self-check - STATE.md updated to 100% complete, all 17 plans done - ROADMAP.md phase 8 plans marked complete - REQUIREMENTS.md POL-01 marked complete

claude · 2026-03-03T22:25:06Z

PR Review: Phase 8 Polish -- Feedback Endpoint, Prompt Tuning, and Action Bar

Overall this is a well-structured, well-tested PR. The feature decomposition is clean, the test coverage is solid, and the fire-and-forget feedback design is a sensible call for non-critical telemetry. A few items worth addressing before merge:

Issues

1. `CopyButton`: `setTimeout` leaks on unmount (medium)

CopyButton.handleCopy schedules setCopied(false) via setTimeout, but there is no cleanup if the component unmounts during the 2-second window. In React strict mode this will log a warning; in production it silently skips the update, but it is still a resource concern.

// frontend/src/components/chat/ChatMessage.jsx
const handleCopy = useCallback(async () => {
  try {
    await navigator.clipboard.writeText(stripMarkdown(content));
    setCopied(true);
    setTimeout(() => setCopied(false), 2000); // ← no cleanup
  } catch (err) { ... }
}, [content]);

Fix: track the timer in a useRef and clear it in a useEffect cleanup.

const timerRef = useRef(null);
useEffect(() => () => clearTimeout(timerRef.current), []);

const handleCopy = useCallback(async () => {
  try {
    await navigator.clipboard.writeText(stripMarkdown(content));
    setCopied(true);
    timerRef.current = setTimeout(() => setCopied(false), 2000);
  } catch (err) { ... }
}, [content]);

2. Prompt encourages cost estimates the LLM cannot verify (medium)

The new Budget Justification section instructs:

"N hosts at an estimated $X-Y per host" is more actionable than "N hosts"

The system has no pricing data. Asking the LLM to frame procurement in dollar terms will cause it to hallucinate prices — directly contradicting DOM-03 ("acknowledge data gaps rather than hallucinate"). Consider rephrasing to encourage the operator to fill in their own unit costs:

Frame procurement quantities so the operator can apply their own unit costs:
"N additional hosts" gives procurement a concrete quantity to price against their vendor contracts.
Avoid estimating dollar values — the system does not have pricing data.

3. `sendFeedback` swallows non-2xx HTTP errors silently (low)

fetch only rejects on network failures. A 429 or 401 from the feedback endpoint will be silently discarded. This is largely intentional (fire-and-forget), but a rate-limited user would never know their feedback was dropped. Consider at minimum a console.warn on non-OK responses:

const res = await fetch(...);
if (!res.ok) {
  console.warn(`Feedback submission returned ${res.status}`);
}

4. `feedback` route uses the generic `"write"` rate limit bucket (low)

// backend/handlers/routes.go
{Method: http.MethodPost, Path: "/api/v1/chat/feedback", Handler: h.ChatFeedback, RateLimit: "write"},

The "write" bucket presumably governs infrastructure state changes. Feedback is much lower-risk and could tolerate a more permissive limit, or conversely deserves its own limit to prevent log flooding via the feedback endpoint. A dedicated "feedback" rate limit key would make the intent explicit and allow independent tuning.

5. Ordered vs unordered list stripping is inconsistent (low)

stripMarkdown replaces unordered list markers with "- " (preserving visual structure) but strips ordered list markers entirely with no prefix:

.replace(/^\s*[-*+]\s+/gm, "- ")   // keeps "- " prefix
.replace(/^\s*\d+\.\s+/gm, "")     // strips marker, no prefix

For clipboard content this probably does not matter much, but it means 1. First\n2. Second becomes First\nSecond while - Item 1\n- Item 2 becomes - Item 1\n- Item 2. Probably fine for the use case, but worth a quick decision.

Positive Observations

Test coverage is excellent. captureLogHandler is a clever approach to assert on slog output without coupling tests to a specific log format. The 11 feedback tests and 24 stripMarkdown tests cover the edge cases thoroughly.
validRatings as a map for O(1) lookup is the right call.
Server-side truncation at 100 chars as defense-in-depth alongside the client-side truncation is a good belt-and-suspenders approach.
showActionBar = isAssistant && message.content && !isStreaming correctly prevents the action bar from appearing on streaming messages.
feedbackState reset on messages.length === 0 is the right place to hook the reset; correlating it to conversation clear is clean.
aria-pressed on feedback buttons — good accessibility practice.
Urgency tier tests (TestSystemPromptContainsUrgencyTiers, TestSystemPromptNoCalendarReferences) are a nice pattern for prompt regression testing.

Summary

Issues 1 and 2 are worth fixing before merge. Issues 3–5 are informational. Happy to approve once the setTimeout cleanup is addressed and the prompt cost-estimation language is reconsidered.

Copilot

Pull request overview

Adds “Phase 8” polish to the AI Advisor by introducing a stateless feedback signal path, tightening the procurement-oriented system prompt, and improving assistant-message UX with an action bar for copy + thumbs feedback.

Changes:

Backend: new auth-guarded POST /api/v1/chat/feedback endpoint that validates input and logs structured feedback via slog.
Backend: expanded procurement framing in the system prompt (urgency tiers + budget justification language) with new prompt-content tests.
Frontend: assistant message action bar (copy-to-clipboard using Markdown→text stripping + thumbs up/down toggle) and sendFeedback client helper, with new unit tests.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
frontend/src/utils/stripMarkdown.js	Adds Markdown-to-plain-text utility used for clipboard copy.
frontend/src/utils/stripMarkdown.test.js	Adds unit tests covering many Markdown stripping cases.
frontend/src/services/chatApi.js	Adds fire-and-forget `sendFeedback()` POST to feedback endpoint.
frontend/src/components/chat/ChatMessage.jsx	Adds assistant action bar UI (copy + feedback buttons).
frontend/src/components/chat/ChatMessages.jsx	Tracks per-message feedback state and triggers `sendFeedback()` on toggles.
frontend/src/components/chat/ChatPanel.test.jsx	Extends component tests for action bar visibility, copy behavior, and feedback interactions.
backend/handlers/feedback.go	Implements `ChatFeedback` handler with validation + structured logging.
backend/handlers/feedback_test.go	Adds handler tests for auth/validation/logging/truncation behaviors.
backend/handlers/routes.go	Registers the new `/api/v1/chat/feedback` route with rate limit tier `write`.
backend/services/ai/prompt.go	Extends procurement framing with urgency tiers and budget justification guidance.
backend/services/ai/prompt_test.go	Adds tests to enforce new procurement prompt requirements and avoid calendar references.
.planning/phases/08-polish/08-01-SUMMARY.md	Documents plan 08-01 completion (feedback endpoint + prompt tuning).
.planning/phases/08-polish/08-02-SUMMARY.md	Documents plan 08-02 completion (action bar + stripMarkdown + sendFeedback).
.planning/STATE.md	Records project milestone completion state.
.planning/ROADMAP.md	Adds roadmap/phase documentation snapshot.
.planning/REQUIREMENTS.md	Adds requirements documentation snapshot.

Comments suppressed due to low confidence (2)

backend/services/ai/prompt_test.go:137

The comment says “At least one of these business-impact terms must be present”, but the assertion requires all 3 terms (found < 3 fails). Update the comment to match the test’s actual requirement (or relax the assertion if “at least one” is intended).

	lower := strings.ToLower(systemPrompt)
	// At least one of these business-impact terms must be present
	businessTerms := []string{
		"deployment failure",
		"sla",
		"developer velocity",
	}
	found := 0
	for _, term := range businessTerms {
		if strings.Contains(lower, term) {
			found++
		}
	}
	if found < 3 {
		t.Errorf("system prompt should contain all budget justification terms (deployment failure, SLA, developer velocity), found %d of 3", found)

.planning/phases/08-polish/08-02-SUMMARY.md:10

This summary claims the feedback endpoint provides “rating persistence”, but the implementation is explicitly stateless (logs via slog without storing). Consider rewording to avoid implying persistence/storage.

requires:
  - phase: 08-polish-01
    provides: POST /api/v1/chat/feedback endpoint for thumbs up/down rating persistence
  - phase: 05-frontend-chat

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

backend/handlers/feedback.go

frontend/src/components/chat/ChatPanel.test.jsx

frontend/src/components/chat/ChatMessage.jsx

frontend/src/components/chat/ChatMessages.jsx

frontend/src/services/chatApi.js

backend/handlers/feedback.go

- Fix setTimeout leak on unmount in CopyButton (useRef + cleanup) - Remove unused `within` import from ChatPanel.test.jsx - Use rune-based truncation for UTF-8 safety in feedback handler - Remove hallucination-inducing price estimate language from prompt - Fix misleading test comment (says "at least one" but requires all 3) - Log non-2xx responses in sendFeedback for debuggability

- Remove unused withFeedbackTestAuth (staticcheck U1000), alias feedbackTestClaims to shared testClaims - Replace useEffect setState with render-time state adjustment (react-hooks/set-state-in-effect) - Add clipboard failure feedback with red icon and "Copy failed" label - Fix clearTimeout on rapid copy clicks to prevent premature reset - Move handleFeedback toggle logic inside functional updater to remove feedbackState dependency and reduce React.memo re-renders - Capture parse error message in parseSSEEvent bare catch - Add sendFeedback unit tests for CSRF, credentials, and fire-and-forget

malston · 2026-03-03T22:54:43Z

/claude-review

Copilot

Pull request overview

Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

frontend/src/utils/stripMarkdown.js

frontend/src/components/chat/ChatMessages.jsx

Copilot · 2026-03-03T22:58:40Z

frontend/src/components/chat/ChatMessage.jsx

+        {showActionBar && (
+          <div
+            data-testid="action-bar"
+            className="absolute top-2 right-2 md:opacity-0 md:group-hover:opacity-100 transition-opacity flex items-center gap-0.5 bg-slate-800 border border-slate-600 rounded-md px-1 py-0.5"
+          >
+            <CopyButton content={message.content} />
+            <FeedbackButtons
+              rating={feedbackRating}
+              onRate={(rating) => onFeedback?.(rating)}
+            />
          </div>


The action bar is hidden on desktop using md:opacity-0, but it remains in the tab order and accessible tree. Keyboard users can tab to invisible buttons, and screen readers may announce controls that aren't visibly available. Consider also toggling interactivity/visibility (e.g., md:pointer-events-none when hidden + md:group-hover:pointer-events-auto, and/or md:group-focus-within:opacity-100) or conditionally rendering the action bar only when visible/focused.

Fixed. Added md:pointer-events-none when hidden, md:group-hover:pointer-events-auto and md:group-focus-within:opacity-100 + pointer-events-auto so keyboard users can tab into the action bar when it becomes visible via focus.

Copilot · 2026-03-03T22:58:40Z

frontend/src/utils/stripMarkdown.js

+      // Blockquotes
+      .replace(/^>\s?/gm, "")
+      // Unordered list markers
+      .replace(/^\s*[-*+]\s+/gm, "- ")


stripMarkdown claims to “strip unordered list markers”, but the regex replaces them with a literal "- " (so list markers still appear in the copied plain text). Either remove the marker entirely (replace with "") or update the doc/tests to reflect that bullets are intentionally preserved/normalized.

Suggested change

.replace(/^\s*[-*+]\s+/gm, "- ")

.replace(/^\s*[-*+]\s+/gm, "")

Fixed. Unordered list markers now stripped entirely for consistency with ordered list handling.

claude · 2026-03-03T22:58:48Z

Code Review: Phase 8 Polish — Feedback Endpoint, Prompt Tuning, Action Bar

Overall this is a well-structured PR with solid test coverage. The design is appropriate for a fire-and-forget analytics signal: no state persistence, auth-guarded, rate-limited, and input-validated. A few issues worth addressing below.

Bug: Side effect inside React state updater (`ChatMessages.jsx`)

The handleFeedback callback calls sendFeedback(...) inside the functional form of setFeedbackState. State updater functions must be pure — React 18 Strict Mode intentionally invokes them twice in development to catch exactly this class of bug, which would result in duplicate network requests per user action.

// Anti-pattern: network call inside a state updater
setFeedbackState(prev => {
  sendFeedback(...);            // <-- side effect here
  return { ...prev, [index]: newRating };
});

The fix is to compute newRating (and decide whether to call sendFeedback) before entering the updater, then call sendFeedback outside of it:

const handleFeedback = useCallback((messageIndex, rating) => {
  const prev = feedbackStateRef.current;           // or derive newRating from current state
  const current = prev[messageIndex];
  const newRating = current === rating ? "none" : rating;

  // Side effect outside the updater
  const precedingUser = [...messages].reverse().find(
    (m, i) => messages.length - 1 - i < messageIndex && m.role === "user"
  );
  const truncatedQuestion = precedingUser?.content?.slice(0, 100) ?? "";
  sendFeedback({ messageIndex, rating: newRating, truncatedQuestion });

  setFeedbackState(prev =>
    newRating === "none"
      ? Object.fromEntries(Object.entries(prev).filter(([k]) => k !== String(messageIndex)))
      : { ...prev, [messageIndex]: newRating }
  );
}, [messages]);

Minor: `stripMarkdown.js` — italic regex can mangle snake_case identifiers

.replace(/_(.+?)_/g, "$1")

This strips underscores from identifiers like diego_cell_count, producing diego_cell_count → diegocellcount (inner underscores consumed by successive matches). For a copy-to-clipboard utility the impact is low, but it could make technical output (e.g., metric names, config keys) harder to read.

Consider restricting the italic pattern to match only when the underscores are surrounded by word boundaries or whitespace, e.g. /(^|\s)_(\S[^_\n]*\S)_(\s|$)/gm. Alternatively, accept this as a known limitation and add a comment.

Minor: Ordered list numbers stripped entirely

.replace(/^\s*\d+\.\s+/gm, "")

This turns 1. First item\n2. Second item into First item\nSecond item. When a user copies a numbered list and pastes it into an email or ticket, the ordering context is lost. Consider replacing with the number: "$1. " → keep the captured digit group. (This is a UX call, not a correctness bug.)

Minor: Prompt test coverage for calendar references

TestSystemPromptNoCalendarReferences checks for q1/q2/q3/q4 and "fiscal year", but not for month names (january, february, …) or informal references like "end of year", "next quarter". If the system prompt is ever extended, those references could slip through. Consider broadening the list or using a regex.

Observation: "write" rate limit for feedback

The feedback endpoint uses RateLimit: "write" (default 10 req/min, same as the chat endpoint). At 10 per minute a user clicking thumbs up/down on several messages in a session could realistically hit the limit. A dedicated "feedback" tier at, say, 30/min would be less disruptive. Low priority, but worth a follow-up ticket.

Nits / positives

[]rune truncation in the backend is correct for multibyte input — good catch.
useRef for the copy timeout cleanup avoids stale-closure bugs — well done.
The "adjust state during render" pattern for resetting feedback on session clear is the correct React approach (better than useEffect); a one-line comment explaining why would help future readers.
aria-pressed and dynamic aria-label on feedback buttons are solid a11y choices.
The prompt additions are well-framed: urgency tiers are concrete, budget justification avoids dollar estimates, and the calendar-reference prohibition is explicit.

Summary:

Severity	Issue
Bug	`sendFeedback` called inside React state updater — can double-fire in Strict Mode
Minor	`_italic_` regex mangles snake_case identifiers in `stripMarkdown`
Minor	Ordered list numbers stripped rather than preserved
Minor	Calendar-reference test coverage is incomplete
Nit	`"write"` rate limit may be too tight for feedback; consider a dedicated tier

The bug is worth fixing before merge; the rest can be addressed as follow-up issues or accepted as-is with a comment.

- Move sendFeedback out of React state updater to prevent duplicate POSTs in StrictMode (updaters must be pure) - Preserve code block content in stripMarkdown by extracting to placeholders before running transformation pipeline - Restrict italic underscore regex to word boundaries to preserve snake_case identifiers like diego_cell_count - Strip unordered list markers entirely for consistency with ordered - Fix action bar a11y: hidden buttons no longer keyboard-focusable (pointer-events-none when hidden, restored on hover/focus-within)

The react-hooks/refs lint rule disallows setting ref.current during render. Move the ref sync into a useEffect keyed on feedbackState.

The ref-based approach introduced a stale-value race condition when synced via useEffect (ref wouldn't update until after commit). Instead, read feedbackState directly in handleFeedback and add it to the useCallback dependency array. Feedback toggles are infrequent so the re-memoization cost is negligible.

Documentation guidelines prohibit em dashes in comments.

Planning artifacts are development aids, not deployable code. Already in .gitignore; this removes them from the git index.

malston added 8 commits March 3, 2026 14:44

test(08-01): add failing tests for chat feedback endpoint

8862e79

- Table-driven TestChatFeedback covering auth, validation, slog logging - TestChatFeedback_TruncatesLongQuestion verifies server-side truncation - Custom captureLogHandler for slog assertion

docs(08-01): complete feedback endpoint and procurement prompt plan

0bebae9

- SUMMARY.md with task commits, decisions, self-check - STATE.md updated: phase 8 in progress, plan 1 of 2 complete - ROADMAP.md updated with phase 08 progress - REQUIREMENTS.md: POL-02 and POL-03 marked complete

docs(08-02): complete action bar with copy and feedback plan

7b8d032

- 08-02-SUMMARY.md created with task commits, decisions, and self-check - STATE.md updated to 100% complete, all 17 plans done - ROADMAP.md phase 8 plans marked complete - REQUIREMENTS.md POL-01 marked complete

Copilot AI review requested due to automatic review settings March 3, 2026 22:22

Copilot started reviewing on behalf of malston March 3, 2026 22:23 View session

Copilot AI reviewed Mar 3, 2026

View reviewed changes

malston added 2 commits March 3, 2026 15:39

malston requested a review from Copilot March 3, 2026 22:54

Copilot started reviewing on behalf of malston March 3, 2026 22:55 View session

Copilot AI reviewed Mar 3, 2026

View reviewed changes

malston added 5 commits March 3, 2026 16:12

fix: sync feedbackStateRef via useEffect instead of during render

cd37995

The react-hooks/refs lint rule disallows setting ref.current during render. Move the ref sync into a useEffect keyed on feedbackState.

fix: replace em dash with double hyphen in stripMarkdown comment

caf7f25

Documentation guidelines prohibit em dashes in comments.

chore: remove .planning/ from tracked files before merge

75ca9e5

Planning artifacts are development aids, not deployable code. Already in .gitignore; this removes them from the git index.

malston merged commit 3c306a4 into main Mar 4, 2026
3 checks passed

malston deleted the gsd/phase-08-polish branch March 4, 2026 01:23

	.replace(/^\s[-+]\s+/gm, "- ")
	.replace(/^\s[-+]\s+/gm, "")

Conversation

malston commented Mar 3, 2026

Summary

Test plan

Uh oh!

claude bot commented Mar 3, 2026

PR Review: Phase 8 Polish -- Feedback Endpoint, Prompt Tuning, and Action Bar

Issues

1. CopyButton: setTimeout leaks on unmount (medium)

2. Prompt encourages cost estimates the LLM cannot verify (medium)

3. sendFeedback swallows non-2xx HTTP errors silently (low)

4. feedback route uses the generic "write" rate limit bucket (low)

5. Ordered vs unordered list stripping is inconsistent (low)

Positive Observations

Summary

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

malston commented Mar 3, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

malston Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

malston Mar 3, 2026

Choose a reason for hiding this comment

Uh oh!

claude bot commented Mar 3, 2026

Code Review: Phase 8 Polish — Feedback Endpoint, Prompt Tuning, Action Bar

Bug: Side effect inside React state updater (ChatMessages.jsx)

Minor: stripMarkdown.js — italic regex can mangle snake_case identifiers

Minor: Ordered list numbers stripped entirely

Minor: Prompt test coverage for calendar references

Observation: "write" rate limit for feedback

Nits / positives

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

1. `CopyButton`: `setTimeout` leaks on unmount (medium)

3. `sendFeedback` swallows non-2xx HTTP errors silently (low)

4. `feedback` route uses the generic `"write"` rate limit bucket (low)

Bug: Side effect inside React state updater (`ChatMessages.jsx`)

Minor: `stripMarkdown.js` — italic regex can mangle snake_case identifiers