This audit covers the current mascot ask flow from UI to Bedrock and back, including cancellation, retry, memory, safety, and token-cost controls.
- Frontend: React + Vite + `react-router-dom`.
- API layer: serverless-style `api/pebble.ts`.
- LLM path (safe mode default): browser calls `POST /api/pebble`, server calls AWS Bedrock Anthropic model.
- Optional local demo mode (`unsafe_client`) remains client-side and uses the same shared system prompt text.
- User enters text in mascot panel (`PebbleMascot.tsx`) and clicks `Ask Pebble` or presses Enter.
- `onAskPebble()` validates input and calls `submitQuestion(question, { appendUserTurn: true })`.
- `submitQuestion()`:
  - increments request id (`askRequestIdRef`) and aborts any previous in-flight request.
  - creates a fresh `AbortController` and stores it in `activeAbortRef`.
  - transitions UI to `thinking`, resets answer display state.
  - updates memory window (last 6 turns max).
  - builds compact prompt (`buildPebblePrompt`) with question, compact memory lines, and essential state snapshot.
  - calls `askPebble({ prompt, context, signal })`.
- `askPebble()` (client util):
  - default mode calls `/api/pebble` with JSON body `{ prompt, context }` and abort bridging.
  - reads `{ text }` on success or surfaces server `{ error }`.
- `api/pebble.ts`:
  - validates method/body/context.
  - compacts context deterministically (`compactContextForModel`).
  - builds Bedrock user message from prompt + compact context + constraints.
  - invokes Bedrock Anthropic via `InvokeModel` with `system` field.
  - parses response content text and returns `{ text }`.
- Frontend receives answer:
  - request-id guard blocks stale/late updates.
  - answer is typewriter-rendered (`typing` state).
  - on completion, assistant turn is appended to memory with metadata and celebration pulse triggers.
- Stop path:
  - UI `Stop` invalidates request id, aborts active fetch, clears typing timers, resets answer UI state, returns to `idle`.
- Request-id guard:
  - even if network returns after cancel, stale responses are ignored.
- Single-flight:
  - new submits abort prior in-flight request before creating a new controller.
- Unmount safety:
  - component cleanup aborts active request and clears timers/RAF settle loop.
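
The request-id guard and single-flight rule above can be sketched as one small helper. This is a minimal illustration, not the code in `PebbleMascot.tsx`: the `SingleFlight` class and its method names are hypothetical, standing in for what `askRequestIdRef` and `activeAbortRef` do together.

```typescript
// Hypothetical helper: class and method names are illustrative only.
class SingleFlight {
  private requestId = 0;
  private active: AbortController | null = null;

  /** Abort any in-flight request, then start a new one with a fresh id. */
  begin(): { id: number; signal: AbortSignal } {
    this.active?.abort();
    this.active = new AbortController();
    this.requestId += 1;
    return { id: this.requestId, signal: this.active.signal };
  }

  /** True only for the most recent request that has not been stopped. */
  isCurrent(id: number): boolean {
    return id === this.requestId;
  }

  /** Stop path and unmount cleanup: invalidate the id, then abort the fetch. */
  stop(): void {
    this.requestId += 1;
    this.active?.abort();
    this.active = null;
  }
}
```

A response handler would call `isCurrent(id)` before touching UI state, so a reply that arrives after `stop()` or after a newer `begin()` is dropped even if the abort did not reach the network layer in time.
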
- Retry appears only for “error-ish” assistant messages.
- `Retry` reuses `lastAskedQuestion` and calls submit with `appendUserTurn: false`.
  - This avoids duplicate user-memory turns.
- Assistant turn is still appended only when typewriter completes.
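
A rough sketch of how `appendUserTurn` interacts with the 6-turn memory window is shown here. The `Turn` type and both function names are assumptions made for illustration; only the flag name and the cap of 6 come from the audit.

```typescript
// Hypothetical shapes: the real memory entries also carry metadata.
type Turn = { role: "user" | "assistant"; text: string };

const MAX_TURNS = 6; // memory window cap stated in this audit

/** Append a turn and keep only the most recent MAX_TURNS entries. */
function appendTurn(memory: readonly Turn[], turn: Turn): Turn[] {
  return [...memory, turn].slice(-MAX_TURNS);
}

/** First ask appends the user turn; Retry passes appendUserTurn: false. */
function memoryForSubmit(
  memory: readonly Turn[],
  question: string,
  opts: { appendUserTurn: boolean },
): Turn[] {
  return opts.appendUserTurn
    ? appendTurn(memory, { role: "user", text: question })
    : [...memory];
}
```

With this shape, asking once and retrying once leaves exactly one user turn for the question in memory.
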
- Server responses:
  - success: `200 { text: string }`
  - timeout: `504 { error: 'Pebble request timed out.' }`
  - user stop (abort not caused by timeout): `200 { text: 'Stopped.' }`
  - other failures: `502 { error: string }`
- Client handling:
  - preserves user-friendly timeout/temporary error messages.
  - “Stopped.” is treated as a normal short answer path and does not crash UI flow.
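
The failure side of this contract can be expressed as a single mapping function. The sketch below assumes the server tracks a `timedOut` flag set by its own timer and that aborts surface as an error named `AbortError`; `mapFailure` and `PebbleResponse` are hypothetical names, not identifiers from `api/pebble.ts`.

```typescript
// Hypothetical types and function name; status codes and messages follow the contract above.
type PebbleResponse =
  | { status: 200; body: { text: string } }
  | { status: 502 | 504; body: { error: string } };

function mapFailure(err: unknown, timedOut: boolean): PebbleResponse {
  // Timeout is checked first: the timeout also aborts, so both flags can be set.
  if (timedOut) {
    return { status: 504, body: { error: "Pebble request timed out." } };
  }
  // An abort that was not caused by the timeout is treated as a user stop.
  if (err instanceof Error && err.name === "AbortError") {
    return { status: 200, body: { text: "Stopped." } };
  }
  const message = err instanceof Error ? err.message : "Unknown error";
  return { status: 502, body: { error: message } };
}
```

Checking `timedOut` before the abort check is the design point: without that ordering a server-side timeout would be reported as a user stop.
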
- No AWS secrets are used in browser safe mode.
- Bedrock call is server-only (`api/pebble.ts`) using `process.env`.
- API contract remains unchanged.
- `.env.local` is ignored by git; secrets should be stored in deployment env settings.
- Credential strategy:
  - server accepts explicit `AWS_ACCESS_KEY_ID` + `AWS_SECRET_ACCESS_KEY` if both are provided.
  - otherwise falls back to AWS default credential provider chain.
- Logging safety:
  - optional cost debug (`PEBBLE_DEBUG_COST=1`) logs only character counts and token estimate, never prompt content or secrets.
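
One way to keep cost logging content-free is to build the log line from lengths only. The sketch below is an assumption about shape, not the server's actual logger: the function names, the log prefix, and the 4-characters-per-token estimate are all illustrative.

```typescript
type Env = Record<string, string | undefined>;

/** Build a log line from sizes only; the prompt text itself is never included. */
function costDebugLine(system: string, user: string): string {
  const systemChars = system.length;
  const userChars = user.length;
  // Assumption: roughly 4 characters per token, a coarse heuristic for English text.
  const estTokens = Math.ceil((systemChars + userChars) / 4);
  return `[pebble-cost] systemChars=${systemChars} userChars=${userChars} estTokens=${estTokens}`;
}

/** Log only when PEBBLE_DEBUG_COST=1; the logger is injectable for testing. */
function logCostIfEnabled(
  env: Env,
  system: string,
  user: string,
  log: (line: string) => void = console.log,
): void {
  if (env["PEBBLE_DEBUG_COST"] !== "1") return;
  log(costDebugLine(system, user));
}
```
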
- UI:
  - timers/intervals are cleaned correctly.
  - drag settle uses RAF and cancels on pointer down.
  - request-id gating prevents late-state churn.
- API:
  - timeout enforced at 20s with abort.
  - response parsing guarded for malformed payload shapes.
- Concurrency:
  - one active request per mascot instance.
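
Guarded response parsing might look like the sketch below. It assumes the Anthropic Messages response shape on Bedrock, where the decoded body carries a `content` array of blocks such as `{ type: "text", text: "..." }`. The name `extractText` is hypothetical; a `null` result is what the caller would turn into the 502 path.

```typescript
/** Return the joined text blocks, or null when the payload shape is unexpected. */
function extractText(payload: unknown): string | null {
  if (typeof payload !== "object" || payload === null) return null;
  const content = (payload as { content?: unknown }).content;
  if (!Array.isArray(content)) return null;

  const parts: string[] = [];
  for (const block of content) {
    if (typeof block !== "object" || block === null) continue;
    const candidate = block as { type?: unknown; text?: unknown };
    // Only text blocks with a string payload are accepted.
    if (candidate.type === "text" && typeof candidate.text === "string") {
      parts.push(candidate.text);
    }
  }

  const text = parts.join("").trim();
  return text.length > 0 ? text : null;
}
```
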
Primary cost drivers before tuning:
- pretty-printed context JSON,
- duplicated context text,
- oversized code/run messages.
Current optimizations:
- Shared system prompt rules in `src/shared/pebblePromptRules.ts` prevent drift.
- Compact prompt composition in UI (reduced labels/blank lines).
- Server-side context compaction:
  - keeps essential fields only,
  - trims `runMessage` to 360 chars,
  - trims `codeText` to 1800 chars with middle marker,
  - limits `errorHistory` to last 3.
- Server builds compact user message and avoids pretty JSON indentation.
- Bedrock generation tuned for short answers:
  - `max_tokens: 240` (aligned to <= 6 lines),
  - `temperature: 0.35` (lower variance, more deterministic guidance).
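
The compaction limits above can be sketched as follows. This is not the body of `compactContextForModel`: the function names, the marker text, the end-truncation of `runMessage`, and the choice of exactly these three fields as "essential" are assumptions. Only the numeric limits come from this audit.

```typescript
// Hypothetical marker text; the real middle marker is not specified in this audit.
const TRIM_MARKER = "\n...[trimmed]...\n";

/** Keep the head and tail of a long string, replacing the middle with a marker. */
function trimMiddle(text: string, max: number, marker: string = TRIM_MARKER): string {
  if (text.length <= max) return text;
  const keep = max - marker.length;
  const head = Math.ceil(keep / 2);
  const tail = keep - head;
  return text.slice(0, head) + marker + text.slice(text.length - tail);
}

function asString(value: unknown): string {
  return typeof value === "string" ? value : "";
}

/** Keep only the assumed essential fields and apply the documented limits. */
function compactContext(ctx: Record<string, unknown>) {
  const rawHistory = ctx["errorHistory"];
  const history = Array.isArray(rawHistory)
    ? rawHistory.filter((entry): entry is string => typeof entry === "string")
    : [];
  return {
    runMessage: asString(ctx["runMessage"]).slice(0, 360),
    codeText: trimMiddle(asString(ctx["codeText"]), 1800),
    errorHistory: history.slice(-3),
  };
}

// Serialized without indentation, matching the "no pretty JSON" rule.
const compactJson = JSON.stringify(compactContext({ runMessage: "ok", codeText: "x = 1" }));
```

Trimming the middle rather than the end keeps both the top of the file (imports, signatures) and the most recently edited tail, which tends to be where the failing line lives.
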
Why `max_tokens: 240`:
- 6 terse lines generally fit well under this limit while still allowing an occasional clarifying question or a few concise steps.
- Lower than 300 reduces worst-case output spend with minimal quality loss for this UX.
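
For reference, a request body carrying these generation settings could be built as below. The sketch assumes the Anthropic Messages format that Bedrock accepts through `InvokeModel`, with `anthropic_version` set to `bedrock-2023-05-31`. The function name is hypothetical and the actual body built in `api/pebble.ts` may differ.

```typescript
/** Build the JSON body for InvokeModel, assuming the Anthropic Messages format. */
function buildBedrockBody(system: string, userMessage: string): string {
  return JSON.stringify({
    anthropic_version: "bedrock-2023-05-31",
    max_tokens: 240, // caps worst-case output spend
    temperature: 0.35, // lower variance for short guidance
    system, // shared persona rules go in the system field
    messages: [
      { role: "user", content: [{ type: "text", text: userMessage }] },
    ],
  });
}
```
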
- Rapid Enter/Ask spam while generating: blocked by `isGenerating` guard.
- Stop during slow network: abort propagates; stale response blocked by request id.
- Route change/unmount mid-request: abort + cleanup prevents leaks.
- Bedrock malformed JSON/content mismatch: returns safe 502 with useful error.
- Large `codeText`: trimmed before model message construction.
- Memory growth: capped to last 6 turns.
- Shared persona source of truth (`src/shared/pebblePromptRules.ts`):
  - calm, terse, action-first.
  - strict clarifying-question rule under high struggle.
  - guided mode scoping.
  - success reinforcement + one micro next step.
  - 6-line response cap and no fluff.
- Thinking personality in UI remains unchanged visually (`Thinking...`).
- Model formatting tuned toward minimal cognitive load and concise actionability.
Required:
- `AWS_REGION`
- `BEDROCK_MODEL_ID`

Optional:
- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
  - Provide both together, or neither.
- `PEBBLE_DEBUG_COST=1` (server-only prompt size telemetry)
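
A small reader for these variables might look like this. It is a sketch under assumptions: `readPebbleEnv` and `PebbleServerConfig` are hypothetical names, and throwing on missing required values is one possible policy rather than documented server behavior. Leaving `credentials` unset stands in for falling back to the AWS default credential provider chain.

```typescript
type ServerEnv = Record<string, string | undefined>;

interface PebbleServerConfig {
  region: string;
  modelId: string;
  // Present only when both keys are set; otherwise the default chain is used.
  credentials?: { accessKeyId: string; secretAccessKey: string };
  costDebug: boolean;
}

function readPebbleEnv(env: ServerEnv): PebbleServerConfig {
  const region = env["AWS_REGION"];
  const modelId = env["BEDROCK_MODEL_ID"];
  if (!region || !modelId) {
    throw new Error("AWS_REGION and BEDROCK_MODEL_ID are required.");
  }

  const config: PebbleServerConfig = {
    region,
    modelId,
    costDebug: env["PEBBLE_DEBUG_COST"] === "1",
  };

  const accessKeyId = env["AWS_ACCESS_KEY_ID"];
  const secretAccessKey = env["AWS_SECRET_ACCESS_KEY"];
  // Explicit credentials are used only when both halves are provided.
  if (accessKeyId && secretAccessKey) {
    config.credentials = { accessKeyId, secretAccessKey };
  }
  return config;
}
```

On the server this would be called as `readPebbleEnv(process.env)`.
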
- Install and run:
  - `npm install`
  - `npm run dev`
- Open session UI:
  - Navigate to `/session/1`
- Ask flow:
  - Expand mascot, ask a coding question.
  - Expect `Thinking...` -> `Typing...` -> final answer.
- Drag settle check:
  - Drag mascot upward and release.
  - Expect fall + subtle bounce to bottom, draggable mid-animation.
- Stop check:
  - Ask, then press `Stop` while thinking/typing.
  - Expect prompt stop, no late overwrite, return to idle.
- Retry check:
  - Force server/model error (e.g., invalid `BEDROCK_MODEL_ID`), ask once.
  - Expect error-ish answer + `Retry` button.
  - Press retry; verify no duplicate user memory turn.
- Build check:
  - `npm run build`
  - Expect TypeScript + Vite build success.
- Health (no Bedrock call):
  - `curl -sS http://localhost:5173/api/pebble | jq`
- Minimal POST:
  - `curl -sS -X POST http://localhost:5173/api/pebble -H "Content-Type: application/json" -d '{"prompt":"hi","context":{}}' | jq`
- Runtime: Node serverless function for `/api/pebble`.
- Set env vars in Vercel project settings (do not use `VITE_` prefix for secrets).
- Keep client bundle secret-free; all Bedrock auth remains server-side.