feat: add experimental verbal sampling CoT prompt generation #36

zeno205 wants to merge 1 commit into T3-Content:main from
Conversation
📝 Walkthrough

A new experimental Chain-of-Thought prompt generation feature is introduced, which generates multiple JSON-formatted prompt candidates, parses them with error correction, validates them, and selects the best one. A complementary JSON parsing utility module is added to handle automated fixes for common JSON formatting issues.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant callGeneratePrompt
    participant LLM
    participant tryParseJSON
    participant callSelectBestPrompt
    participant callSelectBestPrompt2 as LLM<br/>(Selection)
    User->>callGeneratePrompt: Request prompt (CoT enabled)
    callGeneratePrompt->>LLM: Generate 5 JSON-formatted prompts
    LLM-->>callGeneratePrompt: Raw JSON response
    callGeneratePrompt->>tryParseJSON: Parse JSON with fixes
    tryParseJSON-->>callGeneratePrompt: Parsed prompts + fixes applied
    callGeneratePrompt->>callSelectBestPrompt: Select best prompt
    callSelectBestPrompt->>callSelectBestPrompt2: LLM selection request
    callSelectBestPrompt2-->>callSelectBestPrompt: Selected prompt
    callSelectBestPrompt-->>callGeneratePrompt: Best prompt
    callGeneratePrompt-->>User: Final prompt
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Add experimental verbal sampling CoT prompt generation and branch
Actionable comments posted: 3
🧹 Nitpick comments (2)
game.ts (1)
321-345: `callSelectBestPrompt` failure causes full re-generation of candidates.

If this API call throws (network error, rate limit, etc.), the error propagates up through `callGeneratePrompt`, which is wrapped in `withRetry`. This causes the entire 5-candidate generation to be re-executed. Consider catching the error here and falling back to the probability-based selection directly, since valid candidates already exist.

♻️ Suggested approach in `callGeneratePrompt`

```diff
-  const selected = await callSelectBestPrompt(
-    model,
-    candidates.map((c) => c.joke),
-  );
+  let selected: string | null = null;
+  try {
+    selected = await callSelectBestPrompt(
+      model,
+      candidates.map((c) => c.joke),
+    );
+  } catch (err) {
+    log("WARN", `prompt:${model.name}`, "callSelectBestPrompt failed, using fallback", {
+      error: err instanceof Error ? err.message : String(err),
+    });
+  }
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@game.ts` around lines 321 - 345, callSelectBestPrompt currently lets errors from generateText bubble up (causing callGeneratePrompt wrapped in withRetry to regenerate all candidates); modify callSelectBestPrompt to catch exceptions thrown by generateText (or any part of the API call), log the error via log("ERROR", ...) including model.id/name and the exception, and then return a deterministic fallback choice (e.g., pick the highest-probability candidate by index from the provided jokes array or use an existing probability-based selector) so callers still receive a valid string; ensure the returned value is passed through cleanResponse before returning and preserve types/signature of callSelectBestPrompt.

llm-json-fixer.ts (1)
252-269: Redundant parse attempt when a fix is a no-op.

When a fix function returns the same string (lines 265–267), `processed` isn't updated, but the loop still runs `JSON.parse` again on the unchanged input in the next iteration, producing a duplicate warning. Consider skipping the iteration when the fix is a no-op.

♻️ Suggested improvement

```diff
       warnings.push((error as Error).message);
       if (i === attempts.length) {
         break;
       }
       const next = attempts[i]!();
       if (next !== processed) {
         processed = next;
+      } else {
+        continue;
       }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@llm-json-fixer.ts` around lines 252 - 269, The loop currently pushes the same JSON.parse error repeatedly when a fix is a no-op; in the catch block, only push the error message to warnings if it differs from the last warning (e.g., check warnings[warnings.length-1] !== (error as Error).message before pushing), and after computing next = attempts[i]!(), if next === processed simply continue to the next attempt without updating processed (so you avoid redundant parse attempts and duplicate warnings). Update references in this block around attempts, processed, warnings and the catch handling to implement these checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@game.ts`:
- Around line 267-269: The strict check throws when parsed.jokes.length !== 5;
relax this to allow variable-length LLM output by validating parsed.jokes is an
array and has at least one item (e.g., parsed.jokes.length < 1) instead of
requiring exactly 5; reference the existing downstream validation in the
mapping/filtering logic around the jokes processing (the map/filter block
following parsed.jokes) to ensure invalid entries are still discarded and at
least one valid candidate is present.
- Around line 297-305: callSelectBestPrompt returns model text that may not
exactly match candidate.joke, so the exact equality check (selected === c.joke)
in the matched lookup is fragile; update selection handling in game.ts to
robustly map the model output to a candidate by either (A) changing the prompt
in callSelectBestPrompt to instruct the model to return a single index and then
parse that index to pick from candidates, or (B) perform a normalized fuzzy
match after receiving selected — e.g., trim/normalize whitespace and punctuation
and compare lowercased strings or use a small Levenshtein/similarity threshold
to find the closest candidate.joke before falling back to the probability-based
path; apply this logic where matched is computed so matched.joke reliably
resolves to the intended candidate.
In `@llm-json-fixer.ts`:
- Around line 193-197: The depth counter currently increments/decrements for
every brace/brace character in the raw line (variables depth, line, char), which
miscounts when those characters appear inside JSON string values; update the
loop to track whether we're inside a quoted string (e.g., an inString boolean
toggled when encountering an unescaped double-quote, handling backslash escapes)
and only modify depth for "{" "[" "}" "]" when inString is false; apply the same
change to the other matching loop referenced around lines 211-214 so brackets
inside strings do not affect depth.
---
Nitpick comments:
In `@game.ts`:
- Around line 321-345: callSelectBestPrompt currently lets errors from
generateText bubble up (causing callGeneratePrompt wrapped in withRetry to
regenerate all candidates); modify callSelectBestPrompt to catch exceptions
thrown by generateText (or any part of the API call), log the error via
log("ERROR", ...) including model.id/name and the exception, and then return a
deterministic fallback choice (e.g., pick the highest-probability candidate by
index from the provided jokes array or use an existing probability-based
selector) so callers still receive a valid string; ensure the returned value is
passed through cleanResponse before returning and preserve types/signature of
callSelectBestPrompt.
In `@llm-json-fixer.ts`:
- Around line 252-269: The loop currently pushes the same JSON.parse error
repeatedly when a fix is a no-op; in the catch block, only push the error
message to warnings if it differs from the last warning (e.g., check
warnings[warnings.length-1] !== (error as Error).message before pushing), and
after computing next = attempts[i]!(), if next === processed simply continue to
the next attempt without updating processed (so you avoid redundant parse
attempts and duplicate warnings). Update references in this block around
attempts, processed, warnings and the catch handling to implement these checks.
```ts
if (!Array.isArray(parsed.jokes) || parsed.jokes.length !== 5) {
  throw new Error("Invalid verbal sample CoT output: jokes must contain 5 items");
}
```
Strict === 5 check is brittle for LLM output.
LLMs don't always follow instructions exactly — they may return 4 or 6 items. The downstream map/filter at lines 271–295 already discards invalid entries and checks for at least one valid candidate. Consider relaxing this to a minimum-length check (e.g., < 1) rather than requiring exactly 5.
♻️ Suggested change

```diff
-  if (!Array.isArray(parsed.jokes) || parsed.jokes.length !== 5) {
-    throw new Error("Invalid verbal sample CoT output: jokes must contain 5 items");
+  if (!Array.isArray(parsed.jokes) || parsed.jokes.length === 0) {
+    throw new Error("Invalid verbal sample CoT output: jokes array is empty or missing");
   }
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
if (!Array.isArray(parsed.jokes) || parsed.jokes.length === 0) {
  throw new Error("Invalid verbal sample CoT output: jokes array is empty or missing");
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@game.ts` around lines 267 - 269, The strict check throws when
parsed.jokes.length !== 5; relax this to allow variable-length LLM output by
validating parsed.jokes is an array and has at least one item (e.g.,
parsed.jokes.length < 1) instead of requiring exactly 5; reference the existing
downstream validation in the mapping/filtering logic around the jokes processing
(the map/filter block following parsed.jokes) to ensure invalid entries are
still discarded and at least one valid candidate is present.
```ts
const selected = await callSelectBestPrompt(
  model,
  candidates.map((c) => c.joke),
);

const matched = candidates.find((c) => c.joke === selected);
if (matched) {
  return matched.joke;
}
```
Exact-match comparison with LLM output is fragile.
callSelectBestPrompt asks the model to echo back the chosen prompt, then line 302 does an exact string match against the candidates. LLMs frequently introduce minor deviations — a leading number, trailing period, extra whitespace, or slight rewording — causing the match to silently fail and fall through to the probability-based fallback every time.
Consider a fuzzy match (e.g., normalized/trimmed comparison, or includes/Levenshtein) or have the model return just the index number instead:
♻️ Option A: Have the model return the index

```diff
-    prompt: `Choose exactly one of these Quiplash prompts and reply with ONLY the exact prompt text, nothing else:\n\n${jokes
+    prompt: `Choose exactly one of these Quiplash prompts and reply with ONLY the number (1-${jokes.length}), nothing else:\n\n${jokes
       .map((joke, i) => `${i + 1}. ${joke}`)
       .join("\n")}`,
```

Then parse the returned number and index into the candidates array.
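If Option A is taken, the index-parsing step might look like this minimal sketch. The helper name `pickByIndex` and the `{ joke }` candidate shape are assumptions for illustration, not code from the PR:

```typescript
// Hypothetical helper for Option A: map the model's reply to a 1-based index.
// The regex tolerates noisy replies like "3." or "Answer: 3" that a strict
// Number() conversion would reject.
function pickByIndex<T extends { joke: string }>(
  reply: string,
  candidates: T[],
): T | null {
  const digits = reply.match(/\d+/);
  if (!digits) return null;
  const idx = Number(digits[0]) - 1; // the prompt numbers candidates from 1
  return idx >= 0 && idx < candidates.length ? (candidates[idx] ?? null) : null;
}
```

Returning `null` on an unparsable or out-of-range reply lets the caller fall through to the existing probability-based selection, mirroring the fallback path the review describes.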
♻️ Option B: Normalize before matching

```diff
-  const matched = candidates.find((c) => c.joke === selected);
+  const normalize = (s: string) => s.replace(/^\d+[\.\)]\s*/, "").trim().toLowerCase();
+  const normalizedSelected = normalize(selected);
+  const matched = candidates.find((c) => normalize(c.joke) === normalizedSelected);
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const selected = await callSelectBestPrompt(
  model,
  candidates.map((c) => c.joke),
);
const normalize = (s: string) => s.replace(/^\d+[\.\)]\s*/, "").trim().toLowerCase();
const normalizedSelected = normalize(selected);
const matched = candidates.find((c) => normalize(c.joke) === normalizedSelected);
if (matched) {
  return matched.joke;
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@game.ts` around lines 297 - 305, callSelectBestPrompt returns model text that
may not exactly match candidate.joke, so the exact equality check (selected ===
c.joke) in the matched lookup is fragile; update selection handling in game.ts
to robustly map the model output to a candidate by either (A) changing the
prompt in callSelectBestPrompt to instruct the model to return a single index
and then parse that index to pick from candidates, or (B) perform a normalized
fuzzy match after receiving selected — e.g., trim/normalize whitespace and
punctuation and compare lowercased strings or use a small Levenshtein/similarity
threshold to find the closest candidate.joke before falling back to the
probability-based path; apply this logic where matched is computed so
matched.joke reliably resolves to the intended candidate.
```ts
for (const char of line) {
  if (char === "{" || char === "[") depth++;
  else if (char === "}" || char === "]") depth--;
}
continue;
```
Depth tracking counts brackets inside string values.
The depth counter iterates raw characters, so brackets within JSON string values (e.g., "What {thing}...") will incorrectly adjust depth, potentially causing spurious comma insertion or suppression. For the current use case (simple joke objects) this is unlikely to trigger, but worth noting as a known limitation.
Also applies to: 211-214
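The fix described above can be sketched as a standalone function; the `inString`/escape handling is the suggested addition, not the module's current code, and the function name is hypothetical:

```typescript
// Sketch of string-aware depth tracking: brackets inside quoted JSON string
// values no longer affect the depth counter.
function bracketDepthDelta(line: string): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (const char of line) {
    if (escaped) {
      escaped = false; // the character after a backslash in a string is literal
    } else if (inString) {
      if (char === "\\") escaped = true;
      else if (char === '"') inString = false;
    } else if (char === '"') {
      inString = true;
    } else if (char === "{" || char === "[") {
      depth++;
    } else if (char === "}" || char === "]") {
      depth--;
    }
  }
  return depth;
}
```

With this, a line like `"prompt": "What {thing}..."` contributes a delta of 0 instead of spuriously incrementing and failing to decrement the depth.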
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@llm-json-fixer.ts` around lines 193 - 197, The depth counter currently
increments/decrements for every brace/brace character in the raw line (variables
depth, line, char), which miscounts when those characters appear inside JSON
string values; update the loop to track whether we're inside a quoted string
(e.g., an inString boolean toggled when encountering an unescaped double-quote,
handling backslash escapes) and only modify depth for "{" "[" "}" "]" when
inString is false; apply the same change to the other matching loop referenced
around lines 211-214 so brackets inside strings do not affect depth.
Verbalized Sampling CoT prompt generation experiment
Adds an additional prompt generation strategy where the model generates 5 candidate prompts with a verbal chain-of-thought reasoning step, then uses a second model call to select the best one (Inspired by this paper).
Changes
- `llm-json-fixer.ts` (new): Standalone JSON extraction/repair utility for LLM responses. Handles common failure modes: markdown fences, trailing text, unescaped quotes, and missing commas. Exposes `extractJSON`, `parseJSON`, and `tryParseJSON` (based on code from this repo).
- `game.ts`: Adds `buildVerbalSampleCotSystem()` and `callSelectBestPrompt()`. `callGeneratePrompt()` now branches on the `EXPERIMENTAL_VERBAL_SAMPLE_COT=1` env flag; existing behavior is unchanged when the flag is absent.
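As a rough illustration of the extraction step such a utility performs, here is a standalone sketch; the real `extractJSON` in `llm-json-fixer.ts` may have a different signature and handle more cases:

```typescript
// Illustrative only: strip a markdown fence and surrounding prose from a JSON
// payload, two of the failure modes the new module is described as handling.
function stripToJson(raw: string): string {
  // `{3} matches a run of three backticks without embedding a literal fence.
  const fenced = raw.match(/`{3}(?:json)?\s*([\s\S]*?)`{3}/);
  const body = fenced ? fenced[1]! : raw;
  const start = body.search(/[{[]/); // first { or [
  const end = Math.max(body.lastIndexOf("}"), body.lastIndexOf("]"));
  return start !== -1 && end > start ? body.slice(start, end + 1) : body.trim();
}
```

The result can then be handed to `JSON.parse`, with the repair passes (unescaped quotes, missing commas) applied only if that first parse fails.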
Summary by CodeRabbit
New Features
Infrastructure