feat: add experimental verbal sampling CoT prompt generation#36

Open
zeno205 wants to merge 1 commit into T3-Content:main from zeno205:feat/verbal-sample-cot

Conversation

@zeno205 zeno205 commented Feb 23, 2026

Verbalized Sampling CoT prompt generation experiment

Adds an additional prompt generation strategy: the model generates 5 candidate prompts with a verbal chain-of-thought reasoning step, then a second model call selects the best one (inspired by this paper).

Changes

  • llm-json-fixer.ts (new): Standalone JSON extraction/repair utility for LLM responses. Handles common failure modes: markdown fences, trailing text, unescaped quotes, and missing commas. Exposes extractJSON, parseJSON, and tryParseJSON (based on code from this repo).

  • game.ts — Adds buildVerbalSampleCotSystem() and callSelectBestPrompt(). callGeneratePrompt() now branches on the EXPERIMENTAL_VERBAL_SAMPLE_COT=1 env flag; existing behavior is unchanged when the flag is absent.

No existing behavior is affected without the flag.
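The flag-gated flow can be sketched roughly as follows. This is a minimal sketch, not the code in this PR: the `Candidate` shape and every helper body below are hypothetical stand-ins; only the env flag name and the generate-then-select shape come from the description above.

```typescript
// Hypothetical sketch of the flag-gated flow; helper bodies are stubs,
// not the real LLM calls in game.ts.
interface Candidate {
  joke: string;        // candidate prompt text
  probability: number; // model-verbalized probability, used as a fallback signal
}

async function legacySinglePrompt(): Promise<string> {
  return "single prompt"; // stands in for the existing single-prompt path
}

async function generateCandidates(): Promise<Candidate[]> {
  // the real flow asks the model for 5 JSON-formatted candidates
  return [
    { joke: "prompt A", probability: 0.2 },
    { joke: "prompt B", probability: 0.5 },
  ];
}

async function selectBestPrompt(jokes: string[]): Promise<string> {
  return jokes[0]; // stands in for the second model call
}

function pickByProbability(candidates: Candidate[]): string {
  // deterministic fallback: highest verbalized probability wins
  return candidates.reduce((best, c) => (c.probability > best.probability ? c : best)).joke;
}

async function generatePrompt(): Promise<string> {
  if (process.env.EXPERIMENTAL_VERBAL_SAMPLE_COT !== "1") {
    return legacySinglePrompt(); // existing behavior when the flag is absent
  }
  const candidates = await generateCandidates();
  try {
    const selected = await selectBestPrompt(candidates.map((c) => c.joke));
    const match = candidates.find((c) => c.joke === selected);
    if (match) return match.joke;
  } catch {
    // selection failed; fall through to the probability-based fallback
  }
  return pickByProbability(candidates);
}
```

The point of the shape is that the legacy path returns before any experimental code runs, so behavior with the flag unset is untouched.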

Summary by CodeRabbit

  • New Features

    • Introduced experimental Chain-Of-Thought prompt generation that creates multiple prompt candidates and automatically selects the best option when enabled. Standard single-prompt generation remains available as default.
  • Infrastructure

    • Added robust JSON parsing with automated error correction to improve handling of AI-generated responses.

coderabbitai bot commented Feb 23, 2026

📝 Walkthrough

Walkthrough

A new experimental Chain-Of-Thought prompt generation feature is introduced, which generates multiple JSON-formatted prompt candidates, parses them with error correction, validates them, and selects the best one. A complementary JSON parsing utility module is added to handle automated fixes for common JSON formatting issues.

Changes

  • Chain-of-Thought Prompt Generation — game.ts: Adds experimental EXPERIMENTAL_VERBAL_SAMPLE_COT feature that generates 5 JSON-formatted prompts via LLM, parses and validates them, then selects the best candidate. Introduces new callSelectBestPrompt export and integrates the JSON parsing utility. Maintains backward compatibility with the existing single-prompt flow when disabled.

  • JSON Parsing Utility — llm-json-fixer.ts: New module providing robust JSON parsing with automatic error recovery. Includes helpers for extracting JSON from markdown, removing trailing content, fixing unescaped quotes, and adding missing commas. Public API exports tryParseJSON, parseJSON, and extractJSON with diagnostic reporting.
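The retry-with-repairs idea behind such a utility can be illustrated with a minimal sketch. This is not the actual llm-json-fixer.ts code: the two fix steps, their order, and the `ParseResult` shape here are simplified assumptions.

```typescript
// Simplified sketch of a tryParseJSON-style repair loop: attempt JSON.parse,
// and after each failure apply one more repair before retrying.
interface ParseResult {
  data: unknown | null;
  fixes: string[]; // which repairs were applied before parsing succeeded
}

function stripFences(text: string): string {
  // remove a leading ```json fence and a trailing ``` fence, if present
  return text.replace(/^```(?:json)?\s*/m, "").replace(/```\s*$/m, "").trim();
}

function dropTrailingText(text: string): string {
  // keep everything up to the last closing brace/bracket
  const end = Math.max(text.lastIndexOf("}"), text.lastIndexOf("]"));
  return end === -1 ? text : text.slice(0, end + 1);
}

function tryParseJSONSketch(raw: string): ParseResult {
  const fixes: string[] = [];
  let text = raw;
  for (const [name, fix] of [
    ["strip markdown fences", stripFences],
    ["drop trailing text", dropTrailingText],
  ] as const) {
    try {
      return { data: JSON.parse(text), fixes };
    } catch {
      text = fix(text); // parse failed; apply the next repair and retry
      fixes.push(name);
    }
  }
  try {
    return { data: JSON.parse(text), fixes };
  } catch {
    return { data: null, fixes }; // unrecoverable input
  }
}
```

For example, a fenced response followed by chat text ("```json\n{...}\n```\nDone!") parses only after both repairs, and the `fixes` array doubles as the diagnostic report.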

Sequence Diagram

sequenceDiagram
    actor User
    participant callGeneratePrompt
    participant LLM
    participant tryParseJSON
    participant callSelectBestPrompt
    participant callSelectBestPrompt2 as LLM<br/>(Selection)

    User->>callGeneratePrompt: Request prompt (CoT enabled)
    callGeneratePrompt->>LLM: Generate 5 JSON-formatted prompts
    LLM-->>callGeneratePrompt: Raw JSON response
    callGeneratePrompt->>tryParseJSON: Parse JSON with fixes
    tryParseJSON-->>callGeneratePrompt: Parsed prompts + fixes applied
    callGeneratePrompt->>callSelectBestPrompt: Select best prompt
    callSelectBestPrompt->>callSelectBestPrompt2: LLM selection request
    callSelectBestPrompt2-->>callSelectBestPrompt: Selected prompt
    callSelectBestPrompt-->>callGeneratePrompt: Best prompt
    callGeneratePrompt-->>User: Final prompt

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A JSON chain of thought so bright,
Five prompts parsed with all our might,
Commas fixed and quotes aligned,
The very best prompt we will find!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: the title 'feat: add experimental verbal sampling CoT prompt generation' clearly and accurately describes the main change: introducing an experimental verbal sampling chain-of-thought approach for prompt generation. It is specific, concise, and directly reflects the primary functionality added in this changeset.


macroscopeapp bot commented Feb 23, 2026

Add experimental verbal sampling CoT prompt generation and branch game.callGeneratePrompt to request 5 JSON-formatted Quiplash prompts with probabilities when EXPERIMENTAL_VERBAL_SAMPLE_COT='1'

Introduce a feature flag that switches prompt generation to a JSON-based, 5-candidate flow with a selection step, and add tolerant JSON parsing for model responses.

📍Where to Start

Start with game.callGeneratePrompt in game.ts, then review buildVerbalSampleCotSystem and game.callSelectBestPrompt, followed by JSON parsing in llm-json-fixer.ts.
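For orientation, the candidate payload implied by the summaries above might look like the sketch below. The field names (`jokes`, `joke`, `probability`) are inferred from the review discussion, not copied from game.ts, and the example content is invented.

```typescript
// Hypothetical response shape for the verbal-sampling call; field names are
// inferred from the review comments, and the example values are illustrative.
interface VerbalSampleResponse {
  jokes: Array<{
    joke: string;        // candidate Quiplash prompt
    probability: number; // model-verbalized likelihood, usable as a fallback signal
  }>;
}

const example: VerbalSampleResponse = {
  jokes: [
    { joke: "The worst possible name for a pet goldfish", probability: 0.4 },
    { joke: "A rejected flavor of toothpaste", probability: 0.3 },
  ],
};
```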


Macroscope summarized 6b4ef7f.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (2)
game.ts (1)

321-345: callSelectBestPrompt failure causes full re-generation of candidates.

If this API call throws (network error, rate limit, etc.), the error propagates up through callGeneratePrompt, which is wrapped in withRetry. This causes the entire 5-candidate generation to be re-executed. Consider catching the error here and falling back to the probability-based selection directly, since valid candidates already exist.

♻️ Suggested approach in callGeneratePrompt
-  const selected = await callSelectBestPrompt(
-    model,
-    candidates.map((c) => c.joke),
-  );
+  let selected: string | null = null;
+  try {
+    selected = await callSelectBestPrompt(
+      model,
+      candidates.map((c) => c.joke),
+    );
+  } catch (err) {
+    log("WARN", `prompt:${model.name}`, "callSelectBestPrompt failed, using fallback", {
+      error: err instanceof Error ? err.message : String(err),
+    });
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@game.ts` around lines 321 - 345, callSelectBestPrompt currently lets errors
from generateText bubble up (causing callGeneratePrompt wrapped in withRetry to
regenerate all candidates); modify callSelectBestPrompt to catch exceptions
thrown by generateText (or any part of the API call), log the error via
log("ERROR", ...) including model.id/name and the exception, and then return a
deterministic fallback choice (e.g., pick the highest-probability candidate by
index from the provided jokes array or use an existing probability-based
selector) so callers still receive a valid string; ensure the returned value is
passed through cleanResponse before returning and preserve types/signature of
callSelectBestPrompt.
llm-json-fixer.ts (1)

252-269: Redundant parse attempt when a fix is a no-op.

When a fix function returns the same string (lines 265–267), processed isn't updated, but the loop still runs JSON.parse on the unchanged input in the next iteration, producing a duplicate warning. Consider skipping the iteration when the fix is a no-op.

♻️ Suggested improvement
       warnings.push((error as Error).message);
       if (i === attempts.length) {
         break;
       }
       const next = attempts[i]!();
       if (next !== processed) {
         processed = next;
+      } else {
+        continue;
       }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@llm-json-fixer.ts` around lines 252 - 269, The loop currently pushes the same
JSON.parse error repeatedly when a fix is a no-op; in the catch block, only push
the error message to warnings if it differs from the last warning (e.g., check
warnings[warnings.length-1] !== (error as Error).message before pushing), and
after computing next = attempts[i]!(), if next === processed simply continue to
the next attempt without updating processed (so you avoid redundant parse
attempts and duplicate warnings). Update references in this block around
attempts, processed, warnings and the catch handling to implement these checks.

Comment on lines +267 to +269
  if (!Array.isArray(parsed.jokes) || parsed.jokes.length !== 5) {
    throw new Error("Invalid verbal sample CoT output: jokes must contain 5 items");
  }


⚠️ Potential issue | 🟠 Major

Strict === 5 check is brittle for LLM output.

LLMs don't always follow instructions exactly — they may return 4 or 6 items. The downstream map/filter at lines 271–295 already discards invalid entries and checks for at least one valid candidate. Consider relaxing this to a minimum-length check (e.g., < 1) rather than requiring exactly 5.

♻️ Suggested change
-  if (!Array.isArray(parsed.jokes) || parsed.jokes.length !== 5) {
-    throw new Error("Invalid verbal sample CoT output: jokes must contain 5 items");
+  if (!Array.isArray(parsed.jokes) || parsed.jokes.length === 0) {
+    throw new Error("Invalid verbal sample CoT output: jokes array is empty or missing");
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@game.ts` around lines 267 - 269, The strict check throws when
parsed.jokes.length !== 5; relax this to allow variable-length LLM output by
validating parsed.jokes is an array and has at least one item (e.g.,
parsed.jokes.length < 1) instead of requiring exactly 5; reference the existing
downstream validation in the mapping/filtering logic around the jokes processing
(the map/filter block following parsed.jokes) to ensure invalid entries are
still discarded and at least one valid candidate is present.

Comment on lines +297 to +305
  const selected = await callSelectBestPrompt(
    model,
    candidates.map((c) => c.joke),
  );

  const matched = candidates.find((c) => c.joke === selected);
  if (matched) {
    return matched.joke;
  }


⚠️ Potential issue | 🟠 Major

Exact-match comparison with LLM output is fragile.

callSelectBestPrompt asks the model to echo back the chosen prompt, then line 302 does an exact string match against the candidates. LLMs frequently introduce minor deviations — a leading number, trailing period, extra whitespace, or slight rewording — causing the match to silently fail and fall through to the probability-based fallback every time.

Consider a fuzzy match (e.g., normalized/trimmed comparison, or includes/Levenshtein) or have the model return just the index number instead:

♻️ Option A: Have the model return the index
-    prompt: `Choose exactly one of these Quiplash prompts and reply with ONLY the exact prompt text, nothing else:\n\n${jokes
+    prompt: `Choose exactly one of these Quiplash prompts and reply with ONLY the number (1-${jokes.length}), nothing else:\n\n${jokes
       .map((joke, i) => `${i + 1}. ${joke}`)
       .join("\n")}`,

Then parse the returned number and index into the candidates array.
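That index parsing can be done defensively with a small helper. This is a hypothetical helper, not part of the PR: it extracts the first integer from the model's reply and bounds-checks it.

```typescript
// Hypothetical helper for Option A: extract the first integer from the model's
// reply and bounds-check it against the candidate count. Returns null when the
// reply is unparsable so callers can use the probability-based fallback.
function parseSelectedIndex(reply: string, count: number): number | null {
  const match = reply.match(/\d+/);
  if (!match) return null;
  const index = Number(match[0]) - 1; // the prompt numbers options from 1
  return index >= 0 && index < count ? index : null;
}
```

A reply like "3." or "I pick option 3" then resolves to index 2, while "0" or free-form text yields null and falls through to the fallback path.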

♻️ Option B: Normalize before matching
-  const matched = candidates.find((c) => c.joke === selected);
+  const normalize = (s: string) => s.replace(/^\d+[\.\)]\s*/, "").trim().toLowerCase();
+  const normalizedSelected = normalize(selected);
+  const matched = candidates.find((c) => normalize(c.joke) === normalizedSelected);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@game.ts` around lines 297 - 305, callSelectBestPrompt returns model text that
may not exactly match candidate.joke, so the exact equality check (selected ===
c.joke) in the matched lookup is fragile; update selection handling in game.ts
to robustly map the model output to a candidate by either (A) changing the
prompt in callSelectBestPrompt to instruct the model to return a single index
and then parse that index to pick from candidates, or (B) perform a normalized
fuzzy match after receiving selected — e.g., trim/normalize whitespace and
punctuation and compare lowercased strings or use a small Levenshtein/similarity
threshold to find the closest candidate.joke before falling back to the
probability-based path; apply this logic where matched is computed so
matched.joke reliably resolves to the intended candidate.

Comment on lines +193 to +197
  for (const char of line) {
    if (char === "{" || char === "[") depth++;
    else if (char === "}" || char === "]") depth--;
  }
  continue;


⚠️ Potential issue | 🟡 Minor

Depth tracking counts brackets inside string values.

The depth counter iterates raw characters, so brackets within JSON string values (e.g., "What {thing}...") will incorrectly adjust depth, potentially causing spurious comma insertion or suppression. For the current use case (simple joke objects) this is unlikely to trigger, but worth noting as a known limitation.

Also applies to: 211-214
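A string-aware version of the depth loop might look like the sketch below. This is a standalone illustration of the fix described above, not the actual llm-json-fixer.ts code: it toggles an `inString` flag on unescaped quotes and only counts brackets outside strings.

```typescript
// Sketch of string-aware depth tracking: brackets inside quoted JSON strings
// no longer affect the depth counter. Backslash escapes are handled so an
// escaped quote (\") does not toggle the in-string state.
function bracketDepthDelta(line: string): number {
  let depth = 0;
  let inString = false;
  let escaped = false;
  for (const char of line) {
    if (escaped) {
      escaped = false;      // previous char was a backslash; skip this one
    } else if (char === "\\") {
      escaped = true;
    } else if (char === '"') {
      inString = !inString; // toggle only on unescaped quotes
    } else if (!inString) {
      if (char === "{" || char === "[") depth++;
      else if (char === "}" || char === "]") depth--;
    }
  }
  return depth;
}
```

With this, a line like `{"q": "What {thing} would..."` contributes a delta of +1 rather than +2, so the brace inside the string value no longer skews comma insertion.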

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@llm-json-fixer.ts` around lines 193 - 197, The depth counter currently
increments/decrements for every brace/brace character in the raw line (variables
depth, line, char), which miscounts when those characters appear inside JSON
string values; update the loop to track whether we're inside a quoted string
(e.g., an inString boolean toggled when encountering an unescaped double-quote,
handling backslash escapes) and only modify depth for "{" "[" "}" "]" when
inString is false; apply the same change to the other matching loop referenced
around lines 211-214 so brackets inside strings do not affect depth.
