Conversation
Pull request overview
Integrates liblloyal’s new chat_in/chat_out APIs to provide format-aware chat prompt generation and structured output parsing (tools/reasoning), aligning behavior with llama.cpp chat edge cases.
Changes:
- Replaced `chat_template` usage with `chat_in` formatting and added `chat_out` parsing via a new `parseChatOutput()` binding.
- Expanded TypeScript typings to expose format-awareness metadata, parsing options/results, and related enums/types.
- Added integration tests for chat in/out round-tripping and new warm-vs-cold parity + semantic recall checks; updated CI matrix model.
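The intended round-trip can be sketched against a stubbed context. Only the method names (`formatChat`, `parseChatOutput`) and result fields mirror this PR; the stub bodies below are placeholders, not the real native binding:

```javascript
// Stubbed context illustrating the chat_in -> generate -> chat_out shape.
// The bodies are fakes: real chat_in renders the model's chat template,
// and real chat_out splits reasoning/tool calls according to `format`.
const ctx = {
  async formatChat(messagesJson) {
    const messages = JSON.parse(messagesJson);
    const prompt = messages.map((m) => `<|${m.role}|>${m.content}`).join('\n');
    return { prompt, stopTokens: ['</s>'], format: 'generic' };
  },
  parseChatOutput(raw, opts) {
    return { content: raw.trim(), format: opts.format };
  },
};

async function roundTrip() {
  const { prompt, format } = await ctx.formatChat(
    JSON.stringify([{ role: 'user', content: 'hi' }]));
  const raw = '  hello!  '; // stands in for generated text
  return { prompt, parsed: ctx.parseChatOutput(raw, { format }) };
}
```

The point of the shape is that `format` metadata produced by `formatChat` is threaded back into `parseChatOutput`, so parsing stays consistent with the template that produced the prompt.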
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 3 comments.
Summary per file:
| File | Description |
|---|---|
| test/matrix.json | Switches CI chat model entry to Ministral for updated chat template coverage. |
| test/integration.js | Adds new integration tests for chat_in/chat_out and warm continuation correctness. |
| src/SessionContext.hpp | Declares new parseChatOutput N-API method. |
| src/SessionContext.cpp | Implements chat_in::format plumbing, exposes extended format metadata, adds chat_out::parse binding. |
| liblloyal | Updates submodule to a revision containing chat_in/chat_out. |
| lib/index.d.ts | Adds format-aware chat result fields, parsing APIs/types, and richer docs. |
| examples/chat/chat.mjs | Formatting-only: normalizes quoting/style (no behavior change shown). |
| README.md | Updates listed CI model/template entry. |
| .gitignore | Adds ignore rule for docs/_internal. |
test/integration.js
Outdated
```js
assert(warmStr === coldStr,
  `Warm==Cold parity: ${warmGen2.length} tokens match`);

if (warmStr !== coldStr) {
  // Diagnostic: show first divergence point
  for (let i = 0; i < Math.max(warmGen2.length, coldGen2.length); i++) {
    if (warmGen2[i] !== coldGen2[i]) {
      console.log(`  First divergence at position ${i}: warm=${warmGen2[i]} cold=${coldGen2[i]}`);
      break;
    }
  }
}
```
The diagnostic block is unreachable on mismatch because assert() throws before the if (warmStr !== coldStr) runs, and the assert message is misleading (it claims tokens match even when they don’t). Move the divergence logging before throwing, and make the failure message include at least the first differing index (or the two token sequences) so failures are actionable.
Suggested change:

```js
let firstDiffIdx = -1;
if (warmStr !== coldStr) {
  // Diagnostic: show first divergence point
  for (let i = 0; i < Math.max(warmGen2.length, coldGen2.length); i++) {
    if (warmGen2[i] !== coldGen2[i]) {
      firstDiffIdx = i;
      console.log(`  First divergence at position ${i}: warm=${warmGen2[i]} cold=${coldGen2[i]}`);
      break;
    }
  }
}
assert(
  warmStr === coldStr,
  firstDiffIdx >= 0
    ? `Warm==Cold parity failed at token ${firstDiffIdx}: warm=${warmGen2[firstDiffIdx]} cold=${coldGen2[firstDiffIdx]}`
    : `Warm==Cold parity failed: warm sequence=${warmStr} cold sequence=${coldStr}`
);
```
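The divergence-reporting logic can also be factored into small standalone helpers, which keeps the assertion line short and makes the logic unit-testable. The helper names here are hypothetical, not part of the test suite:

```javascript
// Hypothetical helpers: locate the first index where two token sequences
// diverge, and build an actionable parity-failure message from it.
function firstDivergence(a, b) {
  // Iterate to the longer length so a length mismatch also counts as divergence.
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return i;
  }
  return -1;
}

function parityMessage(warm, cold) {
  const i = firstDivergence(warm, cold);
  return i < 0
    ? `Warm==Cold parity: ${warm.length} tokens match`
    : `Warm==Cold parity failed at token ${i}: warm=${warm[i]} cold=${cold[i]}`;
}

// firstDivergence([5, 7, 9], [5, 7, 8]) → 2
// firstDivergence([5, 7], [5, 7, 8])    → 2 (length mismatch counts)
```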
test/integration.js
Outdated
```js
async function warmTurn(messages, lastText, userContent) {
  messages.push({ role: 'user', content: userContent });
  const { prompt: fullPrompt } = await ctx.formatChat(JSON.stringify(messages));
  const delta = fullPrompt.slice(lastText.length);
  const deltaToks = await ctx.tokenize(delta);
  branch.prefill(deltaToks);

  const gen = [];
  for (let i = 0; i < GEN_TOKENS; i++) {
    const { token, isStop } = branch.produce();
    if (isStop) break;
    branch.commit(token);
    gen.push(token);
  }
  const assistantText = await ctx.detokenize(gen);
  messages.push({ role: 'assistant', content: assistantText });
  return { text: assistantText, lastText: fullPrompt + assistantText };
}
```
warmTurn() closes over branch, but branch is declared later with var, relying on hoisting and call-order for correctness. This is fragile and harder to read/debug; declare let branch; before warmTurn and assign it before any call sites, or pass branch explicitly as a parameter to warmTurn() (and prefer const/let over var).
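The parameter-passing variant can be sketched with a stubbed `Branch` (the real `Branch.create`/`prefill` live in the native binding; this stub exists only to show the shape):

```javascript
// Stubbed Branch for illustration; the real one wraps native KV-cache state.
function createBranch() {
  const seen = [];
  return { prefill(toks) { seen.push(...toks); }, seen };
}

// `branch` is an explicit parameter, so there is no reliance on `var`
// hoisting or call order: the dependency is visible at every call site.
async function warmTurn(branch, deltaToks) {
  branch.prefill(deltaToks);
  return branch.seen.length;
}

const branch = createBranch(); // assigned before any warmTurn() call
```

Passing the branch explicitly also makes it trivial to run two branches (warm vs. cold) through the same turn helper in the parity test.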
test/integration.js
Outdated
```js
const promptToks = await ctx.tokenize(prompt);
await ctx.decode(promptToks, 0, 0);

var branch = Branch.create(ctx, 0, promptToks.length, { temperature: 0 });
```
Pull request overview
Copilot reviewed 11 out of 13 changed files in this pull request and generated 6 comments.
```cpp
 * Returns: Promise<{ prompt: string, stopTokens: string[] }>
 */
Napi::Value formatChat(const Napi::CallbackInfo& info);
```
The formatChat doc comment is now outdated: the implementation returns many additional fields (format, grammar, grammarLazy, reasoningFormat, etc.), not just { prompt, stopTokens }. Please update this comment to reflect the extended return shape, and add a brief doc comment describing parseChatOutput’s arguments/return fields since it’s a new public binding.
Suggested change:

```cpp
 * Returns: Promise<object> resolving to an object containing:
 *   - prompt: string
 *   - stopTokens: number[]
 *   - format?: string
 *   - grammar?: string
 *   - grammarLazy?: () => string
 *   - reasoningFormat?: string
 *   - and other model-specific formatting/grammar metadata
 */
Napi::Value formatChat(const Napi::CallbackInfo& info);

/**
 * Parse raw model chat output using the metadata returned by formatChat.
 * Args: info (NAPI call info; typically includes the raw completion text and
 *   associated formatting/grammar metadata).
 * Returns: JavaScript object representing the parsed chat response, including
 *   assistant content and related metadata.
 */
```
```diff
 if (info.Length() < 1 || !info[0].IsString()) {
-  throw Napi::TypeError::New(env, "Expected (messagesJson: string[, templateOverride: string])");
+  throw Napi::TypeError::New(env, "Expected (messagesJson: string[, options: object])");
```
The updated error message no longer reflects the actual supported signature: the implementation still supports a string second argument for backward compatibility (formatChat(messagesJson, templateOverride: string)). Consider updating the message to mention both accepted forms so callers can recover quickly from argument errors.
Suggested change:

```cpp
throw Napi::TypeError::New(
    env,
    "Expected (messagesJson: string[, options: object]) or (messagesJson: string, templateOverride: string)");
```
```cpp
// Second argument: options object (or string for backward compat)
if (info.Length() >= 2) {
  if (info[1].IsString()) {
    // Backward compat: formatChat(messagesJson, templateOverride)
    inputs.template_override = info[1].As<Napi::String>().Utf8Value();
  } else if (info[1].IsObject()) {
    Napi::Object opts = info[1].As<Napi::Object>();
```
If a second argument is provided but is neither a string nor an object (e.g. null, number, boolean), it is silently ignored. That can hide calling bugs and lead to surprising behavior. It’d be better to throw a TypeError when info[1] is present but not one of the supported types.
```cpp
}
if (opts.Has("addGenerationPrompt") && opts.Get("addGenerationPrompt").IsBoolean()) {
  inputs.add_generation_prompt = opts.Get("addGenerationPrompt").As<Napi::Boolean>().Value();
}
```
Suggested change:

```cpp
    }
  }
} else {
  throw Napi::TypeError::New(env, "Expected options to be a string or object");
```
```cpp
if (opts.Has("reasoningFormat") && opts.Get("reasoningFormat").IsString()) {
  inputs.reasoning_format = opts.Get("reasoningFormat").As<Napi::String>().Utf8Value();
}
```
FormatChatOptions.reasoningFormat is modeled as a string in the binding, but you also expose a numeric ReasoningFormat enum in the TS types. A common user expectation will be to pass ReasoningFormat.AUTO (number) into formatChat, but this implementation will ignore it because it only accepts strings. Consider accepting both number and string inputs here (or adjusting the TS surface to avoid offering an enum that isn’t accepted on input).
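If the binding keeps its string-only input, a JS-side shim could bridge the enum before calling into native code. Note the numeric values and names below are assumptions for illustration, not the actual `ReasoningFormat` definition from the typings:

```javascript
// Hypothetical enum mapping — values and names assumed for illustration only.
const ReasoningFormat = { NONE: 0, AUTO: 1, DEEPSEEK: 2 };
const reasoningFormatNames = ['none', 'auto', 'deepseek'];

// Accept either a ReasoningFormat number or its string name; reject anything
// else loudly instead of silently ignoring it.
function normalizeReasoningFormat(v) {
  if (typeof v === 'string') return v;
  if (typeof v === 'number' && reasoningFormatNames[v] !== undefined) {
    return reasoningFormatNames[v];
  }
  throw new TypeError(`Unsupported reasoningFormat: ${String(v)}`);
}
```

The same normalization could live in the C++ binding instead (checking `IsNumber()` alongside `IsString()`), which would keep the TS enum usable without a wrapper.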
test/integration.js
Outdated
```js
const turn3 = await warmTurn(branch, 'Do you remember my name?');
console.log(`  Turn 3 (name recall): "${turn3.trim().slice(0, 80)}"`);
const nameRecalled = turn3.toLowerCase().includes('lloyal');
assert(nameRecalled, `Name recall: ${nameRecalled ? 'found "Lloyal"' : 'MISSING "Lloyal" in: ' + turn3.trim().slice(0, 120)}`);

// Turn 4 (WARM): recall food
const turn4 = await warmTurn(branch, 'Do you remember my favourite food?');
console.log(`  Turn 4 (food recall): "${turn4.trim().slice(0, 80)}"`);
const foodRecalled = turn4.toLowerCase().includes('pizza');
assert(foodRecalled, `Food recall: ${foodRecalled ? 'found "pizza"' : 'MISSING "pizza" in: ' + turn4.trim().slice(0, 120)}`);
```
These assertions make the integration test brittle: even at temperature: 0, a model can “remember” without repeating the exact string (e.g., “Yes, you told me earlier” or paraphrasing), causing false failures across models/templates. To reduce flakiness, consider tightening the prompts to require an exact minimal answer (e.g., “Reply with only the name.” / “Reply with only the food.”) or relaxing the checks to allow common variants while still proving recall.
Suggested change:

```js
const turn3 = await warmTurn(branch, 'Do you remember my name? Reply with only my name.');
console.log(`  Turn 3 (name recall): "${turn3.trim().slice(0, 80)}"`);
const nameAnswer = turn3.trim().toLowerCase();
const nameRecalled = nameAnswer === 'lloyal';
assert(nameRecalled, `Name recall: expected "Lloyal", got "${nameAnswer}"`);

// Turn 4 (WARM): recall food
const turn4 = await warmTurn(branch, 'Do you remember my favourite food? Reply with only my favourite food.');
console.log(`  Turn 4 (food recall): "${turn4.trim().slice(0, 80)}"`);
const foodAnswer = turn4.trim().toLowerCase();
const foodRecalled = foodAnswer === 'pizza';
assert(foodRecalled, `Food recall: expected "pizza", got "${foodAnswer}"`);
```
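A middle-ground check, still proving recall without demanding a verbatim-only reply, might look like this (hypothetical helper, not part of the PR):

```javascript
// Accept the exact expected answer, or any reply that contains it,
// case-insensitively; reject replies that never mention it.
function recalls(reply, expected) {
  const r = reply.trim().toLowerCase();
  const e = expected.toLowerCase();
  return r === e || r.includes(e);
}

// recalls('Your name is Lloyal!', 'Lloyal') → true
// recalls('I am not sure.', 'Lloyal')       → false
```

This tolerates paraphrases like "Yes, your name is Lloyal" while still failing when the model genuinely forgets, which trades a little strictness for much lower cross-model flakiness.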
Integrated the new `liblloyal::chat_in` and `liblloyal::chat_out` APIs for parity with llama.cpp chat edge cases.