Adopt the new BranchStore API from liblloyal for high-throughput multi-branch operations #16
lloyal-research commented on Feb 14, 2026
- Adopt the new BranchStore API from liblloyal for high-throughput multi-branch operations
- Expose a per-branch logits snapshot
BranchStore packs N branches into a single llama_decode() call via decode_each (commit) and decode_scatter (prefill), replacing per-branch GPU dispatches. The tuple API, [branch, token(s)][], binds each branch to its token(s) structurally rather than positionally. Branch.getLogits() returns a Float32Array copy of the captured logits snapshot, enabling distributional analysis (entropy, KL divergence) from JS without crossing the sampling chain. Tests cover best-of-N, rehydrate+generate, produce→commit interop, mixed single/batched paths, getLogits→modelEntropy integration, and independent per-branch EOG with steer-forced termination.
SessionContext now uses span-based DecodeEachItem/DecodeScatterItem and BranchStore-internal scratch buffers, which removes the _decodeScratch member.
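As a rough illustration of the distributional analysis mentioned above, the following sketch computes entropy and KL divergence from raw logit vectors. The helper names (logSoftmax, entropy, klDivergence) are hypothetical and not part of this PR, and the input arrays are stand-ins for what Branch.getLogits() would return.

```javascript
// Hypothetical helpers for illustration; not part of the PR or liblloyal.

// Numerically stable log-softmax over a typed array of raw logits.
function logSoftmax(logits) {
  let max = -Infinity;
  for (const x of logits) if (x > max) max = x;
  let sum = 0;
  for (const x of logits) sum += Math.exp(x - max);
  const logZ = max + Math.log(sum);
  return Float64Array.from(logits, x => x - logZ);
}

// Shannon entropy (in nats) of the distribution implied by the logits.
function entropy(logits) {
  const lp = logSoftmax(logits);
  let h = 0;
  for (const l of lp) {
    const p = Math.exp(l);
    if (p > 0) h -= p * l; // skip zero-probability entries
  }
  return h;
}

// KL(P || Q) in nats between two logit vectors over the same vocabulary.
function klDivergence(logitsP, logitsQ) {
  if (logitsP.length !== logitsQ.length) {
    throw new RangeError('klDivergence: vocabulary size mismatch');
  }
  const lp = logSoftmax(logitsP);
  const lq = logSoftmax(logitsQ);
  let kl = 0;
  for (let i = 0; i < lp.length; i++) {
    const p = Math.exp(lp[i]);
    if (p > 0) kl += p * (lp[i] - lq[i]);
  }
  return kl;
}

// Stand-in data; in real use these would be copies from branch.getLogits().
const uniform = new Float32Array([0, 0, 0, 0]);
const peaked = new Float32Array([10, 0, 0, 0]);
console.log(entropy(uniform));              // ln(4), approximately 1.386
console.log(klDivergence(peaked, uniform)); // positive: the distributions differ
```

Because getLogits() is described as returning a copy, helpers like these can run without touching branch state.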
Pull request overview
Adopts the updated BranchStore API from liblloyal to enable high-throughput batched multi-branch decoding, and exposes per-branch logits snapshots to JS consumers.
Changes:
- Added BranchStore JS wrapper + exports, plus new internal N-API bindings for batched commit/prefill.
- Exposed Branch.getLogits() via a new native _branchGetLogits() method.
- Added an integration test suite covering common BranchStore workflows.
Reviewed changes
Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| test/integration.js | Adds end-to-end integration coverage for BranchStore workflows and per-branch logits snapshots. |
| src/SessionContext.hpp | Declares new internal native methods for branch logits access and store operations. |
| src/SessionContext.cpp | Implements _branchGetLogits, _storeCommit, _storePrefill, and updates decode API usage. |
| lib/index.js | Exports BranchStore from the public JS entrypoint. |
| lib/index.d.ts | Adds typings/docs for _branchGetLogits, _storeCommit/_storePrefill, Branch.getLogits(), and BranchStore. |
| lib/BranchStore.js | Introduces the JS BranchStore wrapper over the internal native store API. |
| lib/Branch.js | Adds Branch.getLogits() wrapper method. |
| liblloyal | Updates the submodule to a commit that provides the new BranchStore-native functionality. |
src/SessionContext.cpp
Outdated
```cpp
// Accept tokens into sampler penalty windows (CPU, per-branch)
for (uint32_t i = 0; i < n; i++) {
  lloyal::branch::accept_token(items[i].handle, items[i].token, &_branchStore);
}

// Batched decode: one token per branch, single llama_decode dispatch
_branchStore.decode_each(items);
```
In _storeCommit, accept_token mutates per-branch sampler/penalty state before the decode happens. If _branchStore.decode_each(items) errors/throws or fails internally, the commit becomes partial (penalty windows advanced, KV/logits not), leaving branches in an inconsistent state. Consider performing decode_each first and only calling accept_token after successful decode, or wrapping decode in error handling and ensuring state is not mutated unless the commit fully succeeds.
Suggested change:

```diff
-// Accept tokens into sampler penalty windows (CPU, per-branch)
-for (uint32_t i = 0; i < n; i++) {
-  lloyal::branch::accept_token(items[i].handle, items[i].token, &_branchStore);
-}
-// Batched decode: one token per branch, single llama_decode dispatch
-_branchStore.decode_each(items);
+// Batched decode: one token per branch, single llama_decode dispatch.
+// Perform decode first so that if it throws/fails, sampler/penalty state
+// is not mutated and branches remain consistent.
+_branchStore.decode_each(items);
+// Accept tokens into sampler penalty windows (CPU, per-branch)
+for (uint32_t i = 0; i < n; i++) {
+  lloyal::branch::accept_token(items[i].handle, items[i].token, &_branchStore);
+}
```
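The ordering concern can be reproduced in isolation with a mock. The sketch below is a simplified stand-in, assuming a single shared store with a hypothetical acceptToken/decodeEach pair; it is not the liblloyal API and only shows why decode-first leaves state untouched when decode fails.

```javascript
// Hypothetical stand-in for the native store; not the liblloyal API.
class MockStore {
  constructor() {
    this.penaltyWindow = []; // tokens accepted into sampler state
    this.kvLength = 0;       // tokens that made it through decode
    this.failDecode = false; // toggle to simulate a decode failure
  }
  acceptToken(token) {
    this.penaltyWindow.push(token);
  }
  decodeEach(tokens) {
    if (this.failDecode) throw new Error('decode failed');
    this.kvLength += tokens.length;
  }
}

// Original order: sampler state is mutated before decode can fail.
function commitAcceptFirst(store, tokens) {
  for (const t of tokens) store.acceptToken(t);
  store.decodeEach(tokens);
}

// Suggested order: sampler state is mutated only after decode succeeds.
function commitDecodeFirst(store, tokens) {
  store.decodeEach(tokens);
  for (const t of tokens) store.acceptToken(t);
}

// Run one commit against a store whose decode always fails and report state.
function tryCommit(commit) {
  const store = new MockStore();
  store.failDecode = true;
  try {
    commit(store, [1, 2, 3]);
  } catch {
    // expected: decode was forced to fail
  }
  return { penaltyWindow: store.penaltyWindow.length, kvLength: store.kvLength };
}

console.log(tryCommit(commitAcceptFirst)); // { penaltyWindow: 3, kvLength: 0 }
console.log(tryCommit(commitDecodeFirst)); // { penaltyWindow: 0, kvLength: 0 }
```

With accept-first, the failed commit leaves the penalty window three tokens ahead of the KV state; with decode-first, both stay at zero.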
```javascript
for (let step = 0; step < 10; step++) {
  const live = branches.map(b => [b, b.produce()])
    .filter(([, p]) => !p.isStop);
  if (!live.length) break;
  store.commit(live.map(([b, p]) => [b, p.token]));
}

// All branches should have valid perplexity (metrics tracked through _storeCommit)
const ppls = branches.map(b => b.perplexity);
console.log(` best-of-N perplexities: [${ppls.map(p => p.toFixed(2)).join(', ')}]`);
assert(ppls.every(p => isFinite(p) && p >= 1.0),
  `best-of-N: all perplexities valid [${ppls.map(p => p.toFixed(2))}]`);

// Greedy forks from same root → same perplexity (sanity check)
assert(Math.abs(ppls[0] - ppls[1]) < 0.01,
  `best-of-N: greedy forks have equal perplexity`);
```
These BranchStore integration tests assume the model won’t immediately produce an EOG/stop token after the prompt (e.g., if live.length becomes 0 at step 0). In that case, later assertions about perplexity validity/equality can become flaky across models/variants. To make the tests robust, consider explicitly steering away from EOG for the first token(s), or detect early-stop and skip/adjust the assertions when no commits occurred.
Suggested change:

```diff
-for (let step = 0; step < 10; step++) {
-  const live = branches.map(b => [b, b.produce()])
-    .filter(([, p]) => !p.isStop);
-  if (!live.length) break;
-  store.commit(live.map(([b, p]) => [b, p.token]));
-}
-// All branches should have valid perplexity (metrics tracked through _storeCommit)
-const ppls = branches.map(b => b.perplexity);
-console.log(` best-of-N perplexities: [${ppls.map(p => p.toFixed(2)).join(', ')}]`);
-assert(ppls.every(p => isFinite(p) && p >= 1.0),
-  `best-of-N: all perplexities valid [${ppls.map(p => p.toFixed(2))}]`);
-// Greedy forks from same root → same perplexity (sanity check)
-assert(Math.abs(ppls[0] - ppls[1]) < 0.01,
-  `best-of-N: greedy forks have equal perplexity`);
+let commitCount = 0;
+for (let step = 0; step < 10; step++) {
+  const live = branches.map(b => [b, b.produce()])
+    .filter(([, p]) => !p.isStop);
+  if (!live.length) break;
+  store.commit(live.map(([b, p]) => [b, p.token]));
+  commitCount += 1;
+}
+// All branches should have valid perplexity (metrics tracked through _storeCommit)
+const ppls = branches.map(b => b.perplexity);
+console.log(` best-of-N perplexities: [${ppls.map(p => p.toFixed(2)).join(', ')}]`);
+if (commitCount === 0) {
+  console.log(' best-of-N skipped: model stopped immediately (no tokens committed)');
+} else {
+  assert(ppls.every(p => isFinite(p) && p >= 1.0),
+    `best-of-N: all perplexities valid [${ppls.map(p => p.toFixed(2))}]`);
+  // Greedy forks from same root → same perplexity (sanity check)
+  assert(Math.abs(ppls[0] - ppls[1]) < 0.01,
+    `best-of-N: greedy forks have equal perplexity`);
+}
```
```javascript
commit(entries) {
  const handles = [], tokens = [];
  for (const [branch, token] of entries) {
    if (branch.disposed) throw new Error('BranchStore.commit: branch is disposed');
    handles.push(branch.handle);
    tokens.push(token);
  }
  this._ctx._storeCommit(handles, tokens);
}
```
BranchStore.commit()/prefill() are public JS entrypoints but currently rely on destructuring and downstream native validation for most input-shape errors. If a caller passes a non-iterable, malformed tuple, or wrong token types, the thrown error may be confusing. Consider adding lightweight upfront validation (e.g., Array.isArray(entries), tuple length checks, numeric token checks, and Array.isArray(tokens) for prefill) and throwing a TypeError with a clear message.
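One possible shape for that upfront validation is sketched below. The helper names (validateCommitEntries, validatePrefillEntries) and the error messages are hypothetical, and the check that a branch is a non-null object is an assumption about what a Branch looks like from JS.

```javascript
// Hypothetical validation helpers; names and messages are illustrative only.

// Validates the [branch, token][] shape expected by commit().
function validateCommitEntries(entries) {
  if (!Array.isArray(entries)) {
    throw new TypeError('BranchStore.commit: entries must be an array of [branch, token] tuples');
  }
  entries.forEach((entry, i) => {
    if (!Array.isArray(entry) || entry.length !== 2) {
      throw new TypeError(`BranchStore.commit: entries[${i}] must be a [branch, token] tuple`);
    }
    const [branch, token] = entry;
    if (branch === null || typeof branch !== 'object') {
      throw new TypeError(`BranchStore.commit: entries[${i}][0] must be a Branch`);
    }
    if (!Number.isInteger(token)) {
      throw new TypeError(`BranchStore.commit: entries[${i}][1] must be an integer token id`);
    }
  });
}

// Validates the [branch, tokens[]][] shape expected by prefill().
function validatePrefillEntries(entries) {
  if (!Array.isArray(entries)) {
    throw new TypeError('BranchStore.prefill: entries must be an array of [branch, tokens] tuples');
  }
  entries.forEach((entry, i) => {
    if (!Array.isArray(entry) || entry.length !== 2) {
      throw new TypeError(`BranchStore.prefill: entries[${i}] must be a [branch, tokens] tuple`);
    }
    const [branch, tokens] = entry;
    if (branch === null || typeof branch !== 'object') {
      throw new TypeError(`BranchStore.prefill: entries[${i}][0] must be a Branch`);
    }
    if (!Array.isArray(tokens) || !tokens.every(Number.isInteger)) {
      throw new TypeError(`BranchStore.prefill: entries[${i}][1] must be an array of integer token ids`);
    }
  });
}

// Usage with stand-in branch objects.
validateCommitEntries([[{ handle: 1 }, 42]]);         // passes silently
validatePrefillEntries([[{ handle: 1 }, [1, 2, 3]]]); // passes silently
```

Calling these at the top of commit() and prefill() would surface a TypeError naming the offending index before any native call is made.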
```cpp
const float* logits = lloyal::branch::get_logits(handle, &_branchStore);

if (!logits) {
  throw Napi::Error::New(env, "_branchGetLogits: no logits captured");
```
The error message _branchGetLogits: no logits captured doesn’t tell the caller how to resolve it. Since the intended fix is actionable, consider extending the message to indicate which operation captures logits (e.g., captureLogits() or decodeAndCapture*() / a store commit that captures), so consumers can correct usage quickly.
Suggested change:

```diff
-throw Napi::Error::New(env, "_branchGetLogits: no logits captured");
+throw Napi::Error::New(env,
+  "_branchGetLogits: no logits captured; call a logits-capturing operation "
+  "(e.g. captureLogits, decodeAndCapture*, or a store commit that captures logits) "
+  "before calling _branchGetLogits");
```