
Adopt the new BranchStore API from liblloyal for high-throughput multi-branch operations #16

Merged
lloyal-research merged 5 commits into main from feat/throughput on Feb 14, 2026

Conversation

@lloyal-research (Contributor)

  • Adopt the new BranchStore API from liblloyal for high-throughput multi-branch operations
  • Expose per-branch logits snapshots

  BranchStore packs N branches into a single llama_decode() via
  decode_each (commit) and decode_scatter (prefill), replacing per-branch
  GPU dispatches. The tuple API [branch, token(s)][] makes the binding
  structural rather than positional.

  Branch.getLogits() returns a Float32Array copy of the captured logits
  snapshot, enabling distributional analysis (entropy, KL divergence)
  from JS without crossing the sampling chain.
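  The kind of distributional analysis this enables can be sketched in pure JS. The helper below is illustrative only (not part of the binding): it assumes getLogits() returns raw, unnormalized logits in a Float32Array, as described above, and computes Shannon entropy (in nats) over their softmax.

  ```javascript
  // Illustrative sketch: Shannon entropy (nats) over the softmax of a raw
  // logits vector, the kind of Float32Array Branch.getLogits() is described
  // as returning. Pure JS; no native binding involved.
  function logitsEntropy(logits) {
    let max = -Infinity;
    for (const x of logits) if (x > max) max = x; // subtract max to stabilize exp()
    let z = 0;
    const exps = new Float64Array(logits.length);
    for (let i = 0; i < logits.length; i++) {
      exps[i] = Math.exp(logits[i] - max);
      z += exps[i];
    }
    let h = 0;
    for (let i = 0; i < logits.length; i++) {
      const p = exps[i] / z;
      if (p > 0) h -= p * Math.log(p); // -sum p log p
    }
    return h;
  }
  ```

  A uniform logits vector of length V yields the maximum entropy log(V); a sharply peaked one approaches 0.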

  Tests cover best-of-N, rehydrate+generate, produce→commit interop,
  mixed single/batched paths, getLogits→modelEntropy integration, and
  independent per-branch EOG with steer-forced termination.
  SessionContext uses span-based DecodeEachItem/DecodeScatterItem and
  BranchStore-internal scratch buffers, removing the _decodeScratch member.
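  The produce→commit loop this enables can be sketched as follows. This is a hypothetical driver, not part of the binding: `stepAll` and the `maxSteps` parameter are invented for illustration, while the produce()/isStop/commit([branch, token][]) shapes follow the tuple API described above.

  ```javascript
  // Hypothetical sketch: drive N branches in lockstep through a BranchStore.
  // Each step commits one token per live branch, so the whole step costs a
  // single batched decode instead of N separate dispatches.
  function stepAll(store, branches, maxSteps) {
    for (let step = 0; step < maxSteps; step++) {
      const live = branches
        .map(b => [b, b.produce()])      // sample next token per branch (CPU)
        .filter(([, p]) => !p.isStop);   // drop branches that hit end-of-generation
      if (!live.length) return step;     // every branch stopped early
      // One commit = one batched decode for all live branches.
      store.commit(live.map(([b, p]) => [b, p.token]));
    }
    return maxSteps;
  }
  ```

  Branches that stop early simply fall out of the commit list, which is the independent per-branch EOG behavior the tests exercise.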
Copilot AI review requested due to automatic review settings February 14, 2026 04:09

Copilot AI left a comment


Pull request overview

Adopts the updated BranchStore API from liblloyal to enable high-throughput batched multi-branch decoding, and exposes per-branch logits snapshots to JS consumers.

Changes:

  • Added BranchStore JS wrapper + exports, plus new internal N-API bindings for batched commit/prefill.
  • Exposed Branch.getLogits() via a new native _branchGetLogits() method.
  • Added an integration test suite covering common BranchStore workflows.

Reviewed changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated 4 comments.

Files changed:
  • test/integration.js: Adds end-to-end integration coverage for BranchStore workflows and per-branch logits snapshots.
  • src/SessionContext.hpp: Declares new internal native methods for branch logits access and store operations.
  • src/SessionContext.cpp: Implements _branchGetLogits, _storeCommit, _storePrefill, and updates decode API usage.
  • lib/index.js: Exports BranchStore from the public JS entrypoint.
  • lib/index.d.ts: Adds typings/docs for _branchGetLogits, _storeCommit/_storePrefill, Branch.getLogits(), and BranchStore.
  • lib/BranchStore.js: Introduces the JS BranchStore wrapper over the internal native store API.
  • lib/Branch.js: Adds Branch.getLogits() wrapper method.
  • liblloyal: Updates the submodule to a commit that provides the new BranchStore-native functionality.


Comment on lines +2543 to +2550
// Accept tokens into sampler penalty windows (CPU, per-branch)
for (uint32_t i = 0; i < n; i++) {
  lloyal::branch::accept_token(items[i].handle, items[i].token, &_branchStore);
}

// Batched decode: one token per branch, single llama_decode dispatch
_branchStore.decode_each(items);


Copilot AI Feb 14, 2026


In _storeCommit, accept_token mutates per-branch sampler/penalty state before the decode happens. If _branchStore.decode_each(items) errors/throws or fails internally, the commit becomes partial (penalty windows advanced, KV/logits not), leaving branches in an inconsistent state. Consider performing decode_each first and only calling accept_token after successful decode, or wrapping decode in error handling and ensuring state is not mutated unless the commit fully succeeds.

Suggested change
- // Accept tokens into sampler penalty windows (CPU, per-branch)
- for (uint32_t i = 0; i < n; i++) {
-   lloyal::branch::accept_token(items[i].handle, items[i].token, &_branchStore);
- }
- // Batched decode: one token per branch, single llama_decode dispatch
- _branchStore.decode_each(items);
+ // Batched decode: one token per branch, single llama_decode dispatch.
+ // Perform decode first so that if it throws/fails, sampler/penalty state
+ // is not mutated and branches remain consistent.
+ _branchStore.decode_each(items);
+ // Accept tokens into sampler penalty windows (CPU, per-branch)
+ for (uint32_t i = 0; i < n; i++) {
+   lloyal::branch::accept_token(items[i].handle, items[i].token, &_branchStore);
+ }

Comment on lines +1046 to +1062
for (let step = 0; step < 10; step++) {
  const live = branches.map(b => [b, b.produce()])
    .filter(([, p]) => !p.isStop);
  if (!live.length) break;
  store.commit(live.map(([b, p]) => [b, p.token]));
}

// All branches should have valid perplexity (metrics tracked through _storeCommit)
const ppls = branches.map(b => b.perplexity);
console.log(`  best-of-N perplexities: [${ppls.map(p => p.toFixed(2)).join(', ')}]`);
assert(ppls.every(p => isFinite(p) && p >= 1.0),
  `best-of-N: all perplexities valid [${ppls.map(p => p.toFixed(2))}]`);

// Greedy forks from same root → same perplexity (sanity check)
assert(Math.abs(ppls[0] - ppls[1]) < 0.01,
  `best-of-N: greedy forks have equal perplexity`);


Copilot AI Feb 14, 2026


These BranchStore integration tests assume the model won’t immediately produce an EOG/stop token after the prompt (e.g., if live.length becomes 0 at step 0). In that case, later assertions about perplexity validity/equality can become flaky across models/variants. To make the tests robust, consider explicitly steering away from EOG for the first token(s), or detect early-stop and skip/adjust the assertions when no commits occurred.

Suggested change
- for (let step = 0; step < 10; step++) {
-   const live = branches.map(b => [b, b.produce()])
-     .filter(([, p]) => !p.isStop);
-   if (!live.length) break;
-   store.commit(live.map(([b, p]) => [b, p.token]));
- }
- // All branches should have valid perplexity (metrics tracked through _storeCommit)
- const ppls = branches.map(b => b.perplexity);
- console.log(`  best-of-N perplexities: [${ppls.map(p => p.toFixed(2)).join(', ')}]`);
- assert(ppls.every(p => isFinite(p) && p >= 1.0),
-   `best-of-N: all perplexities valid [${ppls.map(p => p.toFixed(2))}]`);
- // Greedy forks from same root → same perplexity (sanity check)
- assert(Math.abs(ppls[0] - ppls[1]) < 0.01,
-   `best-of-N: greedy forks have equal perplexity`);
+ let commitCount = 0;
+ for (let step = 0; step < 10; step++) {
+   const live = branches.map(b => [b, b.produce()])
+     .filter(([, p]) => !p.isStop);
+   if (!live.length) break;
+   store.commit(live.map(([b, p]) => [b, p.token]));
+   commitCount += 1;
+ }
+ // All branches should have valid perplexity (metrics tracked through _storeCommit)
+ const ppls = branches.map(b => b.perplexity);
+ console.log(`  best-of-N perplexities: [${ppls.map(p => p.toFixed(2)).join(', ')}]`);
+ if (commitCount === 0) {
+   console.log('  best-of-N skipped: model stopped immediately (no tokens committed)');
+ } else {
+   assert(ppls.every(p => isFinite(p) && p >= 1.0),
+     `best-of-N: all perplexities valid [${ppls.map(p => p.toFixed(2))}]`);
+   // Greedy forks from same root → same perplexity (sanity check)
+   assert(Math.abs(ppls[0] - ppls[1]) < 0.01,
+     `best-of-N: greedy forks have equal perplexity`);
+ }

Comment on lines +12 to +20
commit(entries) {
  const handles = [], tokens = [];
  for (const [branch, token] of entries) {
    if (branch.disposed) throw new Error('BranchStore.commit: branch is disposed');
    handles.push(branch.handle);
    tokens.push(token);
  }
  this._ctx._storeCommit(handles, tokens);
}

Copilot AI Feb 14, 2026


BranchStore.commit()/prefill() are public JS entrypoints but currently rely on destructuring and downstream native validation for most input-shape errors. If a caller passes a non-iterable, malformed tuple, or wrong token types, the thrown error may be confusing. Consider adding lightweight upfront validation (e.g., Array.isArray(entries), tuple length checks, numeric token checks, and Array.isArray(tokens) for prefill) and throwing a TypeError with a clear message.
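One way to realize the suggested validation is a small helper run before any branch state is touched, so malformed input fails with a clear TypeError instead of a confusing downstream error. `validateEntries` is a hypothetical helper sketched here for illustration; it is not part of the actual binding.

```javascript
// Hypothetical upfront validation for BranchStore.commit()/prefill():
// reject non-arrays, malformed tuples, and non-integer tokens with a
// descriptive TypeError before any native call or state mutation.
function validateEntries(entries, method) {
  if (!Array.isArray(entries)) {
    throw new TypeError(`BranchStore.${method}: entries must be an array of [branch, token] tuples`);
  }
  for (const entry of entries) {
    if (!Array.isArray(entry) || entry.length !== 2) {
      throw new TypeError(`BranchStore.${method}: each entry must be a [branch, token] tuple`);
    }
    const [, token] = entry;
    if (!Number.isInteger(token)) {
      throw new TypeError(`BranchStore.${method}: token must be an integer, got ${typeof token}`);
    }
  }
}
```

For prefill, the same shape check would accept an array of tokens in the second tuple slot instead of a single integer.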

const float* logits = lloyal::branch::get_logits(handle, &_branchStore);

if (!logits) {
  throw Napi::Error::New(env, "_branchGetLogits: no logits captured");

Copilot AI Feb 14, 2026


The error message _branchGetLogits: no logits captured doesn’t tell the caller how to resolve it. Since the intended fix is actionable, consider extending the message to indicate which operation captures logits (e.g., captureLogits() or decodeAndCapture*() / a store commit that captures), so consumers can correct usage quickly.

Suggested change
- throw Napi::Error::New(env, "_branchGetLogits: no logits captured");
+ throw Napi::Error::New(env,
+   "_branchGetLogits: no logits captured; call a logits-capturing operation "
+   "(e.g. captureLogits, decodeAndCapture*, or a store commit that captures logits) "
+   "before calling _branchGetLogits");

@lloyal-research lloyal-research merged commit 5b04fd5 into main Feb 14, 2026
2 of 7 checks passed
@lloyal-research lloyal-research deleted the feat/throughput branch February 20, 2026 03:09