
feat(api): remove flat ctx inference path, make branch API the only path#20

Merged
lloyal-research merged 3 commits into main from feat/deprecate-flat-path
Feb 20, 2026

Conversation

Contributor

@lloyal-research lloyal-research commented Feb 20, 2026

Summary

  • Remove the flat ctx.* inference path (tokenize, decode, sample, getLogits, etc.) — Branch is now the only way to run inference
  • Remove dead JS/N-API surface: captureLogits(), decodeAndCaptureOne(), _branchCaptureLogits, _branchDecodeAndCaptureOne
  • Rename _branchDecodeAndCaptureBatch → _branchPrefill to match intent across all layers
  • Add setSamplerParams() / setGrammar() hot-swap methods on Branch
  • Add modelSurprisal() / samplerSurprisal() for per-token entropy access
  • Migrate all 8 examples from flat ctx path to Branch API
  • Update liblloyal submodule (decode primitive renames: decode_and_capture_batch → prefill, decode_and_capture_one → step)

Breaking changes

| Removed | Replacement |
| --- | --- |
| ctx.decode(token) | branch.commit(token) |
| ctx.decodeBatch(tokens) | branch.prefill(tokens) |
| ctx.sample() / ctx.sampleWithParams() | branch.sample() |
| ctx.getLogits() | branch.getLogits() |
| ctx.acceptToken(token) | implicit in branch.commit() |
| branch.captureLogits() | implicit in prefill() / commit() |
| branch.decodeAndCaptureOne(token) | branch.commit(token) |
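The replacement table maps fairly directly onto code. Below is a minimal migration sketch: the method names (Branch.create(), prefill(), sample(), commit()) come from this PR, but the exact factory signature and surrounding setup are assumptions, not the actual lloyal typings.

```javascript
// Hypothetical sketch of the flat-path -> Branch migration described above.
// `Branch` and `ctx` stand in for the real lloyal objects; the factory
// signature is assumed for illustration.
async function generateWithBranch(Branch, ctx, promptTokens, maxNew) {
  // Old flat path (removed in this PR):
  //   ctx.decodeBatch(promptTokens);
  //   const tok = ctx.sample(); ctx.acceptToken(tok); ctx.decode(tok);
  const branch = await Branch.create(ctx); // Branch is now the only inference path
  await branch.prefill(promptTokens);      // replaces ctx.decodeBatch(tokens)
  const out = [];
  for (let i = 0; i < maxNew; i++) {
    const tok = branch.sample();           // replaces ctx.sample()
    out.push(tok);
    await branch.commit(tok);              // replaces ctx.decode() + ctx.acceptToken()
  }
  return out;
}
```

Logit capture needs no explicit call in the new flow, since it is implicit in prefill()/commit() per the table above.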

Stats

20 files changed, ~1028 insertions, ~2064 deletions (net -1036 lines)

Test plan

  • 256 C++ unit tests pass
  • 128 C++ integration tests pass (real llama.cpp)
  • Node addon compiles (LLOYAL_LOCAL=1 npm run build)
  • 157 JS integration tests pass
  • All 8 examples pass (npm run test:examples)

Copilot AI review requested due to automatic review settings February 20, 2026 03:09

Copilot AI left a comment


Pull request overview

This PR removes the flat context-level inference path and makes the Branch API the only way to generate tokens. All generation now flows through Branch.create() → prefill() → produce()/commit() instead of the previous ctx.decode() → ctx.sample() pattern.

Changes:

  • Removed context-level methods: decode(), sample(), greedySample(), handle-based grammar/perplexity APIs
  • Made Branch.produce() async (with produceSync() for local-only use)
  • Added hot-swap methods: Branch.setSamplerParams() and Branch.setGrammar()

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test/integration.js | Converted all tests from ctx.decode()/ctx.sample() to the Branch API with prefill()/produce()/commit() |
| src/SessionContext.hpp | Removed flat inference methods and perplexity/grammar handles; added hot-swap methods |
| src/SessionContext.cpp | Removed implementations of the decode/sample/grammar/perplexity APIs; updated branch methods |
| liblloyal | Updated submodule reference |
| lib/index.js | Updated documentation to show Branch API usage |
| lib/index.d.ts | Removed flat inference API types; updated Branch interface with async produce() |
| lib/Branch.js | Made produce() async; added produceSync(), setSamplerParams(), setGrammar() |
| examples/* | Updated all examples to use the Branch API instead of flat context methods |


Remove captureLogits(), decodeAndCaptureOne() from JS (zero callers).
Remove _branchCaptureLogits, _branchDecodeAndCaptureOne N-API bindings.
Rename _branchDecodeAndCaptureBatch → _branchPrefill through all layers.
Migrate streaming-tsampler from decodeAndCaptureOne to commit().

Copilot AI left a comment


Pull request overview

Copilot reviewed 18 out of 20 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

examples/streaming/streaming-tsampler.mjs:1

  • The comment is misleading: branch.commit(token) typically does accept the token into the branch’s native sampler/metrics state as part of the commit lifecycle (even if you’re doing token selection externally). If the intent is “we don't use the native sampler to choose the token”, reword to avoid implying sampler accept is skipped; otherwise readers may incorrectly assume commit() is KV-only.
#!/usr/bin/env node


Comment on lines +2075 to +2090
Napi::Value SessionContext::_branchSetSamplerParams(const Napi::CallbackInfo& info) {
  Napi::Env env = info.Env();
  ensureNotDisposed();

  if (info.Length() < 2) {
    throw Napi::Error::New(env, "_branchSetSamplerParams requires (handle, params)");
  }

  auto handle = static_cast<lloyal::branch::BranchHandle>(info[0].As<Napi::Number>().Uint32Value());

  LloyalSamplingParams params;
  if (info[1].IsObject()) {
    params = adaptSamplingParamsFromJS(info[1].As<Napi::Object>());
  }

  lloyal::branch::set_sampler_params(handle, params, _branchStore);

Copilot AI Feb 20, 2026


LloyalSamplingParams params; is not value-initialized. If info[1] is not an object (or is omitted/malformed), params will contain uninitialized fields and you’ll pass garbage into set_sampler_params(). Value-initialize params (and ensure adaptSamplingParamsFromJS also value-initializes its return struct) and either (a) require an object and throw TypeError when it's not, or (b) define explicit defaults for all fields when params are absent.

Contributor Author


All fields are std::optional<T> — they default-initialize to std::nullopt, not garbage. The else path (no JS object) passes all-nullopt which is valid ("keep current values").

}

auto handle = static_cast<lloyal::branch::BranchHandle>(info[0].As<Napi::Number>().Uint32Value());


Copilot AI Feb 20, 2026


_branchSetGrammar does not validate the type of info[1] before calling .As<Napi::String>(). A non-string argument will trigger a N-API type assertion/exception that’s less actionable than a clear TypeError. Add an explicit info[1].IsString() check and throw a Napi::TypeError with an expected signature message.

Suggested change
if (!info[1].IsString()) {
  throw Napi::TypeError::New(env, "_branchSetGrammar expects (handle, grammarStr: string)");
}

Contributor Author


This is an internal binding — Branch.setGrammar() in JS validates before calling it. A bad type here would throw a Napi::Error (not a crash), but nobody calls ctx._branchSetGrammar() directly.
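The pattern in this reply (validate in JS, keep the N-API binding internal) can be sketched generically. The wrapper below is illustrative only: `makeSetGrammar` and `nativeSetGrammar` are hypothetical names standing in for Branch.setGrammar() and ctx._branchSetGrammar, not the actual lib/Branch.js code.

```javascript
// Illustrative sketch: JS-side type validation guarding an internal binding,
// so callers get a clear TypeError instead of a native-layer assertion.
function makeSetGrammar(nativeSetGrammar, handle) {
  return function setGrammar(grammarStr) {
    if (typeof grammarStr !== 'string') {
      // Reject before the native layer is ever reached
      throw new TypeError(`setGrammar expects a string, got ${typeof grammarStr}`);
    }
    return nativeSetGrammar(handle, grammarStr);
  };
}

// Usage with a stub in place of the real N-API binding:
const calls = [];
const setGrammar = makeSetGrammar((h, g) => calls.push([h, g]), 7);
setGrammar('root ::= "yes" | "no"'); // valid string reaches the stub
let threw = false;
try { setGrammar(42); } catch (e) { threw = e instanceof TypeError; }
```

With this split, the internal binding can stay minimal because every public entry point has already enforced its argument types.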

Comment on lines +1727 to +1733
// Switch to stochastic — at high temp, should eventually diverge
greedy.setSamplerParams({ temperature: 1.5, seed: 42, topK: 0, topP: 1.0, minP: 0 });
let diverged = false;
for (let i = 0; i < 20; i++) {
  if (greedy.sample() !== greedyTok) { diverged = true; break; }
}
assert(diverged, 'setSamplerParams: stochastic diverges from greedy');

Copilot AI Feb 20, 2026


This test can be flaky: sampling from the same logits snapshot 20 times may still return the greedy token every time (e.g., if the distribution is highly peaked), even with temperature: 1.5. To make this robust, consider advancing the model state between samples (e.g., produce() + commit() for a few steps) and asserting the generated sequence differs from the greedy baseline, or choose a prompt/state known to have higher entropy and/or constrain with topK to force variability.

Suggested change
-// Switch to stochastic — at high temp, should eventually diverge
-greedy.setSamplerParams({ temperature: 1.5, seed: 42, topK: 0, topP: 1.0, minP: 0 });
-let diverged = false;
-for (let i = 0; i < 20; i++) {
-  if (greedy.sample() !== greedyTok) { diverged = true; break; }
-}
-assert(diverged, 'setSamplerParams: stochastic diverges from greedy');
+// Switch to stochastic — sampling should still succeed with valid tokens
+greedy.setSamplerParams({ temperature: 1.5, seed: 42, topK: 0, topP: 1.0, minP: 0 });
+const stochasticTok = greedy.sample();
+assert(stochasticTok >= 0, `setSamplerParams: stochastic token valid (${stochasticTok})`);

Contributor Author


The test samples from the same frozen logits snapshot 20 times at temperature: 1.5. The probability of the greedy token winning all 20 draws at that temperature is astronomically low. If this flakes, the sampler is broken, not the test.
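The probability argument here can be sanity-checked numerically. The sketch below uses made-up logits (the real distribution depends on the model and prompt, so the exact value is an assumption, not a measurement): the chance of the greedy token winning all 20 independent draws is p_top raised to the 20th power.

```javascript
// Numeric sanity check for the flakiness argument, with hypothetical logits.
function softmax(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const m = Math.max(...scaled);                 // subtract max for stability
  const exps = scaled.map((l) => Math.exp(l - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [5.0, 3.0, 2.0, 1.0];   // hypothetical, fairly peaked snapshot
const probs = softmax(logits, 1.5);    // temperature from the test under review
const pTop = Math.max(...probs);       // probability of the greedy token per draw
const pAllGreedy = Math.pow(pTop, 20); // greedy token wins every one of 20 draws
```

For these assumed logits, pAllGreedy lands well under one in a thousand; a much more sharply peaked snapshot would raise it, which is the scenario the review comment worries about, while flatter distributions drive it toward zero.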

@lloyal-research lloyal-research merged commit f1b93e5 into main Feb 20, 2026
10 of 11 checks passed