feat(api): remove flat ctx inference path, make branch API the only path #20
lloyal-research merged 3 commits into main from
Conversation
Pull request overview
This PR removes the flat context-level inference path and makes the Branch API the only way to generate tokens. All generation now flows through Branch.create() → prefill() → produce()/commit() instead of the previous ctx.decode() → ctx.sample() pattern.
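That lifecycle can be sketched with a small in-memory mock. Everything below is illustrative: the class body, the fixed token value, and the exact split of work between `produce()` and `commit()` are assumptions, not the PR's native-backed implementation.

```javascript
// Illustrative in-memory stand-in for the Branch lifecycle described above.
// The real Branch wraps native state; the class body, the fixed token value,
// and the produce()/commit() division of work are assumptions here.
class MockBranch {
  static create() {
    return new MockBranch();
  }
  constructor() {
    this.kv = []; // stands in for the branch's KV/token state
  }
  prefill(tokens) {
    // batch-decode the prompt tokens into the branch state
    this.kv.push(...tokens);
  }
  async produce() {
    // let the branch pick and accept the next token (async in the real API)
    const tok = 42; // a real branch would sample from its captured logits
    this.kv.push(tok);
    return tok;
  }
  commit(token) {
    // accept an externally chosen token into the branch state
    this.kv.push(token);
  }
}

async function demo() {
  const branch = MockBranch.create();
  branch.prefill([1, 2, 3]);          // prompt tokens
  const tok = await branch.produce(); // one generated token
  branch.commit(7);                   // or: accept a token chosen outside
  console.log(tok, branch.kv.length); // → 42 5
}
demo();
```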
Changes:
- Removed context-level methods: `decode()`, `sample()`, `greedySample()`, and the handle-based grammar/perplexity APIs
- Made `Branch.produce()` async (with `produceSync()` for local-only use)
- Added hot-swap methods: `Branch.setSamplerParams()` and `Branch.setGrammar()`
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| test/integration.js | Converted all tests from ctx.decode()/ctx.sample() to Branch API with prefill()/produce()/commit() |
| src/SessionContext.hpp | Removed flat inference methods, perplexity/grammar handles, added hot-swap methods |
| src/SessionContext.cpp | Removed implementation of decode/sample/grammar/perplexity APIs, updated branch methods |
| liblloyal | Updated submodule reference |
| lib/index.js | Updated documentation to show Branch API usage |
| lib/index.d.ts | Removed flat inference API types, updated Branch interface with async produce() |
| lib/Branch.js | Made produce() async, added produceSync(), setSamplerParams(), setGrammar() |
| examples/* | Updated all examples to use Branch API instead of flat context methods |
Remove captureLogits(), decodeAndCaptureOne() from JS (zero callers). Remove _branchCaptureLogits, _branchDecodeAndCaptureOne N-API bindings. Rename _branchDecodeAndCaptureBatch → _branchPrefill through all layers. Migrate streaming-tsampler from decodeAndCaptureOne to commit().
Pull request overview
Copilot reviewed 18 out of 20 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
examples/streaming/streaming-tsampler.mjs:1
- The comment is misleading: `branch.commit(token)` typically does accept the token into the branch's native sampler/metrics state as part of the commit lifecycle (even if you're doing token selection externally). If the intent is "we don't use the native sampler to choose the token", reword to avoid implying sampler accept is skipped; otherwise readers may incorrectly assume `commit()` is KV-only.
#!/usr/bin/env node
```cpp
Napi::Value SessionContext::_branchSetSamplerParams(const Napi::CallbackInfo& info) {
  Napi::Env env = info.Env();
  ensureNotDisposed();

  if (info.Length() < 2) {
    throw Napi::Error::New(env, "_branchSetSamplerParams requires (handle, params)");
  }

  auto handle = static_cast<lloyal::branch::BranchHandle>(info[0].As<Napi::Number>().Uint32Value());

  LloyalSamplingParams params;
  if (info[1].IsObject()) {
    params = adaptSamplingParamsFromJS(info[1].As<Napi::Object>());
  }

  lloyal::branch::set_sampler_params(handle, params, _branchStore);
```
LloyalSamplingParams params; is not value-initialized. If info[1] is not an object (or is omitted/malformed), params will contain uninitialized fields and you’ll pass garbage into set_sampler_params(). Value-initialize params (and ensure adaptSamplingParamsFromJS also value-initializes its return struct) and either (a) require an object and throw TypeError when it's not, or (b) define explicit defaults for all fields when params are absent.
All fields are std::optional<T> — they default-initialize to std::nullopt, not garbage. The else path (no JS object) passes all-nullopt which is valid ("keep current values").
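The "keep current values" semantics this reply describes can be illustrated with a hypothetical JS-side merge helper (not code from this PR): fields left `undefined` in the params object behave like `std::nullopt` and preserve the current setting.

```javascript
// Hypothetical illustration of "absent field = keep current value" semantics,
// mirroring how an all-nullopt LloyalSamplingParams leaves the sampler untouched.
function mergeSamplerParams(current, update = {}) {
  const next = { ...current };
  for (const [key, value] of Object.entries(update)) {
    if (value !== undefined) next[key] = value; // only defined fields overwrite
  }
  return next;
}

const current = { temperature: 0.0, topK: 40, topP: 0.95 };
// Partial update: only temperature changes; topK/topP are preserved
const updated = mergeSamplerParams(current, { temperature: 1.5 });
console.log(updated); // → { temperature: 1.5, topK: 40, topP: 0.95 }
```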
```cpp
}

auto handle = static_cast<lloyal::branch::BranchHandle>(info[0].As<Napi::Number>().Uint32Value());
```
_branchSetGrammar does not validate the type of info[1] before calling .As<Napi::String>(). A non-string argument will trigger a N-API type assertion/exception that’s less actionable than a clear TypeError. Add an explicit info[1].IsString() check and throw a Napi::TypeError with an expected signature message.
```cpp
if (!info[1].IsString()) {
  throw Napi::TypeError::New(env, "_branchSetGrammar expects (handle, grammarStr: string)");
}
```
This is an internal binding — Branch.setGrammar() in JS validates before calling it. A bad type here would throw a Napi::Error (not a crash), but nobody calls ctx._branchSetGrammar() directly.
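For illustration, a JS-side guard of the kind described might look like this. The wrapper class and the stub context below are hypothetical, not the PR's actual `lib/Branch.js` code:

```javascript
// Hypothetical sketch of a JS-side wrapper that validates before hitting
// the native binding, so the internal _branchSetGrammar never sees a non-string.
class BranchWrapper {
  constructor(ctx, handle) {
    this._ctx = ctx;
    this._handle = handle;
  }
  setGrammar(grammarStr) {
    if (typeof grammarStr !== 'string') {
      throw new TypeError(`setGrammar expects a GBNF grammar string, got ${typeof grammarStr}`);
    }
    this._ctx._branchSetGrammar(this._handle, grammarStr);
  }
}

// Usage with a stub "native" context that just records calls:
const calls = [];
const ctx = { _branchSetGrammar: (h, g) => calls.push([h, g]) };
const branch = new BranchWrapper(ctx, 1);
branch.setGrammar('root ::= "a"');
console.log(calls.length); // → 1
```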
```js
// Switch to stochastic — at high temp, should eventually diverge
greedy.setSamplerParams({ temperature: 1.5, seed: 42, topK: 0, topP: 1.0, minP: 0 });
let diverged = false;
for (let i = 0; i < 20; i++) {
  if (greedy.sample() !== greedyTok) { diverged = true; break; }
}
assert(diverged, 'setSamplerParams: stochastic diverges from greedy');
```
This test can be flaky: sampling from the same logits snapshot 20 times may still return the greedy token every time (e.g., if the distribution is highly peaked), even with temperature: 1.5. To make this robust, consider advancing the model state between samples (e.g., produce() + commit() for a few steps) and asserting the generated sequence differs from the greedy baseline, or choose a prompt/state known to have higher entropy and/or constrain with topK to force variability.
```diff
- // Switch to stochastic — at high temp, should eventually diverge
- greedy.setSamplerParams({ temperature: 1.5, seed: 42, topK: 0, topP: 1.0, minP: 0 });
- let diverged = false;
- for (let i = 0; i < 20; i++) {
-   if (greedy.sample() !== greedyTok) { diverged = true; break; }
- }
- assert(diverged, 'setSamplerParams: stochastic diverges from greedy');
+ // Switch to stochastic — sampling should still succeed with valid tokens
+ greedy.setSamplerParams({ temperature: 1.5, seed: 42, topK: 0, topP: 1.0, minP: 0 });
+ const stochasticTok = greedy.sample();
+ assert(stochasticTok >= 0, `setSamplerParams: stochastic token valid (${stochasticTok})`);
```
The test samples from the same frozen logits snapshot 20 times at temperature: 1.5. The probability of the greedy token winning all 20 draws at that temperature is astronomically low. If this flakes, the sampler is broken, not the test.
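A back-of-envelope check illustrates both sides of this exchange. The logits below are hypothetical, chosen as a moderately peaked distribution; real values are model- and prompt-dependent. For this shape the all-greedy probability over 20 draws is tiny, though it would grow for an extremely peaked distribution.

```javascript
// Probability that 20 i.i.d. draws at temperature T all return the argmax token.
// The logits are made up for illustration; a sharper distribution raises the odds.
function softmax(logits, temperature) {
  const scaled = logits.map((l) => l / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((s) => Math.exp(s - max)); // subtract max for stability
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

const logits = [10, 9.5, 9, 8, 7]; // hypothetical, moderately peaked
const probs = softmax(logits, 1.5);
const pGreedy = Math.max(...probs);
const pAllTwenty = Math.pow(pGreedy, 20);
// For these logits, pGreedy ≈ 0.38 and pAllTwenty ≈ 4e-9
console.log(pGreedy.toFixed(3), pAllTwenty.toExponential(1));
```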
Summary
- Removed the flat `ctx.*` inference path (`tokenize`, `decode`, `sample`, `getLogits`, etc.) — `Branch` is now the only way to run inference
- Removed `captureLogits()`, `decodeAndCaptureOne()`, `_branchCaptureLogits`, `_branchDecodeAndCaptureOne`
- Renamed `_branchDecodeAndCaptureBatch` → `_branchPrefill` to match intent across all layers
- Added `setSamplerParams()` / `setGrammar()` hot-swap methods on Branch
- Added `modelSurprisal()` / `samplerSurprisal()` for per-token entropy access
- Updated the liblloyal submodule reference (`decode_and_capture_batch` → `prefill`, `decode_and_capture_one` → `step`)

Breaking changes

| Before | After |
|---|---|
| `ctx.decode(token)` | `branch.commit(token)` |
| `ctx.decodeBatch(tokens)` | `branch.prefill(tokens)` |
| `ctx.sample()` / `ctx.sampleWithParams()` | `branch.sample()` |
| `ctx.getLogits()` | `branch.getLogits()` |
| `ctx.acceptToken(token)` | `branch.commit()` |
| `branch.captureLogits()` | `prefill()` / `commit()` |
| `branch.decodeAndCaptureOne(token)` | `branch.commit(token)` |

Stats
20 files changed, ~1028 insertions, ~2064 deletions (net -1036 lines)
Test plan
- `LLOYAL_LOCAL=1 npm run build`
- `npm run test:examples`