Releases: janbalangue/async-bulkhead-llm
v3.0.0
[3.0.0] —
Breaking Changes
- `LLMStats` shape changed. `bulkhead.stats()` no longer returns base stats fields at the top level. Base bulkhead stats now live under `stats().bulkhead`, and LLM-layer counters now live under `stats().llm`.
- Code that previously accessed:
  `stats().inFlight`, `stats().pending`, `stats().maxConcurrent`, `stats().maxQueue`, `stats().closed`
  must now read:
  `stats().bulkhead.inFlight`, `stats().bulkhead.pending`, `stats().bulkhead.maxConcurrent`, `stats().bulkhead.maxQueue`, `stats().bulkhead.closed`
Added
- New `stats().llm` block with LLM-layer request counters: `admitted`, `released`, `rejected`, `rejectedByReason`
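For reference, the nested shape implied by these notes can be sketched as a TypeScript type. This is an inference from the changelog, not the library's published declaration:

```ts
// Sketch of the v3 stats shape implied by these release notes.
// Field names beyond those listed above are assumptions.
interface BulkheadStats {
  inFlight: number;
  pending: number;
  maxConcurrent: number;
  maxQueue: number;
  closed: boolean;
}

interface LLMStats {
  bulkhead: BulkheadStats;
  llm: {
    admitted: number;
    released: number;
    rejected: number;
    rejectedByReason: Record<string, number>;
  };
}

// Example value conforming to the sketch:
const example: LLMStats = {
  bulkhead: { inFlight: 2, pending: 0, maxConcurrent: 4, maxQueue: 0, closed: false },
  llm: { admitted: 10, released: 8, rejected: 1, rejectedByReason: { budget_limit: 1 } },
};
```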
Changed
- The `run()` callback signal type now derives from `AcquireOptions["signal"]` instead of referring to the global `AbortSignal` type directly.
- Test utilities now avoid direct dependency on ambient `AbortController` globals.
- Bumped `async-bulkhead-ts` to `^0.4.1`.
Migration Guide
From v2 → v3, update stats access only.
Before:

```ts
const s = bulkhead.stats();
s.inFlight;
s.pending;
```

After:

```ts
const s = bulkhead.stats();
s.bulkhead.inFlight;
s.bulkhead.pending;
```

LLM-layer counters are now separate:

```ts
const s = bulkhead.stats();
s.llm.admitted;
s.llm.rejected;
s.llm.rejectedByReason.budget_limit;
```

Notes
- No change to admission semantics, token budget semantics, deduplication behavior, or graceful shutdown behavior.
- This release separates underlying bulkhead telemetry from LLM-layer request telemetry.
v2.0.0
async-bulkhead-llm v2.0.0
Fail-fast admission control for LLM workloads — now with token refunds, multimodal support, and model-aware routing.
This release significantly improves budget utilization and flexibility while preserving the simple v1 API.
🚀 Highlights
💸 Token Refund (Major Improvement)
v1 reserved input + max_tokens and held the full reservation until release.
v2 introduces post-completion refunds:
- Report actual usage via `getUsage()` (or `token.release(usage)`)
- Unused output tokens are immediately returned to the budget
- Improves throughput under tight token ceilings
- No breaking changes — behavior matches v1 if usage isn't provided
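The refund accounting described above can be illustrated with a standalone budget model. This is a simplified sketch: `TokenBudget`, `reserve`, and `settle` are invented names for illustration, not the library's API or internals:

```ts
// Simplified token-budget model illustrating post-completion refunds:
// reserve input + max output up front, then refund the unused output
// portion as soon as actual usage is known.
class TokenBudget {
  private used = 0;
  constructor(private readonly capacity: number) {}

  /** Reserve tokens up front; returns false for a fail-fast rejection. */
  reserve(tokens: number): boolean {
    if (this.used + tokens > this.capacity) return false;
    this.used += tokens;
    return true;
  }

  /** Settle a reservation against actual usage, refunding the unused part. */
  settle(reserved: number, actual: number): void {
    this.used -= reserved - Math.min(actual, reserved);
  }

  /** Release the remaining (actual) hold when the call completes. */
  release(tokens: number): void {
    this.used -= tokens;
  }

  get available(): number {
    return this.capacity - this.used;
  }
}

const budget = new TokenBudget(1000);
budget.reserve(600);     // e.g. input 100 + max_tokens 500
budget.settle(600, 220); // actual usage 220 → 380 tokens refunded immediately
```

Under v1 semantics the full 600 would stay held until release; with refunds, 380 tokens return to the budget the moment usage is reported.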
```ts
await bulkhead.run(
  request,
  async () => callLLM(request),
  {
    getUsage: (res) => ({
      input: res.usage.input_tokens,
      output: res.usage.output_tokens,
    }),
  },
);
```

🖼 Multimodal Content Support
content may now be:
- `string`
- `ContentBlock[]`
Built-in estimators:
- Count text blocks
- Ignore non-text blocks
- Provide lower-bound estimates for multimodal inputs
Custom estimators remain fully supported.
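A lower-bound estimator along the lines described (count text blocks, ignore the rest) might look like this. The `ContentBlock` shape and the 4.0 chars-per-token ratio here are assumptions for illustration, not the library's built-ins:

```ts
// Illustrative lower-bound token estimator for multimodal content.
// Non-text blocks contribute nothing, so the result is a lower bound.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: string; [k: string]: unknown };

function estimateTokens(content: string | ContentBlock[], charsPerToken = 4.0): number {
  const text =
    typeof content === "string"
      ? content
      : content
          .filter((b): b is { type: "text"; text: string } => b.type === "text")
          .map((b) => b.text)
          .join("");
  return Math.ceil(text.length / charsPerToken);
}
```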
🧠 Per-Request Model Awareness
You can now route different models through a single bulkhead:
```ts
const request = { model: 'claude-haiku-4-5', messages, max_tokens: 512 };
await bulkhead.run(request, async () => callLLM(request));
```

Estimator behavior:
- Uses `request.model` when present
- Falls back to the bulkhead default model
🔁 In-Flight Deduplication Improvements
- Default dedup key now includes: `messages`, `max_tokens`, `model`
- Prevents cross-model conflation
- Custom `keyFn` supported
- Return `""` to opt a request out
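A key function with the semantics above (messages plus `max_tokens` plus `model`, so identical prompts to different models are kept apart) could be sketched as follows. The request shape is an assumption for illustration:

```ts
// Illustrative default dedup key including model and max_tokens so that
// identical prompts aimed at different models are not conflated.
interface LLMRequest {
  model?: string;
  messages: unknown[];
  max_tokens: number;
}

function defaultDedupKey(req: LLMRequest): string {
  return JSON.stringify({
    model: req.model ?? "",
    max_tokens: req.max_tokens,
    messages: req.messages,
  });
}

const a = { model: "claude-haiku-4-5", messages: [{ role: "user", content: "hi" }], max_tokens: 512 };
const b = { model: "claude-sonnet-4-5", messages: [{ role: "user", content: "hi" }], max_tokens: 512 };
// Same prompt, different model → different keys.
```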
📊 New Stats Fields
`stats()` now includes:
- `tokenBudget.totalRefunded`
- `deduplication.active`
- `deduplication.hits`
Improved visibility into load shedding and savings.
🧩 Profiles
Two built-in presets:
- `interactive` (default): fail-fast, no waiting
- `batch`: bounded queue + timeout
Custom profile objects supported.
⚠️ Breaking Changes
None for typical usage.
If you relied on:
- Exact token reservation behavior (no refunds)
- Previous dedup key semantics
Review the updated behavior; most users require no changes.
🛠 Migration from v1
Most callers need zero changes.
To benefit from refunds:
- Provide `getUsage()` in `run()`, or
- Pass usage to `token.release()`
See the README for full details.
🧱 What This Library Is (and Isn’t)
This library enforces:
- Concurrency ceilings
- Token budgets
- Fail-fast load shedding
- Backpressure at LLM boundaries
It does not:
- Retry
- Wrap provider SDKs
- Perform distributed rate limiting
- Perform cost accounting
📦 Compatibility
- Node.js 20+
- ESM + CJS builds
- Zero runtime dependencies beyond `async-bulkhead-ts`
❤️ Why v2 Matters
Token refunds dramatically improve effective capacity under real-world workloads where:
- `max_tokens` is over-provisioned
- Outputs are shorter than caps
- Budgets are tight
- Multiple models share a boundary
v2 allows you to keep strict ceilings without sacrificing utilization.
v1.0.3
async-bulkhead-llm v1.0.3
Overview
- Metadata-only maintenance release.
- This version aligns the `package.json` license field with the repository's Apache 2.0 license.
Changed
- Updated license field in `package.json` to Apache-2.0
- No runtime changes
- No API changes
- No type changes
Compatibility
Fully compatible with:
- 1.0.0
- 1.0.1
- 1.0.2
Safe upgrade. No migration required.
v1.0.2
async-bulkhead-llm v1.0.2
Overview
Metadata-only maintenance release.
This version corrects the GitHub URLs in package.json to ensure the homepage, repository, and bugs fields point to the canonical repository.
Changed
- Fixed GitHub URLs in package metadata
- No runtime changes
- No API changes
- No type changes
Compatibility
Fully compatible with:
- 1.0.0
- 1.0.1
Safe upgrade. No migration required.
v1.0.1
async-bulkhead-llm v1.0.1
Overview
Maintenance release.
This version updates the underlying concurrency primitive dependency and includes packaging/CI hardening to ensure published artifacts always contain the correct ESM, CJS, and type outputs.
No API changes. No behavior changes. No migration required.
Changed
- Bumped `async-bulkhead-ts` to `^0.3.0`
- Hardened packaging workflow:
  - Ensures `dist/` is built before pack/publish
  - Added deterministic tarball verification in CI
Stability
- No changes to:
- Admission semantics
- Token budget logic
- Deduplication behavior
- Rejection reasons
- Public types
- Runtime stats surface
Fully compatible with 1.0.0.
Upgrade
```shell
npm install async-bulkhead-llm@1.0.1
```

No code changes required.
v1.0.0
async-bulkhead-llm v1.0.0
Initial stable release.
async-bulkhead-llm provides fail-fast admission control for LLM workloads, built on async-bulkhead-ts. It is designed for services that need to enforce cost ceilings, concurrency limits, and backpressure at the boundary of their LLM calls.
🚀 Highlights
Hard Concurrency Limits
- Strict `maxConcurrent` enforcement
- Optional bounded queue via `maxQueue`
- Fail-fast by default (`maxQueue: 0`)
Token-Aware Admission
- Enforce a ceiling on total in-flight tokens
- Reservations are calculated from `input + maxOutput`
- Admission fails fast when the token budget is exceeded
- Independent of concurrency headroom or queue configuration
Model-Aware Estimation
- Built-in per-model character-to-token ratios
- Longest-prefix matching for known model families
- Exact override support
- Fallback to flat 4.0 ratio for unknown models
- Optional `onUnknownModel` hook
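Longest-prefix matching with a flat fallback can be sketched as follows. The ratio table entries here are invented examples, not the library's built-in values:

```ts
// Illustrative longest-prefix chars-per-token lookup with a flat 4.0
// fallback for unknown models. Table contents are made-up examples.
const ratios: Record<string, number> = {
  "claude": 3.8,
  "claude-haiku": 3.6,
  "gpt-4": 4.2,
};

function ratioFor(model: string, fallback = 4.0): number {
  let best: string | undefined;
  for (const prefix of Object.keys(ratios)) {
    // Among all matching prefixes, keep the longest (most specific) one.
    if (model.startsWith(prefix) && (best === undefined || prefix.length > best.length)) {
      best = prefix;
    }
  }
  return best !== undefined ? ratios[best] : fallback;
}
```

An exact model name can simply be added to the table as its own entry, which longest-prefix matching then prefers over any family prefix.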
In-Flight Deduplication
- Identical message payloads share a single LLM call
- Reduces duplicate work under burst conditions
- Dedup stats available via `bulkhead.stats()`
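In-flight deduplication is essentially promise coalescing: concurrent calls with the same key share one underlying promise, and the entry is dropped once the call settles. A minimal sketch, not the library's implementation:

```ts
// Minimal in-flight deduplication via promise coalescing. While a call
// for a given key is pending, later callers receive the same promise;
// the map entry is removed when the promise settles.
const inFlight = new Map<string, Promise<unknown>>();

function dedupe<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const p = fn().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}
```

Because the entry is removed on settlement, only truly concurrent requests are coalesced; this is burst suppression, not a response cache.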
Clean API Surface
- `bulkhead.run(request, fn)` — primary API (auto acquire + release)
- `bulkhead.acquire(request)` — manual lifecycle control
- `LLMBulkheadRejectedError` with structured reason
- Runtime `stats()` with optional `tokenBudget` and `deduplication` blocks
Profiles
- 'interactive' — fail-fast, no queue
- 'batch' — bounded queue, 30s timeout
- Escape hatch via plain preset object
📦 Runtime Characteristics
- Node.js 20+
- ESM + CommonJS builds
- Full TypeScript typings
- Zero dependencies beyond `async-bulkhead-ts`
- No retries
- No provider SDK coupling
- No distributed coordination
⚠️ Design Constraints (Intentional)
- Token estimation is approximate — suitable for load-shedding, not billing
- Deduplication key in v1 is `JSON.stringify(messages)`
- Multimodal (non-string content) is not supported by built-in estimators
- Refund mechanism (adjusting reservations based on actual usage) is planned for v2
🎯 Intended Use
This library is for enforcing backpressure at the service boundary of LLM calls:
- Prevent burst cost explosions
- Enforce cost ceilings
- Avoid cascading saturation
- Shed excess load early
It does not replace:
- Retry libraries
- Cost accounting systems
- Distributed rate limiting
- Provider SDKs
🔒 Security
See SECURITY.md for the vulnerability disclosure process and defined threat surface.
🧪 Stability
This is the first stable release. The API surface is intentionally small and opinionated. Breaking changes will follow semantic versioning.