Releases: nazq/tei-manager

v0.12.0

04 Mar 03:04
65cd703

0.12.0 (2026-03-04)

Features

  • add CPU variant and replace hardcoded doc versions with placeholders (#49) (6d8eb2b)

v0.11.0

03 Mar 23:35
aad4b6f

0.11.0 (2026-03-03)

Features

  • update TEI to 1.9.2, bump dependencies, and update CI actions (#47) (9a10356)

v0.10.0

28 Nov 22:05
d114166

0.10.0 (2025-11-28)

Features

  • release: add Docker images section to release notes footer (93defa2)

Bug Fixes

  • ci: improve cargo caching with rust-cache and cargo-chef (#26) (701b1e9)
  • ci: release workflow waits for CI to pass (#24) (93defa2)
  • docker: use cargo-chef for proper dependency caching (701b1e9)
  • e2e: resolve flaky multiplexer test with warmup request (93defa2)

v0.9.0

28 Nov 21:12
1d47ee4

0.9.0 (2025-11-28)

Features

  • add release infrastructure, benchmarks, and quality improvements (#22) (ead36ff)

v0.8.0

27 Nov 21:19
1b28007

Added

  • Model Registry API - REST endpoints for managing HuggingFace models
    • GET /models - List all known models with download/verification status
    • POST /models - Register a model in the registry
    • GET /models/{id} - Get model details (cache path, size, metadata)
    • POST /models/{id}/download - Download model to HF cache
    • POST /models/{id}/load - Smoke test model loading on GPU
  • Model status tracking: available → downloading → downloaded → loading → verified/failed
  • Auto-discovery of models already in HuggingFace cache on startup
  • Optional models config array to pre-register specific models
  • Native HuggingFace Hub integration via hf-hub crate (replaces CLI dependency)
  • HuggingFace embedding benchmark script (benchmarks/hf_embedding_bench.py)
  • Patch coverage comparison tools (just cov-main, just cov-patch)
  • Comprehensive model registry integration tests
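
The model status progression in the list above (available, downloading, downloaded, loading, then verified or failed) can be read as a small state machine. The sketch below is illustrative only and is not taken from the tei-manager source: the `ModelStatus` enum, the `ALLOWED` table, and the `advance` helper are hypothetical names, and the assumption that only the two in-progress states can fall through to failed is a guess from the ordering in these notes.

```python
from enum import Enum


class ModelStatus(Enum):
    """Status names as listed in the release notes."""
    AVAILABLE = "available"
    DOWNLOADING = "downloading"
    DOWNLOADED = "downloaded"
    LOADING = "loading"
    VERIFIED = "verified"
    FAILED = "failed"


# Assumed transition table: the happy path follows the order in the notes.
# Letting DOWNLOADING and LOADING move to FAILED is an assumption.
ALLOWED = {
    ModelStatus.AVAILABLE: {ModelStatus.DOWNLOADING},
    ModelStatus.DOWNLOADING: {ModelStatus.DOWNLOADED, ModelStatus.FAILED},
    ModelStatus.DOWNLOADED: {ModelStatus.LOADING},
    ModelStatus.LOADING: {ModelStatus.VERIFIED, ModelStatus.FAILED},
    ModelStatus.VERIFIED: set(),
    ModelStatus.FAILED: set(),
}


def advance(current: ModelStatus, target: ModelStatus) -> ModelStatus:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target
```

A client polling GET /models/{id} could use a table like this to decide whether it still makes sense to call the download or load endpoint for a given model.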

Changed

  • Refactored to functional style with higher-order functions
    • Replace mutable loops with iterators and fold/filter_map patterns
    • Use futures::join_all for parallel async operations
    • Use scan/flat_map/unzip for sparse embedding building
  • E2E tests now use per-test containers for proper cleanup (fixes container leaks)
  • hf-hub now uses rustls-tls instead of native-tls for musl static builds

Happy Thanksgiving! 🦃

Docker Images

# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.8.0-tei-1.8.3

# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.8.0-tei-1.8.3-ada

# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.8.0-tei-1.8.3-hopper

v0.7.0

26 Nov 17:00

Added

  • EmbedSparseArrow gRPC endpoint for high-performance sparse embedding batch processing
    • Variable-length List<Struct<index:u32, value:f32>> output schema
    • LZ4 compression for Arrow IPC responses
    • Noop mode for round-trip testing
  • Deployment guide (docs/DEPLOYMENT.md) with Docker and Kubernetes examples
  • mTLS authentication guide (docs/MTLS.md)
  • GitHub issue templates for bug reports and feature requests
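
The variable-length List<Struct<index:u32, value:f32>> schema mentioned above stores one list of (index, value) pairs per input text. In the Arrow columnar layout a list column is kept as a flat child array plus an offsets array, and each struct field is its own flat column. The plain-Python sketch below mimics that layout to show why rows of different lengths need no padding. It uses no Arrow library, and `flatten_rows` and `row_at` are hypothetical helper names, not part of tei-manager.

```python
from typing import List, Tuple

# (index, value) pairs for one input text
SparseRow = List[Tuple[int, float]]


def flatten_rows(rows: List[SparseRow]) -> Tuple[List[int], List[int], List[float]]:
    """Flatten variable-length rows into offsets plus two flat child columns."""
    offsets = [0]
    indices: List[int] = []
    values: List[float] = []
    for row in rows:
        for index, value in row:
            indices.append(index)
            values.append(value)
        # Each offset marks where the next row starts in the flat columns.
        offsets.append(len(indices))
    return offsets, indices, values


def row_at(offsets: List[int], indices: List[int],
           values: List[float], i: int) -> SparseRow:
    """Recover row i by slicing the flat columns between two offsets."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(indices[start:end], values[start:end]))
```

An empty row simply repeats the previous offset, so a batch with very uneven sparsity still occupies only as much space as its non-zero entries.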

Changed

  • Updated benchmark README with v0.6.0 results and improved commands

Docker Images

# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.7.0-tei-1.8.3

# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.7.0-tei-1.8.3-ada

# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.7.0-tei-1.8.3-hopper

v0.6.0

26 Nov 02:36
94c5a5e

Changed

  • Reduced allocations in hot paths for ~5% improvement on Arrow batch operations
  • Arrow embed_arrow: build requests directly from Arrow array, pre-allocate flat embedding buffer
  • Log handler: only allocate strings for requested slice
  • Metrics: use static strings for metric names and label keys

Docker Images

# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.6.0-tei-1.8.3

# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.6.0-tei-1.8.3-ada

# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.6.0-tei-1.8.3-hopper

v0.5.0

26 Nov 01:42
982a46e

Changed

  • Updated Apache Arrow from 56 to 57 (IPC wire format remains compatible with v56 clients)
  • Updated criterion from 0.5 to 0.7
  • Benchmark code now uses std::hint::black_box instead of deprecated criterion::black_box

Docker Images

# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.5.0-tei-1.8.3

# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.5.0-tei-1.8.3-ada

# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.5.0-tei-1.8.3-hopper

v0.4.0

26 Nov 00:57
de4d269

Added

  • gRPC request timeouts (configurable via grpc_request_timeout_secs, default 30s)
  • Graceful shutdown for gRPC server
  • Connection pool pruning for idle/orphaned connections
  • Security scanning in CI (cargo-audit, cargo-deny)
  • Dependabot for automated dependency updates

Changed

  • Breaking: Error responses now include code field: {"error": "...", "code": "INSTANCE_NOT_FOUND", "timestamp": "..."}
  • Unified error handling with structured error types and codes
  • Some HTTP status codes refined (e.g., port allocation failures return 422)
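
Clients affected by the breaking change to error responses may want to parse the new shape explicitly. The following is a minimal sketch, assuming only the three fields shown in the note above (error, code, timestamp). The `ApiError` dataclass and `parse_error` function are hypothetical names, and treating code and timestamp as optional is an assumption made so the same parser tolerates responses that lack them.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    message: str
    code: Optional[str]        # e.g. "INSTANCE_NOT_FOUND"
    timestamp: Optional[str]   # kept as a string; the format is not specified here


def parse_error(body: str) -> ApiError:
    """Parse an error response body into an ApiError."""
    payload = json.loads(body)
    return ApiError(
        message=payload["error"],
        # Assumption: fall back to None when a field is absent.
        code=payload.get("code"),
        timestamp=payload.get("timestamp"),
    )
```

Branching on `code` rather than on the human-readable message keeps client logic stable if the wording of an error changes in a later release.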

Docker Images

# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.4.0-tei-1.8.3

# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.4.0-tei-1.8.3-ada

# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.4.0-tei-1.8.3-hopper

v0.3.0

25 Nov 23:24
a0759f7

Added

  • gRPC Multiplexer: Unified gRPC endpoint for routing embedding requests to multiple TEI instances
    • Full TEI gRPC API support (Embed, EmbedSparse, EmbedAll, Rerank, Tokenize, Decode)
    • Streaming RPC support for batch processing
    • Arrow IPC batch embedding via EmbedArrow endpoint with LZ4 compression
    • Connection pooling with lazy connection creation
    • Instance-based routing via target.instance_name
  • Benchmark Client (bench-client): Unified CLI tool for load testing
    • Standard mode: concurrent single-text requests
    • Arrow mode: batched Arrow IPC requests for high throughput
    • Configurable concurrency, batch size, and request counts
  • mTLS Authentication: Pluggable authentication framework
    • AuthProvider trait for custom authentication providers
    • MtlsProvider for mutual TLS certificate validation
    • Subject and SAN verification options
  • Instance Readiness Checks: gRPC-based health monitoring
    • Automatic status transition from Starting → Running
    • Configurable health check intervals and failure thresholds
    • Auto-restart on consecutive failures
  • Criterion Benchmarks: Performance testing suite
    • embedding_overhead: Direct vs multiplexer latency comparison
    • concurrent_requests: Parallel load scaling tests
    • streaming_requests: Batch streaming performance
    • arrow_batch: Arrow IPC vs streaming comparison
  • Development Tooling:
    • just bench-start/stop/status: Local benchmark environment management
    • just bench-open: Run benchmarks and open HTML report
  • GPU architecture-specific Docker variants (Ada Lovelace, Hopper)
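
The instance readiness checks listed above (Starting → Running transition, failure thresholds, auto-restart on consecutive failures) can be sketched as a small counter. This is a guess at the general pattern rather than the project's implementation: `ReadinessTracker`, its `failure_threshold` field and default, and the choice to return to Starting after a restart are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadinessTracker:
    failure_threshold: int = 3     # hypothetical default
    status: str = "Starting"
    consecutive_failures: int = 0

    def record(self, healthy: bool) -> bool:
        """Record one probe result; return True when a restart is due."""
        if healthy:
            # Any success clears the failure streak and marks the instance ready.
            self.consecutive_failures = 0
            if self.status == "Starting":
                self.status = "Running"
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            # Assumption: a restarted instance goes back to Starting.
            self.consecutive_failures = 0
            self.status = "Starting"
            return True
        return False
```

Counting only consecutive failures means a single slow probe does not trigger a restart, while a sustained outage does.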

Changed

  • Docker images now include bench-client binary
  • Health checks use gRPC Info RPC instead of HTTP
  • Improved error messages for instance lifecycle operations
  • Updated Docker build process with multi-variant support

Fixed

  • Docker build: Install protobuf-compiler in builder stage
  • Docker build: Copy benches directory for Cargo.toml parsing
  • Test isolation: Added #[serial] to environment variable tests
  • Connection pool management in high-concurrency scenarios

Docker Images

# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.3.0-tei-1.8.3

# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.3.0-tei-1.8.3-ada

# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.3.0-tei-1.8.3-hopper