Releases: nazq/tei-manager
v0.12.0
v0.11.0
v0.10.0
v0.9.0
v0.8.0
Added
- Model Registry API - REST endpoints for managing HuggingFace models
  - `GET /models` - List all known models with download/verification status
  - `POST /models` - Register a model in the registry
  - `GET /models/{id}` - Get model details (cache path, size, metadata)
  - `POST /models/{id}/download` - Download model to HF cache
  - `POST /models/{id}/load` - Smoke test model loading on GPU
- Model status tracking: `available` → `downloading` → `downloaded` → `loading` → `verified`/`failed`
- Auto-discovery of models already in HuggingFace cache on startup
- Optional `models` config array to pre-register specific models
- Native HuggingFace Hub integration via `hf-hub` crate (replaces CLI dependency)
- HuggingFace embedding benchmark script (`benchmarks/hf_embedding_bench.py`)
- Patch coverage comparison tools (`just cov-main`, `just cov-patch`)
- Comprehensive model registry integration tests
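The status lifecycle listed above can be read as a small state machine. The following is a minimal sketch, not the project's actual code: the `ModelStatus` enum and `can_transition_to` method are hypothetical names, and the sketch assumes `failed` is reachable only from `loading`, which is the strictest reading of the arrow chain (a real implementation may also allow a failed download).

```rust
/// Hypothetical status enum mirroring the lifecycle named in the release notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelStatus {
    Available,
    Downloading,
    Downloaded,
    Loading,
    Verified,
    Failed,
}

impl ModelStatus {
    /// Returns true if moving from `self` to `next` follows the documented order.
    /// Assumption: `Failed` is only entered from `Loading`.
    pub fn can_transition_to(self, next: ModelStatus) -> bool {
        use ModelStatus::*;
        matches!(
            (self, next),
            (Available, Downloading)
                | (Downloading, Downloaded)
                | (Downloaded, Loading)
                | (Loading, Verified)
                | (Loading, Failed)
        )
    }
}
```

Under these assumptions, `ModelStatus::Available.can_transition_to(ModelStatus::Downloading)` is true, while skipping straight from `Available` to `Verified` is rejected.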
Changed
- Refactored to functional style with higher-order functions
- Replace mutable loops with iterators and fold/filter_map patterns
- Use `futures::join_all` for parallel async operations
- Use scan/flat_map/unzip for sparse embedding building
- E2E tests now use per-test containers for proper cleanup (fixes container leaks)
- `hf-hub` now uses `rustls-tls` instead of `native-tls` for musl static builds
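To illustrate the iterator style described above, here is a sketch of building sparse embeddings with `filter_map`, `scan`, `flat_map`, and `unzip` from the standard library only. The function names and the dense-to-sparse conversion are illustrative assumptions; the release notes do not show the project's actual code.

```rust
/// Collects the non-zero entries of one dense vector into parallel
/// index and value vectors.
pub fn to_sparse(dense: &[f32]) -> (Vec<u32>, Vec<f32>) {
    dense
        .iter()
        .enumerate()
        .filter_map(|(i, &v)| (v != 0.0).then_some((i as u32, v)))
        .unzip()
}

/// Running end offset of each row's non-zero entries, built with `scan`.
pub fn row_offsets(rows: &[Vec<f32>]) -> Vec<usize> {
    rows.iter()
        .scan(0usize, |end, row| {
            *end += row.iter().filter(|&&v| v != 0.0).count();
            Some(*end)
        })
        .collect()
}

/// Flattens a whole batch into one index vector and one value vector.
pub fn flatten_batch(rows: &[Vec<f32>]) -> (Vec<u32>, Vec<f32>) {
    rows.iter()
        .flat_map(|row| {
            row.iter()
                .enumerate()
                .filter_map(|(i, &v)| (v != 0.0).then_some((i as u32, v)))
        })
        .unzip()
}
```

Each function replaces a mutable accumulator loop with a single iterator chain, so no intermediate `Vec` is pushed to by hand.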
Happy Thanksgiving! 🦃
Docker Images
# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.8.0-tei-1.8.3
# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.8.0-tei-1.8.3-ada
# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.8.0-tei-1.8.3-hopper
v0.7.0
Added
- `EmbedSparseArrow` gRPC endpoint for high-performance sparse embedding batch processing
  - Variable-length `List<Struct<index:u32, value:f32>>` output schema
  - LZ4 compression for Arrow IPC responses
  - Noop mode for round-trip testing
- Deployment guide (`docs/DEPLOYMENT.md`) with Docker and Kubernetes examples
- mTLS authentication guide (`docs/MTLS.md`)
- GitHub issue templates for bug reports and feature requests
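The variable-length `List<Struct<index:u32, value:f32>>` schema can be pictured without the Arrow crate. Arrow stores a variable-size list column as an offsets buffer plus flat child arrays, and the plain-Rust sketch below mimics that layout. The `SparseBatch` type and its methods are hypothetical and are not part of tei-manager or Arrow.

```rust
/// Hypothetical flat layout for a batch of sparse embeddings, modelled on
/// an Arrow list column: one offsets buffer plus flat child arrays.
#[derive(Debug, Default, PartialEq)]
pub struct SparseBatch {
    /// Entries of row `i` live in `offsets[i]..offsets[i + 1]`; starts at 0.
    pub offsets: Vec<i32>,
    /// Flat `index` child array (u32, as in the schema).
    pub indices: Vec<u32>,
    /// Flat `value` child array (f32, as in the schema).
    pub values: Vec<f32>,
}

impl SparseBatch {
    /// Packs rows of (index, value) pairs into the flat layout.
    pub fn from_rows(rows: &[Vec<(u32, f32)>]) -> Self {
        let mut batch = SparseBatch { offsets: vec![0], ..Default::default() };
        for row in rows {
            for &(index, value) in row {
                batch.indices.push(index);
                batch.values.push(value);
            }
            batch.offsets.push(batch.indices.len() as i32);
        }
        batch
    }

    /// Reads row `i` back out; panics if `i` is out of range.
    pub fn row(&self, i: usize) -> Vec<(u32, f32)> {
        let (start, end) = (self.offsets[i] as usize, self.offsets[i + 1] as usize);
        self.indices[start..end]
            .iter()
            .copied()
            .zip(self.values[start..end].iter().copied())
            .collect()
    }
}
```

Because every row is just a slice of the shared child arrays, rows with different numbers of non-zero entries (including empty rows) need no padding.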
Changed
- Updated benchmark README with v0.6.0 results and improved commands
Docker Images
# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.7.0-tei-1.8.3
# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.7.0-tei-1.8.3-ada
# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.7.0-tei-1.8.3-hopper
v0.6.0
Changed
- Reduced allocations in hot paths for ~5% improvement on Arrow batch operations
- Arrow embed_arrow: build requests directly from Arrow array, pre-allocate flat embedding buffer
- Log handler: only allocate strings for requested slice
- Metrics: use static strings for metric names and label keys
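The "pre-allocate flat embedding buffer" item can be sketched as follows. This is a simplified illustration under assumed names (`flatten_embeddings`, `rows`, `dim`), not the project's code: the buffer is sized once up front so that appending each embedding never triggers a reallocation.

```rust
/// Copies per-text embeddings into one contiguous buffer sized up front.
/// Returns `None` if any row does not have exactly `dim` values.
pub fn flatten_embeddings(rows: &[Vec<f32>], dim: usize) -> Option<Vec<f32>> {
    // One allocation for the whole batch instead of growing on each push.
    let mut flat = Vec::with_capacity(rows.len() * dim);
    for row in rows {
        if row.len() != dim {
            return None;
        }
        flat.extend_from_slice(row);
    }
    Some(flat)
}
```

For a batch of two 2-dimensional embeddings, the result is a single 4-element `Vec<f32>` in row-major order.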
Docker Images
# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.6.0-tei-1.8.3
# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.6.0-tei-1.8.3-ada
# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.6.0-tei-1.8.3-hopper
v0.5.0
Changed
- Updated Apache Arrow from 56 to 57 (IPC wire format remains compatible with v56 clients)
- Updated criterion from 0.5 to 0.7
- Benchmark code now uses `std::hint::black_box` instead of deprecated `criterion::black_box`
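For readers unfamiliar with the `black_box` migration, the sketch below shows the standard-library function in isolation. The `sum_squares` workload and `bench_iteration` wrapper are hypothetical stand-ins and are not taken from the project's benchmarks.

```rust
use std::hint::black_box;

/// Toy workload standing in for the code under benchmark (hypothetical).
pub fn sum_squares(n: u64) -> u64 {
    (0..n).map(|x| x * x).sum()
}

/// Routes the input and the result through `black_box`, which discourages
/// the optimizer from constant-folding or eliminating the call.
pub fn bench_iteration() -> u64 {
    black_box(sum_squares(black_box(1_000)))
}
```

`black_box` is an identity function at runtime, so the measured code still computes the same value; only the compiler's view of it changes.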
Docker Images
# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.5.0-tei-1.8.3
# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.5.0-tei-1.8.3-ada
# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.5.0-tei-1.8.3-hopper
v0.4.0
Added
- gRPC request timeouts (configurable via `grpc_request_timeout_secs`, default 30s)
- Graceful shutdown for gRPC server
- Connection pool pruning for idle/orphaned connections
- Security scanning in CI (cargo-audit, cargo-deny)
- Dependabot for automated dependency updates
Changed
- Breaking: Error responses now include `code` field: `{"error": "...", "code": "INSTANCE_NOT_FOUND", "timestamp": "..."}`
- Unified error handling with structured error types and codes
- Some HTTP status codes refined (e.g., port allocation failures return 422)
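One way to picture structured error codes is an enum that maps each error to a stable code string and an HTTP status. The sketch below is illustrative only: `INSTANCE_NOT_FOUND` and the 422 status for port allocation failures come from the notes above, while the `ErrorCode` type, the `PORT_ALLOCATION_FAILED` string, and the 404 mapping are assumptions.

```rust
/// Hypothetical error-code enum; not the project's actual type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    InstanceNotFound,
    PortAllocationFailed,
}

impl ErrorCode {
    /// Stable string placed in the `code` field of the error response.
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorCode::InstanceNotFound => "INSTANCE_NOT_FOUND",
            // Assumed name; the release notes do not spell this code out.
            ErrorCode::PortAllocationFailed => "PORT_ALLOCATION_FAILED",
        }
    }

    /// HTTP status returned alongside the error body.
    pub fn http_status(self) -> u16 {
        match self {
            // Assumption: not-found errors map to 404.
            ErrorCode::InstanceNotFound => 404,
            // Stated in the release notes.
            ErrorCode::PortAllocationFailed => 422,
        }
    }
}
```

Clients can then branch on the `code` string instead of parsing the human-readable `error` message.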
Docker Images
# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.4.0-tei-1.8.3
# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.4.0-tei-1.8.3-ada
# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.4.0-tei-1.8.3-hopper
v0.3.0
Added
- gRPC Multiplexer: Unified gRPC endpoint for routing embedding requests to multiple TEI instances
  - Full TEI gRPC API support (Embed, EmbedSparse, EmbedAll, Rerank, Tokenize, Decode)
  - Streaming RPC support for batch processing
  - Arrow IPC batch embedding via `EmbedArrow` endpoint with LZ4 compression
  - Connection pooling with lazy connection creation
  - Instance-based routing via `target.instance_name`
- Benchmark Client (`bench-client`): Unified CLI tool for load testing
  - Standard mode: concurrent single-text requests
  - Arrow mode: batched Arrow IPC requests for high throughput
  - Configurable concurrency, batch size, and request counts
- mTLS Authentication: Pluggable authentication framework
  - `AuthProvider` trait for custom authentication providers
  - `MtlsProvider` for mutual TLS certificate validation
  - Subject and SAN verification options
- Instance Readiness Checks: gRPC-based health monitoring
  - Automatic status transition from Starting → Running
  - Configurable health check intervals and failure thresholds
  - Auto-restart on consecutive failures
- Criterion Benchmarks: Performance testing suite
  - `embedding_overhead`: Direct vs multiplexer latency comparison
  - `concurrent_requests`: Parallel load scaling tests
  - `streaming_requests`: Batch streaming performance
  - `arrow_batch`: Arrow IPC vs streaming comparison
- Development Tooling:
  - `just bench-start/stop/status`: Local benchmark environment management
  - `just bench-open`: Run benchmarks and open HTML report
- GPU architecture-specific Docker variants (Ada Lovelace, Hopper)
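The readiness-check behaviour described above (Starting → Running on a successful check, restart after consecutive failures) can be sketched as a small tracker. All names here (`HealthTracker`, `InstanceStatus`, `Action`) are hypothetical, and the reset-to-Starting behaviour after a restart is an assumption rather than documented behaviour.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InstanceStatus {
    Starting,
    Running,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Action {
    None,
    Restart,
}

/// Hypothetical tracker fed with the result of each health check.
pub struct HealthTracker {
    status: InstanceStatus,
    consecutive_failures: u32,
    failure_threshold: u32,
}

impl HealthTracker {
    pub fn new(failure_threshold: u32) -> Self {
        HealthTracker {
            status: InstanceStatus::Starting,
            consecutive_failures: 0,
            failure_threshold,
        }
    }

    pub fn status(&self) -> InstanceStatus {
        self.status
    }

    /// Records one check result and says whether a restart is due.
    pub fn record(&mut self, healthy: bool) -> Action {
        if healthy {
            // A passing check clears the failure streak and marks the
            // instance as Running.
            self.consecutive_failures = 0;
            self.status = InstanceStatus::Running;
            return Action::None;
        }
        self.consecutive_failures += 1;
        if self.consecutive_failures >= self.failure_threshold {
            // Assumption: a restarted instance goes back to Starting.
            self.consecutive_failures = 0;
            self.status = InstanceStatus::Starting;
            Action::Restart
        } else {
            Action::None
        }
    }
}
```

With a threshold of 2, one failed check is tolerated and the second consecutive failure requests a restart; any successful check in between resets the streak.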
Changed
- Docker images now include `bench-client` binary
- Health checks use gRPC Info RPC instead of HTTP
- Improved error messages for instance lifecycle operations
- Updated Docker build process with multi-variant support
Fixed
- Docker build: Install protobuf-compiler in builder stage
- Docker build: Copy benches directory for Cargo.toml parsing
- Test isolation: Added `#[serial]` to environment variable tests
- Connection pool management in high-concurrency scenarios
Docker Images
# Multi-arch (default)
docker pull ghcr.io/nazq/tei-manager:0.3.0-tei-1.8.3
# Ada Lovelace (RTX 4090/4080)
docker pull ghcr.io/nazq/tei-manager:0.3.0-tei-1.8.3-ada
# Hopper (H100/H200)
docker pull ghcr.io/nazq/tei-manager:0.3.0-tei-1.8.3-hopper