Pure-Rust text embedding inference for local-first applications.
HypEmbed is a Rust library for generating BERT-compatible text embeddings without Python, ONNX Runtime, libtorch, or hosted inference services. Load local model weights, tokenize input, run the encoder, and get normalized vectors from a small API surface.
- Pure Rust from tokenizer to encoder forward pass
- Local-first inference with no external ML runtime dependency
- BERT-family support for common embedding models such as MiniLM
- Correctness-focused math with stable softmax, layer norm, and normalization
- Performance-aware implementation with SIMD primitives, memory-mapped weights, and batch tokenization
- Supports BERT-style encoder models, including BERT, MiniLM, and DistilBERT-style layouts
- Loads `config.json`, `vocab.txt`, and `model.safetensors` from a local model directory
- Offers mean pooling and CLS pooling
- Accepts F32, F16, and BF16 weights, converting to `f32` for inference
- Runs on CPU only
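The two pooling strategies can be sketched in a few lines. This is an illustrative example of what mean and CLS pooling compute, assuming hidden states laid out as one `Vec<f32>` per token; the function names are hypothetical, not HypEmbed's API.

```rust
/// CLS pooling: take the hidden state of the first ([CLS]) token.
fn cls_pool(hidden: &[Vec<f32>]) -> Vec<f32> {
    hidden[0].clone()
}

/// Mean pooling: average each dimension across all tokens.
fn mean_pool(hidden: &[Vec<f32>]) -> Vec<f32> {
    let dim = hidden[0].len();
    let mut out = vec![0.0f32; dim];
    for token in hidden {
        for (o, v) in out.iter_mut().zip(token) {
            *o += v;
        }
    }
    let n = hidden.len() as f32;
    out.iter_mut().for_each(|o| *o /= n);
    out
}

fn main() {
    let hidden = vec![vec![1.0, 2.0], vec![3.0, 4.0]];
    assert_eq!(cls_pool(&hidden), vec![1.0, 2.0]);
    assert_eq!(mean_pool(&hidden), vec![2.0, 3.0]);
}
```

Mean pooling averages context from every token and is the common choice for sentence-transformers models; CLS pooling is cheaper but relies on the model having been trained to summarize into the `[CLS]` position.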
HypEmbed does not currently handle training, quantization, GPU execution, or direct Hugging Face Hub downloads.
```shell
cargo add hypembed
```

```rust
use hypembed::{Embedder, EmbeddingOptions, PoolingStrategy};

let model = Embedder::load("./model").unwrap();
let options = EmbeddingOptions::default()
    .with_normalize(true)
    .with_pooling(PoolingStrategy::Mean);

let embeddings = model
    .embed(&["hello world", "rust embeddings"], &options)
    .unwrap();

println!("Embedding dim: {}", embeddings[0].len());
println!("First 5 values: {:?}", &embeddings[0][..5]);
```

To try a complete example locally:

```shell
cargo run --example basic_embed -- ./path/to/model
```

HypEmbed expects a local directory with:
| File | Description |
|---|---|
| `config.json` | Hugging Face-style model configuration |
| `vocab.txt` | BERT WordPiece vocabulary |
| `model.safetensors` | SafeTensors weights |
Example of a compatible model: `sentence-transformers/all-MiniLM-L6-v2`
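Since HypEmbed does not download models itself, it can help to verify the directory layout before loading. A minimal sketch of such a pre-flight check, assuming only the three files listed above; `check_model_dir` is a hypothetical helper, not part of HypEmbed's API.

```rust
use std::path::Path;

/// Return an error naming the first expected model file that is missing.
fn check_model_dir(dir: &Path) -> Result<(), String> {
    for f in ["config.json", "vocab.txt", "model.safetensors"] {
        if !dir.join(f).is_file() {
            return Err(format!("missing: {}", dir.join(f).display()));
        }
    }
    Ok(())
}

fn main() {
    // An empty temp directory is missing all three files.
    let dir = std::env::temp_dir().join("hypembed_check_demo");
    std::fs::create_dir_all(&dir).unwrap();
    assert!(check_model_dir(&dir).is_err());

    // After creating the expected files, the check passes.
    for f in ["config.json", "vocab.txt", "model.safetensors"] {
        std::fs::write(dir.join(f), b"").unwrap();
    }
    assert!(check_model_dir(&dir).is_ok());
}
```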
- Project site: https://neuralforgeone.github.io/hypembed/
- API docs: https://neuralforgeone.github.io/hypembed/api/hypembed/
- Architecture notes: ARCHITECTURE.md
- Product spec: PRODUCT_SPEC.md
- Roadmap: ROADMAP.md
HypEmbed follows a simple pipeline:
```text
input text
  -> pre-tokenize and normalize
  -> WordPiece tokenize
  -> add special tokens, truncate, and pad
  -> embedding layer
  -> encoder stack
  -> mean or CLS pooling
  -> optional L2 normalization
  -> embedding vector
```
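The "add special tokens, truncate, and pad" stage can be illustrated with the standard BERT token ids. This is a sketch under the assumption of the usual `[CLS]`/`[SEP]`/`[PAD]` ids (101, 102, 0); `finalize` is a hypothetical name, not HypEmbed's API.

```rust
const CLS: u32 = 101;
const SEP: u32 = 102;
const PAD: u32 = 0;

/// Wrap token ids in [CLS]...[SEP], truncating the body to leave room for
/// both special tokens, then pad to the fixed sequence length.
fn finalize(mut ids: Vec<u32>, max_len: usize) -> Vec<u32> {
    ids.truncate(max_len - 2);
    let mut out = Vec::with_capacity(max_len);
    out.push(CLS);
    out.extend(ids);
    out.push(SEP);
    out.resize(max_len, PAD);
    out
}

fn main() {
    // Short input: padded out to max_len.
    assert_eq!(finalize(vec![7, 8], 6), vec![101, 7, 8, 102, 0, 0]);
    // Long input: body truncated so [CLS] and [SEP] still fit.
    assert_eq!(finalize(vec![1, 2, 3, 4, 5], 5), vec![101, 1, 2, 3, 102]);
}
```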
The project favors explicit behavior and stable numerics:
- softmax subtracts the row maximum before exponentiation
- layer norm uses epsilon guards
- pooling and vector normalization avoid divide-by-zero edge cases
- typed errors keep load and inference failures inspectable
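The first and third points can be sketched directly. This is an illustrative example of max-subtracted softmax and guarded L2 normalization, not HypEmbed's actual internals (which use SIMD primitives).

```rust
/// Softmax over one attention row, subtracting the row maximum before
/// exponentiation so large logits cannot overflow to infinity.
fn softmax(row: &mut [f32]) {
    let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0f32;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// L2 normalization that leaves an all-zero vector unchanged instead of
/// dividing by zero and producing NaNs.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn main() {
    // Without max subtraction, exp(1000.0) would overflow to infinity.
    let mut row = [1000.0f32, 1000.0];
    softmax(&mut row);
    assert!((row[0] - 0.5).abs() < 1e-6 && (row[1] - 0.5).abs() < 1e-6);

    let mut v = [3.0f32, 4.0];
    l2_normalize(&mut v);
    assert!((v[0] - 0.6).abs() < 1e-6 && (v[1] - 0.8).abs() < 1e-6);

    let mut zero = [0.0f32; 2];
    l2_normalize(&mut zero); // no NaN: the zero vector is left as-is
    assert_eq!(zero, [0.0, 0.0]);
}
```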
HypEmbed is early-stage but already includes:
- cross-platform CI
- benchmark compilation checks
- generated API documentation
- architecture and roadmap notes in-repo
Licensed under either of:
- Apache License, Version 2.0, see LICENSE-APACHE
- MIT license, see LICENSE-MIT