neuralforgeone/hypembed

HypEmbed

Pure-Rust text embedding inference for local-first applications.

CI crates.io License: MIT OR Apache-2.0 Docs

HypEmbed is a Rust library for generating BERT-compatible text embeddings without Python, ONNX Runtime, libtorch, or hosted inference services. Load local model weights, tokenize input, run the encoder, and get normalized vectors from a small API surface.

Why HypEmbed

  • Pure Rust from tokenizer to encoder forward pass
  • Local-first inference with no external ML runtime dependency
  • BERT-family support for common embedding models such as MiniLM
  • Correctness-focused math with numerically stable softmax, layer norm, and L2 normalization
  • Performance-aware implementation with SIMD primitives, memory-mapped weights, and batch tokenization

Current Scope

  • Supports BERT-style encoder models, including BERT, MiniLM, and DistilBERT-style layouts
  • Loads config.json, vocab.txt, and model.safetensors from a local model directory
  • Offers mean pooling and CLS pooling
  • Accepts F32, F16, and BF16 weights, converting to f32 for inference
  • Runs on CPU only

HypEmbed does not currently handle training, quantization, GPU execution, or direct Hugging Face Hub downloads.
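The half-precision handling listed above comes down to widening 16-bit values to f32 at load time. The following is a minimal sketch of that widening using only the standard library. It is not taken from HypEmbed's loader, and the function names `bf16_to_f32` and `f16_to_f32` are hypothetical.

```rust
/// Widen a BF16 bit pattern to f32. BF16 is the upper 16 bits of an f32,
/// so the conversion is a left shift. (Hypothetical helper, not HypEmbed's API.)
fn bf16_to_f32(bits: u16) -> f32 {
    f32::from_bits((bits as u32) << 16)
}

/// Widen an IEEE 754 binary16 (F16) bit pattern to f32.
/// (Hypothetical helper, not HypEmbed's API.)
fn f16_to_f32(bits: u16) -> f32 {
    let sign = ((bits as u32) & 0x8000) << 16;
    let exp = ((bits >> 10) & 0x1f) as u32;
    let frac = (bits & 0x03ff) as u32;
    let out = match (exp, frac) {
        // Signed zero.
        (0, 0) => sign,
        // Subnormal: the value is frac * 2^-24, which f32 represents exactly.
        (0, _) => {
            let v = frac as f32 * (1.0 / 16_777_216.0);
            return if sign != 0 { -v } else { v };
        }
        // Infinity and NaN keep an all-ones exponent.
        (0x1f, 0) => sign | 0x7f80_0000,
        (0x1f, _) => sign | 0x7fc0_0000 | (frac << 13),
        // Normal number: rebias the exponent from 15 to 127 and widen the mantissa.
        _ => sign | ((exp + 112) << 23) | (frac << 13),
    };
    f32::from_bits(out)
}

fn main() {
    println!("bf16 0x3F80 -> {}", bf16_to_f32(0x3F80));
    println!("f16  0x3C00 -> {}", f16_to_f32(0x3C00));
}
```

Both conversions are lossless, because every F16 and BF16 value is exactly representable in f32.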

Installation

cargo add hypembed

Quick Start

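use hypembed::{Embedder, EmbeddingOptions, PoolingStrategy};

fn main() {
    // Load config.json, vocab.txt, and model.safetensors from a local directory.
    let model = Embedder::load("./model").unwrap();

    // Mean-pool the token vectors and L2-normalize the result.
    let options = EmbeddingOptions::default()
        .with_normalize(true)
        .with_pooling(PoolingStrategy::Mean);

    // One embedding vector is returned per input string.
    let embeddings = model
        .embed(&["hello world", "rust embeddings"], &options)
        .unwrap();

    println!("Embedding dim: {}", embeddings[0].len());
    println!("First 5 values: {:?}", &embeddings[0][..5]);
}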

To try a complete example locally:

cargo run --example basic_embed -- ./path/to/model
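Once embeddings are L2-normalized, cosine similarity reduces to a plain dot product. The sketch below shows that comparison on stand-in vectors so it runs without a model. It assumes each embedding is a slice of f32 values, and the helpers `dot` and `most_similar` are hypothetical, not part of the HypEmbed API.

```rust
/// Dot product of two equal-length vectors. For unit-length vectors this
/// equals their cosine similarity. (Hypothetical helper.)
fn dot(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Index of the candidate with the highest similarity to `query`,
/// or `None` when there are no candidates. (Hypothetical helper.)
fn most_similar(query: &[f32], candidates: &[Vec<f32>]) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .map(|(i, c)| (i, dot(query, c)))
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}

fn main() {
    // Stand-in unit vectors; in practice these would come from the embedder.
    let query = vec![1.0_f32, 0.0];
    let candidates = vec![vec![0.0_f32, 1.0], vec![0.6, 0.8], vec![1.0, 0.0]];

    for (i, c) in candidates.iter().enumerate() {
        println!("candidate {i}: similarity {:.3}", dot(&query, c));
    }
    println!("best match: {:?}", most_similar(&query, &candidates));
}
```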

Model Directory

HypEmbed expects a local directory with:

File               Description
config.json        Hugging Face style model configuration
vocab.txt          BERT WordPiece vocabulary
model.safetensors  SafeTensors weights

Example compatible model:

  • sentence-transformers/all-MiniLM-L6-v2
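Before loading, it can help to confirm that a directory matches the layout above. This is a small standard-library sketch of such a check. The constant `REQUIRED_FILES` and the function `missing_model_files` are hypothetical names chosen for illustration, and HypEmbed may report missing files differently through its own error types.

```rust
use std::path::{Path, PathBuf};

/// File names the model directory is expected to contain.
const REQUIRED_FILES: [&str; 3] = ["config.json", "vocab.txt", "model.safetensors"];

/// Return the expected paths that are not present as regular files.
/// (Hypothetical helper, not part of the HypEmbed API.)
fn missing_model_files(dir: &Path) -> Vec<PathBuf> {
    REQUIRED_FILES
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}

fn main() {
    // Take the directory from the first argument, defaulting to ./model.
    let dir = std::env::args().nth(1).unwrap_or_else(|| "./model".to_string());
    let missing = missing_model_files(Path::new(&dir));
    if missing.is_empty() {
        println!("{dir} contains all required files");
    } else {
        for path in &missing {
            eprintln!("missing: {}", path.display());
        }
    }
}
```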

Documentation

Design Notes

HypEmbed follows a simple pipeline:

input text
  -> pre-tokenize and normalize
  -> WordPiece tokenize
  -> add special tokens, truncate, and pad
  -> embedding layer
  -> encoder stack
  -> mean or CLS pooling
  -> optional L2 normalization
  -> embedding vector
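The WordPiece step in this pipeline is commonly implemented as a greedy longest-match-first search over the vocabulary. The sketch below illustrates that general technique for a single pre-tokenized word. It is not HypEmbed's tokenizer, the function name `wordpiece` is hypothetical, and it omits details such as the per-word length cap found in the reference BERT tokenizer.

```rust
use std::collections::HashSet;

/// Split one word into WordPiece tokens by repeatedly taking the longest
/// vocabulary entry that matches at the current position. Pieces after the
/// first carry the "##" continuation prefix. If any position has no match,
/// the whole word maps to the unknown token. (Hypothetical helper.)
fn wordpiece(word: &str, vocab: &HashSet<&str>, unk: &str) -> Vec<String> {
    let chars: Vec<char> = word.chars().collect();
    let mut pieces = Vec::new();
    let mut start = 0;

    while start < chars.len() {
        let mut end = chars.len();
        let mut found = None;
        // Shrink the candidate from the right until it is in the vocabulary.
        while end > start {
            let mut candidate: String = chars[start..end].iter().collect();
            if start > 0 {
                candidate.insert_str(0, "##");
            }
            if vocab.contains(candidate.as_str()) {
                found = Some(candidate);
                break;
            }
            end -= 1;
        }
        match found {
            Some(piece) => {
                pieces.push(piece);
                start = end;
            }
            None => return vec![unk.to_string()],
        }
    }
    pieces
}

fn main() {
    // A toy vocabulary; a real one is read from vocab.txt.
    let vocab: HashSet<&str> = ["em", "##bed", "##ding", "##s", "hello"].into_iter().collect();
    println!("{:?}", wordpiece("embeddings", &vocab, "[UNK]"));
    println!("{:?}", wordpiece("xyz", &vocab, "[UNK]"));
}
```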

The project favors explicit behavior and stable numerics:

  • softmax subtracts the row maximum before exponentiation
  • layer norm uses epsilon guards
  • pooling and vector normalization avoid divide-by-zero edge cases
  • typed errors keep load and inference failures inspectable
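The numeric guards listed above follow standard formulations, sketched here in plain Rust. These functions are illustrative only and are not HypEmbed's source: the names, signatures, and the choice of `f32::EPSILON` as the zero-norm threshold are assumptions, and the softmax sketch assumes finite inputs.

```rust
/// Softmax over one row, subtracting the row maximum before exponentiation
/// so that large logits cannot overflow.
fn softmax(row: &mut [f32]) {
    if row.is_empty() {
        return;
    }
    let max = row.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let mut sum = 0.0;
    for x in row.iter_mut() {
        *x = (*x - max).exp();
        sum += *x;
    }
    // The maximum element contributes exp(0) = 1, so sum >= 1 for finite input.
    for x in row.iter_mut() {
        *x /= sum;
    }
}

/// Layer norm over one vector; `eps` keeps the division finite when the
/// variance is zero.
fn layer_norm(x: &mut [f32], gamma: &[f32], beta: &[f32], eps: f32) {
    if x.is_empty() {
        return;
    }
    let n = x.len() as f32;
    let mean = x.iter().sum::<f32>() / n;
    let var = x.iter().map(|v| (v - mean).powi(2)).sum::<f32>() / n;
    let inv_std = 1.0 / (var + eps).sqrt();
    for (i, v) in x.iter_mut().enumerate() {
        *v = (*v - mean) * inv_std * gamma[i] + beta[i];
    }
}

/// Mean pooling over unmasked tokens; the denominator is clamped to 1 so an
/// all-zero mask yields a zero vector instead of NaN.
fn mean_pool(tokens: &[Vec<f32>], mask: &[u8]) -> Vec<f32> {
    let dim = tokens.first().map_or(0, |t| t.len());
    let mut out = vec![0.0_f32; dim];
    let mut count = 0.0_f32;
    for (tok, &m) in tokens.iter().zip(mask) {
        if m != 0 {
            for (o, v) in out.iter_mut().zip(tok) {
                *o += *v;
            }
            count += 1.0;
        }
    }
    let denom = count.max(1.0);
    for o in out.iter_mut() {
        *o /= denom;
    }
    out
}

/// L2 normalization that leaves a (near-)zero vector unchanged.
fn l2_normalize(v: &mut [f32]) {
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > f32::EPSILON {
        for x in v.iter_mut() {
            *x /= norm;
        }
    }
}

fn main() {
    let mut logits = [1000.0_f32, 1001.0, 1002.0];
    softmax(&mut logits);
    println!("softmax: {logits:?}");

    let mut hidden = [1.0_f32, 3.0];
    layer_norm(&mut hidden, &[1.0, 1.0], &[0.0, 0.0], 1e-5);
    println!("layer norm: {hidden:?}");

    let mut pooled = mean_pool(&[vec![3.0, 4.0], vec![9.0, 9.0]], &[1, 0]);
    l2_normalize(&mut pooled);
    println!("pooled and normalized: {pooled:?}");
}
```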

Open Source Status

HypEmbed is early-stage but already includes:

  • cross-platform CI
  • benchmark compilation checks
  • generated API documentation
  • architecture and roadmap notes in-repo

License

Licensed under either of:

  • Apache-2.0 (LICENSE-APACHE)
  • MIT (LICENSE-MIT)
