⚠️ Early Beta: Proximum is under active development. APIs may change before the 1.0 release. Feedback welcome!
Help shape Proximum! We'd love your input. Please fill out our 2-minute feedback survey.
A high-performance, embeddable vector database for Clojure and Java with Git-like versioning and zero-cost branching.
Unlike traditional vector databases, Proximum brings persistent data structure semantics to vector search:
- Time Travel: Query any historical snapshot
- Zero-Cost Branching: Fork indices for experiments without copying data
- Immutability: All operations return new versions, enabling safe concurrency
- True Persistence: Durable storage with structural sharing
- High Performance: SIMD-accelerated search with competitive recall
- Pure JVM: No native dependencies, works everywhere
Perfect for RAG applications, semantic search, and ML experimentation where you need to track versions, A/B test embeddings, or maintain reproducible search results.
(require '[proximum.core :as prox])
;; Identifier of the underlying storage; generate your own with (random-uuid)
(def store-id #uuid "465df026-fcd3-4cb3-be44-29a929776250")
;; Create an index - feels like Clojure!
(def idx (prox/create-index {:type :hnsw
                             :dim 384
                             :store-config {:backend :memory
                                            :id store-id}
                             :capacity 10000}))
;; Use collection protocols
(def idx2 (assoc idx "doc-1" (float-array (repeatedly 384 rand))))
(def idx3 (assoc idx2 "doc-2" (float-array (repeatedly 384 rand))))
;; Search for nearest neighbors
(def results (prox/search idx3 (float-array (repeatedly 384 rand)) 5))
; => ({:id "doc-1", :distance 0.234} {:id "doc-2", :distance 0.456} ...)
;; Git-like branching
(prox/sync! idx3)
(def experiment (prox/branch! idx3 "experiment"))
Full Clojure Guide
import org.replikativ.proximum.*;
import java.util.List;
import java.util.Map;
import java.util.UUID;
// Create index with builder pattern
try (ProximumVectorStore base = ProximumVectorStore.builder()
        .dimensions(384)
        .storagePath("/tmp/vectors")
        .build()) {
    // Add vectors (immutable - each add returns a new store).
    // A try-with-resources variable is implicitly final, so keep the new versions in a separate variable.
    ProximumVectorStore store = base.add(embedding1, "doc-1");
    store = store.add(embedding2, "doc-2");
    // Search for nearest neighbors
    List<SearchResult> results = store.search(queryVector, 5);
    // => [SearchResult{id=doc-1, distance=0.234}, ...]
    // Git-like versioning
    store.sync(); // Create commit
    UUID snapshot1 = store.getCommitId();
    store = store.add(embedding3, "doc-3");
    store.sync();
    // Time travel: Query historical state
    ProximumVectorStore historical = ProximumVectorStore.connectCommit(
            Map.of("backend", ":file", "path", "/tmp/vectors"), snapshot1);
    historical.search(queryVector, 5); // Only sees doc-1, doc-2!
    // Branch for experiments
    ProximumVectorStore experiment = store.branch("experiment");
}
Full Java Guide
Clojure CLI (deps.edn):
{:deps {org.replikativ/proximum {:mvn/version "LATEST"}}}
Leiningen:
[org.replikativ/proximum "LATEST"]
Maven:
<dependency>
  <groupId>org.replikativ</groupId>
  <artifactId>proximum</artifactId>
  <version>LATEST</version>
</dependency>
Gradle:
implementation 'org.replikativ:proximum:LATEST'
Every sync() creates a commit. Query any historical state:
index.sync(); // Snapshot 1
// ... make changes ...
index.sync(); // Snapshot 2
// Time travel to earlier state
ProximumVectorStore historical = index.asOf(commitId);
Use Cases: Audit trails, debugging, A/B testing, reproducible results
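To make the commit-per-sync idea concrete, here is a minimal sketch of a toy versioned store. It is not Proximum's implementation: the class `CommitLog` and its methods are hypothetical names chosen for illustration, and it copies the whole map on every add, whereas a real persistent index would share structure between versions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Illustrative only: a toy versioned store, not Proximum's implementation. */
final class CommitLog {
    // Working state; replaced (never mutated in place) on every add.
    private Map<String, float[]> current = Map.of();
    // Each commit ID maps to the immutable snapshot that was current at sync time.
    private final Map<UUID, Map<String, float[]>> commits = new HashMap<>();

    void add(String id, float[] vector) {
        Map<String, float[]> next = new HashMap<>(current);
        next.put(id, vector);
        current = Map.copyOf(next); // full copy here; a real index would share structure
    }

    /** Records the current state as a commit and returns its ID. */
    UUID sync() {
        UUID commitId = UUID.randomUUID();
        commits.put(commitId, current);
        return commitId;
    }

    /** Returns the snapshot recorded for the given commit. */
    Map<String, float[]> asOf(UUID commitId) {
        Map<String, float[]> snapshot = commits.get(commitId);
        if (snapshot == null) {
            throw new IllegalArgumentException("Unknown commit: " + commitId);
        }
        return snapshot;
    }

    Map<String, float[]> head() {
        return current;
    }
}
```

Because every snapshot is immutable, a reader holding an old commit keeps seeing exactly the entries that existed at that sync, no matter what is added afterwards.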
Fork an index for experiments without copying data:
index.sync();
ProximumVectorStore experiment = index.branch("new-model");
// Test different embeddings
// add returns a new store, so keep the result
experiment = experiment.add(newEmbedding, "doc-1");
// Merge or discard - original unchanged
Use Cases: A/B testing, staging, parallel experiments
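Branching without copying relies on structural sharing. The sketch below shows the general technique with a persistent binary search tree that uses path copying. It is a generic illustration, not Proximum's index structure, and `PersistentTree` is a hypothetical name.

```java
/** Illustrative only: a persistent binary search tree using path copying. */
final class PersistentTree {
    final int key;
    final PersistentTree left;
    final PersistentTree right;

    PersistentTree(int key, PersistentTree left, PersistentTree right) {
        this.key = key;
        this.left = left;
        this.right = right;
    }

    /** Returns a tree that contains key; the input tree is never modified. */
    static PersistentTree insert(PersistentTree node, int key) {
        if (node == null) {
            return new PersistentTree(key, null, null);
        }
        if (key < node.key) {
            // Copy only this node; the right subtree is shared as-is.
            return new PersistentTree(node.key, insert(node.left, key), node.right);
        }
        if (key > node.key) {
            // Copy only this node; the left subtree is shared as-is.
            return new PersistentTree(node.key, node.left, insert(node.right, key));
        }
        return node; // key already present: share the whole subtree
    }

    static boolean contains(PersistentTree node, int key) {
        while (node != null) {
            if (key == node.key) {
                return true;
            }
            node = key < node.key ? node.left : node.right;
        }
        return false;
    }
}
```

Inserting into a "branch" copies only the nodes on the path from the root to the new key. Every other subtree is shared by reference with the original, which stays unchanged.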
- Filtered Search: Multi-tenant search with ID filtering
- Metadata: Attach arbitrary metadata to vectors
- Compaction: Reclaim space from deleted vectors
- Garbage Collection: Clean up unreachable commits
- Crypto-Hash: Tamper-proof audit trail with SHA-512
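The tamper-proof audit trail mentioned in the list above can be pictured as a hash chain. The following is a minimal sketch that assumes each commit hash is SHA-512 over the parent hash followed by the commit payload. That layout, the class `HashChain`, and the string payloads are assumptions for illustration; the README does not specify what Proximum actually hashes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

/** Illustrative only: a SHA-512 hash chain over commit payloads. */
final class HashChain {
    /** Assumed layout: hash = SHA-512(parent hash bytes || payload bytes). */
    static String commitHash(String parentHashHex, String payload) {
        try {
            MessageDigest sha512 = MessageDigest.getInstance("SHA-512");
            sha512.update(HexFormat.of().parseHex(parentHashHex));
            sha512.update(payload.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(sha512.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available on this JVM", e);
        }
    }

    /** Recomputes the chain from the payloads and compares it with the recorded hashes. */
    static boolean verify(String[] payloads, String[] recordedHashes) {
        if (payloads.length != recordedHashes.length) {
            return false;
        }
        String parent = ""; // the first commit has no parent
        for (int i = 0; i < payloads.length; i++) {
            String expected = commitHash(parent, payloads[i]);
            if (!expected.equals(recordedHashes[i])) {
                return false;
            }
            parent = expected;
        }
        return true;
    }
}
```

Because each hash covers its parent, changing any earlier payload changes every later hash, so verification fails at the first altered commit.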
import org.replikativ.proximum.spring.ProximumVectorStore;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.context.annotation.Bean;
@Bean
public VectorStore vectorStore() {
    return ProximumVectorStore.builder()
            .dimensions(1536)
            .storagePath("/data/vectors")
            .build();
}
Spring AI Integration Guide | Spring Boot RAG Example
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.replikativ.proximum.langchain4j.ProximumEmbeddingStore;
EmbeddingStore<TextSegment> store = ProximumEmbeddingStore.builder()
        .dimensions(1536)
        .storagePath("/data/embeddings")
        .build();
LangChain4j Integration Guide
SIFT-1M (1M vectors, 128-dim, Intel Core Ultra 7):
| Implementation | Search QPS (% of hnswlib) | Insert (vec/s) | p50 Latency | Recall@10 |
|---|---|---|---|---|
| hnswlib (C++) | 7,849 | 18,205 | 131 µs | 98.32% |
| Proximum | 3,750 (48%) | 9,621 | 262 µs | 98.66% |
| lucene-hnsw | 3,095 (39%) | 2,347 | 333 µs | 98.53% |
| jvector | 1,844 (23%) | 6,095 | 557 µs | 95.95% |
| hnswlib-java | 1,004 (13%) | 4,329 | 1,041 µs | 98.30% |
Proximum metrics:
- Storage: 762.8 MB
- Heap usage: 545.7 MB
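For readers unfamiliar with the Recall@10 column, the sketch below shows how recall at k is commonly computed: the fraction of the exact top-k neighbors that the approximate search also returned. The class `Recall` and the string IDs are hypothetical, and this is not the benchmark harness used to produce the numbers above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: recall@k for one query. */
final class Recall {
    /** Fraction of the exact top-k IDs that also appear in the approximate top-k. */
    static double recallAtK(List<String> exactTopK, List<String> approxTopK, int k) {
        Set<String> truth = new HashSet<>(exactTopK.subList(0, Math.min(k, exactTopK.size())));
        if (truth.isEmpty()) {
            throw new IllegalArgumentException("Need at least one ground-truth ID");
        }
        long hits = approxTopK.stream()
                .limit(k)
                .distinct()
                .filter(truth::contains)
                .count();
        return (double) hits / truth.size();
    }
}
```

A benchmark would typically average this value over all queries, with the exact neighbors coming from a brute-force scan.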
Key features:
- Pure JVM with SIMD acceleration (Java Vector API)
- No native dependencies, works on all platforms
- Persistent storage with zero-cost branching
API Guides:
- Clojure Guide - Complete Clojure API with collection protocols
- Java Guide - Builder pattern, immutability, and best practices
Integration Guides:
- Spring AI Guide - Spring Boot RAG applications
- LangChain4j Guide - LangChain4j embedding store integration
Advanced Topics:
- Cryptographic Auditability - Tamper-proof commit hashing and verification
- Persistence Design - Internal persistence mechanisms (PES, VectorStorage, PSS)
Examples:
- Spring Boot RAG Example - Full-featured RAG application with versioning
Browse working examples in examples/:
- Clojure: Semantic search, RAG, collection protocols
- Java: Quick start, auditable index, metadata usage
Demo Projects:
- Einbetten: Wikipedia semantic search with Datahike + FastEmbed (2,000 articles, ~8,000 chunks)
- Java: 22+ (the Foreign Function & Memory API was finalized in Java 22)
- OS: Linux, macOS, Windows
- CPU: AVX2 recommended, AVX-512 for best performance
JVM Options Required:
--add-modules=jdk.incubator.vector
--enable-native-access=ALL-UNNAMED
EPL-2.0 (Eclipse Public License 2.0) - see LICENSE
We welcome contributions! See CONTRIBUTING.md for:
- Code of conduct
- Development workflow
- Testing requirements
- Licensing (DCO/EPL-2.0)
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Commercial Support: contact@datahike.io
Built with ❤️ by the replikativ team