GitHub - hicder/muopdb: MuopDB

MuopDB - A vector database for AI memories

Introduction

MuopDB is a vector database for machine learning. Currently, it supports:

Hybrid search: text search (with stemming support) and vector search with filtering.
Index types: HNSW, IVF, SPANN, and multi-user SPANN, all on disk.
I/O models: mmap and async I/O (with optional io_uring support on Linux).
Quantization: product quantization (PQ).
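Product quantization compresses a vector by splitting it into m equal-length subvectors and replacing each subvector with the index of its nearest centroid in a per-subspace codebook. The minimal pure-Python sketch below uses tiny hand-written codebooks (an assumption for illustration: real systems train them, typically with k-means, and MuopDB's internal API is not shown here). It encodes a vector, decodes it approximately, and computes an asymmetric squared-L2 distance from a query through per-subspace lookup tables.

```python
# Minimal product quantization (PQ) sketch in pure Python.
# Codebooks are hand-written for illustration; a real system trains them.

def split(vec, m):
    """Split vec into m contiguous subvectors of equal length."""
    d = len(vec) // m
    assert d * m == len(vec), "dimension must be divisible by m"
    return [vec[i * d:(i + 1) * d] for i in range(m)]

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def encode(vec, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    subs = split(vec, len(codebooks))
    return [min(range(len(cb)), key=lambda k: sq_l2(sub, cb[k]))
            for sub, cb in zip(subs, codebooks)]

def decode(codes, codebooks):
    """Reconstruct an approximate vector by concatenating centroids."""
    out = []
    for code, cb in zip(codes, codebooks):
        out.extend(cb[code])
    return out

def adc_distance(query, codes, codebooks):
    """Asymmetric distance: exact query vs. quantized database vector.
    Distance tables are built once per query; each database vector
    then costs only m table lookups."""
    subs = split(query, len(codebooks))
    tables = [[sq_l2(sub, c) for c in cb] for sub, cb in zip(subs, codebooks)]
    return sum(table[code] for table, code in zip(tables, codes))

# 4-dimensional vectors, m = 2 subspaces, 2 centroids per subspace.
codebooks = [
    [[0.0, 0.0], [10.0, 10.0]],   # subspace 0
    [[0.0, 0.0], [5.0, 5.0]],     # subspace 1
]
codes = encode([9.0, 11.0, 1.0, 0.0], codebooks)
print(codes)                                                   # [1, 0]
print(decode(codes, codebooks))                                # [10.0, 10.0, 0.0, 0.0]
print(adc_distance([10.0, 10.0, 5.0, 5.0], codes, codebooks))  # 50.0
```

Storing two small integers instead of four floats is where the memory saving comes from; the price is that distances are computed against centroids rather than the original vectors.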

Why MuopDB?

MuopDB supports multiple users by default: each user has their own vector index within the same collection. This is designed for building memory for LLMs. Think of it as:

Each user has their own memory.
Each user can still search a shared knowledge base.

All users' indices are stored in a few files, reducing operational complexity.
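To make the per-user model concrete, here is a toy in-memory sketch (not MuopDB's implementation) in which one collection holds a separate index per user ID and a search only consults the users it is given, mirroring the `user_ids` list in the search request. Modelling the shared knowledge base as a reserved user ID is an assumption, as are the class and method names; brute-force scoring stands in for the real on-disk indices.

```python
import heapq
from collections import defaultdict

# Toy multi-user collection: one logical collection, one index per user.
# Hypothetical names; brute-force search replaces MuopDB's real indices.

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class MultiUserCollection:
    def __init__(self, num_features):
        self.num_features = num_features
        self.indices = defaultdict(dict)   # user_id -> {doc_id: vector}

    def insert(self, user_id, doc_id, vector):
        assert len(vector) == self.num_features
        self.indices[user_id][doc_id] = vector

    def search(self, user_ids, query, top_k):
        """Search only the listed users' indices and merge the results.
        Returns (distance, user_id, doc_id) tuples, closest first."""
        candidates = (
            (sq_l2(query, vec), uid, doc_id)
            for uid in user_ids
            for doc_id, vec in self.indices.get(uid, {}).items()
        )
        return heapq.nsmallest(top_k, candidates)

SHARED = "shared-kb"   # hypothetical user ID reserved for shared knowledge
col = MultiUserCollection(num_features=2)
col.insert("alice", "a1", [0.0, 0.0])
col.insert("bob", "b1", [0.1, 0.0])
col.insert(SHARED, "kb1", [1.0, 1.0])

# Alice's search covers her own memory plus the shared base, never Bob's.
print(col.search(["alice", SHARED], [0.1, 0.0], top_k=2))
```

Even though Bob's `b1` is an exact match for the query, it never appears, because isolation comes from which indices are scanned rather than from post-filtering.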

Quick Start

Build MuopDB by following the Building section below.
Prepare the data and indices directories. On macOS the root directory is read-only, so you may want to use another location instead, e.g. ~/mnt/muopdb/, and adjust the paths in the commands below accordingly.

mkdir -p /mnt/muopdb/indices
mkdir -p /mnt/muopdb/data

Start the MuopDB index_server with the directories you just prepared, using one of these methods:

# Start server locally. This is recommended for Mac.
cd target/release
RUST_LOG=info ./index_server --node-id 0 --index-config-path /mnt/muopdb/indices --index-data-path /mnt/muopdb/data --port 9002

# Start server with Docker. Only use this option on Linux.
docker-compose up --build

You now have a running MuopDB index_server:
- You can send gRPC requests to it, for example with Postman.
- Postman's Server Reflection support will automatically discover MuopDB's RPCs.

Examples using Postman

Create collection

{
    "collection_name": "test-collection-2",
    "num_features": 10,
    "wal_file_size": 1024000000,
    "max_time_to_flush_ms": 5000,
    "max_pending_ops": 10,
    "attribute_schema": {
        "attributes": [
            {
                "name": "title",
                "type": "ATTRIBUTE_TYPE_TEXT",
                "language": "english"
            },
            {
                "name": "content",
                "type": "ATTRIBUTE_TYPE_TEXT",
                "language": "english"
            }
        ]
    }
}

Insert some data

{
    "collection_name": "test-collection-2",
    "doc_ids": [
        {
            "uuid": "00000000-0000-0000-0000-000000000064"
        }
    ],
    "user_ids": [
        {
            "uuid": "00000000-0000-0000-0000-000000000000"
        }
    ],
    "vectors": [
        100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0
    ],
    "attributes": {
        "values": [
            {
                "value": {
                    "title": {
                        "text_value": "Example Document"
                    },
                    "content": {
                        "text_value": "This is an example document for search demonstration"
                    }
                }
            }
        ]
    }
}
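In the request above, `vectors` is a flat list: one document with `num_features` = 10 contributes exactly 10 floats. Assuming that a batch of several documents concatenates their vectors in the same order as `doc_ids`, and that one user ID covers the whole batch (both inferences from this single-document example, not confirmed here), a small hypothetical helper like `build_insert_request` can assemble and sanity-check the JSON body before you paste it into Postman or send it with a gRPC client.

```python
import json

def build_insert_request(collection, num_features, user_id, docs):
    """Assemble an Insert request body shaped like the Postman example.

    docs: list of (doc_uuid, vector, attrs), attrs mapping attribute
    name -> text. Assumes vectors are concatenated in doc order and a
    single user ID applies to the whole batch.
    """
    flat = []
    for _, vector, _ in docs:
        if len(vector) != num_features:
            raise ValueError(f"expected {num_features} features, got {len(vector)}")
        flat.extend(float(x) for x in vector)
    return {
        "collection_name": collection,
        "doc_ids": [{"uuid": doc_id} for doc_id, _, _ in docs],
        "user_ids": [{"uuid": user_id}],
        "vectors": flat,
        "attributes": {
            "values": [
                {"value": {name: {"text_value": text} for name, text in attrs.items()}}
                for _, _, attrs in docs
            ]
        },
    }

body = build_insert_request(
    "test-collection-2",
    num_features=10,
    user_id="00000000-0000-0000-0000-000000000000",
    docs=[(
        "00000000-0000-0000-0000-000000000064",
        [100.0 + i for i in range(10)],
        {"title": "Example Document",
         "content": "This is an example document for search demonstration"},
    )],
)
print(json.dumps(body, indent=4))
```

Checking the vector length client-side catches the most likely mistake with a flattened layout: a vector whose length does not match `num_features` would silently shift every following document's features.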

Search

{
    "collection_name": "test-collection-2",
    "params": {
        "ef_construction": 200,
        "record_metrics": false,
        "top_k": 1
    },
    "user_ids": [
        {
            "uuid": "00000000-0000-0000-0000-000000000000"
        }
    ],
    "vector": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
}
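The `params` block tunes the approximate search. In HNSW, an `ef` value sets the size of the dynamic candidate list kept during graph traversal: a larger value visits more nodes, improving recall at the cost of latency, and the `top_k` results are taken from that list. How MuopDB maps the `ef_construction` field onto its search is not documented here; the sketch below is only the textbook single-layer greedy beam search (the SEARCH-LAYER routine of HNSW) over a small hand-built graph, with squared L2 distance assumed.

```python
import heapq

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def search_layer(graph, vectors, query, entry, ef):
    """Greedy beam search over one proximity-graph layer (HNSW SEARCH-LAYER).

    graph: node -> list of neighbor nodes; vectors: node -> vector.
    Returns up to ef (distance, node) pairs, closest first.
    """
    d0 = sq_l2(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]     # min-heap: closest unexplored node first
    results = [(-d0, entry)]       # max-heap via negation: worst result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                  # closest candidate is worse than worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = sq_l2(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the current worst
    return sorted((-negd, n) for negd, n in results)

# A tiny 1-D "line" graph: each node links to its immediate neighbors.
vectors = {i: [float(i)] for i in range(10)}
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}

top_k = 2   # must not exceed ef
hits = search_layer(graph, vectors, query=[7.2], entry=0, ef=4)[:top_k]
print([node for _, node in hits])   # [7, 8]
```

Starting from node 0, the search walks the chain toward the query and stops once no unexplored candidate can beat the worst of the `ef` results it holds.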

Remove

{
    "collection_name": "test-collection-2",
    "doc_ids": [
        {
            "uuid": "00000000-0000-0000-0000-000000000064"
        }
    ],
    "user_ids": [
        {
            "uuid": "00000000-0000-0000-0000-000000000000"
        }
    ]
}

Search again

{
    "collection_name": "test-collection-2",
    "params": {
        "ef_construction": 200,
        "record_metrics": false,
        "top_k": 1
    },
    "user_ids": [
        {
            "uuid": "00000000-0000-0000-0000-000000000000"
        }
    ],
    "vector": [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 106.0, 107.0, 108.0, 109.0]
}

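This time the removed document should no longer be returned, so you should see a different result (or none, if the collection holds no other documents for this user).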
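One common way to implement removal without rewriting on-disk index files is a tombstone: the deleted ID is recorded, searches skip it, and a later compaction drops it physically. This README does not say that MuopDB uses exactly this scheme; the toy store below, with hypothetical names, only illustrates why the second search returns a different top-1 result.

```python
import heapq

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class TombstoneStore:
    """Toy store: removals are recorded as tombstones, not applied in place."""

    def __init__(self):
        self.docs = {}          # doc_id -> vector (untouched by remove)
        self.deleted = set()    # tombstones

    def insert(self, doc_id, vector):
        self.docs[doc_id] = vector
        self.deleted.discard(doc_id)   # assumption: re-inserting revives the ID

    def remove(self, doc_id):
        self.deleted.add(doc_id)

    def search(self, query, top_k):
        live = ((sq_l2(query, v), d) for d, v in self.docs.items()
                if d not in self.deleted)
        return [d for _, d in heapq.nsmallest(top_k, live)]

    def compact(self):
        """Physically drop tombstoned docs, as a segment rebuild would."""
        for d in self.deleted:
            self.docs.pop(d, None)
        self.deleted.clear()

query = [100.0 + i for i in range(10)]
store = TombstoneStore()
store.insert("doc-64", query)                      # exact match
store.insert("doc-65", [x + 1.0 for x in query])   # near neighbor
print(store.search(query, top_k=1))   # ['doc-64']
store.remove("doc-64")
print(store.search(query, top_k=1))   # ['doc-65']
```

Deferring the physical delete keeps `remove` cheap; the cost is that every search pays a membership check until compaction runs.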

Term search only (no vector)

{
    "collection_name": "test-collection-2",
    "user_ids": [
        {
            "uuid": "00000000-0000-0000-0000-000000000000"
        }
    ],
    "limit": 10,
    "filter": {
        "contains": {
            "path": "content",
            "value": "search"
        }
    }
}

This performs a text-only search without requiring a vector, returning documents where the content field contains the term "search". You can also search the title field or combine multiple filters using and/or operators.
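Conceptually, such a filter is a small expression tree evaluated per document. The sketch below evaluates one with a naive lowercase tokenizer and a crude suffix-stripping stand-in for stemming (MuopDB's real stemmer and its index-based execution are not shown). The `contains` node mirrors the request above; the `{"and": [...]}` and `{"or": [...]}` shapes are hypothetical, since the exact JSON for combined filters is not given here.

```python
import re

def stem(token):
    """Crude stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def terms(text):
    return {stem(t) for t in re.findall(r"[a-z0-9]+", text.lower())}

def matches(doc, flt):
    """Evaluate a filter tree against a dict of attribute name -> text."""
    if "contains" in flt:
        c = flt["contains"]
        return stem(c["value"].lower()) in terms(doc.get(c["path"], ""))
    if "and" in flt:
        return all(matches(doc, f) for f in flt["and"])
    if "or" in flt:
        return any(matches(doc, f) for f in flt["or"])
    raise ValueError(f"unsupported filter: {flt}")

docs = {
    "doc-64": {"title": "Example Document",
               "content": "This is an example document for search demonstration"},
    "doc-65": {"title": "Notes", "content": "Searching old memories"},
}
flt = {"contains": {"path": "content", "value": "search"}}
print([d for d, attrs in docs.items() if matches(attrs, flt)])       # ['doc-64', 'doc-65']

combined = {"and": [flt, {"contains": {"path": "title", "value": "example"}}]}
print([d for d, attrs in docs.items() if matches(attrs, combined)])  # ['doc-64']
```

Stemming both the stored text and the query term is what lets "search" match "Searching" in `doc-65`.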

Plans

Phase 0 (Done)

Query path
- Vector similarity search
- Hierarchical Navigable Small Worlds (HNSW)
- Product Quantization (PQ)
Indexing path
- Support periodic offline indexing
Database Management
- Doc-sharding & query fan-out with aggregator-leaf architecture
- In-memory & disk-based storage with mmap
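In a doc-sharded, aggregator-leaf layout, each leaf holds a subset of the documents and returns its own local top-k; the aggregator fans the query out to every leaf and merges the sorted partial lists into a global top-k. The sketch below models that merge, with threads standing in for RPCs to leaves; the shard contents, function names, and squared-L2 scoring are assumptions for illustration.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def leaf_search(shard, query, top_k):
    """Local top-k on one leaf: (distance, doc_id) pairs, closest first."""
    return heapq.nsmallest(top_k, ((sq_l2(query, v), d) for d, v in shard.items()))

def aggregator_search(shards, query, top_k):
    """Fan the query out to all leaves in parallel, then k-way merge."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: leaf_search(s, query, top_k), shards))
    # Each partial list is already sorted, so heapq.merge yields global order.
    return list(islice(heapq.merge(*partials), top_k))

shards = [
    {"a": [0.0], "b": [5.0]},     # leaf 0
    {"c": [1.0], "d": [9.0]},     # leaf 1
    {"e": [2.0]},                 # leaf 2
]
result = aggregator_search(shards, query=[0.4], top_k=3)
print([doc_id for _, doc_id in result])   # ['a', 'c', 'e']
```

Each leaf must return a full top-k rather than fewer results: in the worst case all k global winners live on one shard, so anything less could drop a true neighbor.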

Phase 1 (Done)

Query & Indexing
- Inverted File (IVF)
- Improve locality for HNSW
- SPANN
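IVF partitions the vectors into clusters, each with a centroid and an inverted list of member IDs; a query scans only the `nprobe` lists whose centroids are nearest. SPANN builds on the same idea with posting lists kept on disk (and, in its published design, boundary vectors assigned to several clusters). The sketch below is a plain IVF query with hand-picked centroids; `nprobe` and all function names are illustrative assumptions, not MuopDB parameters.

```python
import heapq

def sq_l2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every vector to its nearest centroid's inverted list."""
    lists = {c: [] for c in range(len(centroids))}
    for doc_id, v in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: sq_l2(v, centroids[c]))
        lists[nearest].append(doc_id)
    return lists

def ivf_search(query, vectors, centroids, lists, nprobe, top_k):
    """Probe only the nprobe closest clusters, then rank their members exactly."""
    probe = heapq.nsmallest(nprobe, range(len(centroids)),
                            key=lambda c: sq_l2(query, centroids[c]))
    candidates = ((sq_l2(query, vectors[d]), d) for c in probe for d in lists[c])
    return [d for _, d in heapq.nsmallest(top_k, candidates)]

vectors = {"a": [0.0, 0.0], "b": [1.0, 0.0], "c": [10.0, 10.0], "d": [11.0, 10.0]}
centroids = [[0.5, 0.0], [10.5, 10.0]]
lists = build_ivf(vectors, centroids)
print(lists)   # {0: ['a', 'b'], 1: ['c', 'd']}
print(ivf_search([9.0, 9.0], vectors, centroids, lists, nprobe=1, top_k=2))   # ['c', 'd']
```

With `nprobe=1` only half of the data is scanned; raising `nprobe` trades speed for recall when a query falls near a cluster boundary.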

Phase 2 (Done)

Phase 3 (Done)

Phase 4 (Done)

Features
- Hybrid search
- Term search only (without vector)
Database Management
- Optimizing deletion with a bloom filter
- Optimizing WAL writes with thread-safe write groups
- Automatic segment optimizer
- Non-mmap implementation of SPANN and Term index (with io_uring support)
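A bloom filter suits deletion checks because most documents a search touches were never deleted: a negative answer is definitive, so the exact lookup into the deleted-ID set only runs for the rare positives, some of which are false positives. The sketch below is a generic bloom filter with k bit positions derived by double hashing from one SHA-256 digest; the bit-array size, k, and the way MuopDB wires a filter into deletion are assumptions, not taken from this README.

```python
import hashlib

class BloomFilter:
    """Set-membership sketch: no false negatives, tunable false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: k positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1   # odd, so never zero
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))

deleted_ids = {"00000000-0000-0000-0000-000000000064"}   # authoritative set
bloom = BloomFilter()
for doc_id in deleted_ids:
    bloom.add(doc_id)

def is_deleted(doc_id):
    # Fast path: a bloom miss proves the doc was never deleted.
    if not bloom.might_contain(doc_id):
        return False
    # Possible false positive: confirm against the exact set.
    return doc_id in deleted_ids

print(is_deleted("00000000-0000-0000-0000-000000000064"))   # True
print(is_deleted("00000000-0000-0000-0000-000000000065"))   # False
```

Because every positive is confirmed against the exact set, a false positive only costs an extra lookup and never hides a live document.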

Phase 5 (Ongoing)

Features
- Search relevance score (BM25, TF/IDF)
Database management / Optimization
- MuopDB with consensus protocol (Raft)
- Cloud MuopDB (native on object store)
- Improve skip_to performance on Elias-Fano encoding
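For the planned relevance scores, the standard BM25 formula sums, over each query term t, IDF(t) · f(t,D)·(k1 + 1) / (f(t,D) + k1·(1 − b + b·|D|/avgdl)), where f(t,D) is the term's frequency in document D, |D| the document length, and avgdl the average length; a common smoothed form is IDF(t) = ln(1 + (N − n(t) + 0.5)/(n(t) + 0.5)). The sketch uses the usual defaults k1 = 1.2 and b = 0.75 with whitespace tokenization; it illustrates the formula only, not how MuopDB will implement it.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25.

    docs: dict doc_id -> text. Uses the Lucene-style smoothed IDF,
    which is always positive.
    """
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(t for toks in tokenized.values() for t in set(toks))

    scores = {}
    for d, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[d] = score
    return scores

docs = {
    "d1": "vector search with filtering",
    "d2": "text search and text stemming",
    "d3": "raft consensus protocol",
}
scores = bm25_scores(docs, "text search")
print(sorted(scores, key=scores.get, reverse=True))   # ['d2', 'd1', 'd3']
```

Unlike raw TF/IDF, the k1 term saturates repeated occurrences and b normalizes for document length, so a long document cannot win just by repeating a word.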

Building

Install prerequisites:
- Rust: https://www.rust-lang.org/tools/install
- Make sure you're on nightly: rustup toolchain install nightly
- Libraries

# macOS (using Homebrew)
brew install protobuf openblas

# Linux (Arch Linux and derivatives such as EndeavourOS and CachyOS)
sudo pacman -Syu protobuf openblas

# Linux (Debian-based)
sudo apt-get install protobuf-compiler libprotobuf-dev libopenblas-dev

Build from source:

git clone https://github.com/hicder/muopdb.git
cd muopdb

# Build
cargo build --release

# Run tests
cargo test --release

Contributions

Main contributors:

This project is developed with TechCare Coaching; its contributors include mentees I am mentoring.
