The AI-native database built for LLM applications, delivering lightning-fast hybrid search across dense embedding, sparse embedding, tensor (multi-vector), and full-text search.
Documentation | Benchmark | Twitter | Discord
Infinity is a cutting-edge AI-native database purpose-built for modern LLM applications. It supports robust hybrid search across diverse data types such as dense vectors, sparse vectors, tensors, full-text, and structured data. This makes it the perfect solution for applications like:
- Search and Recommendations
- Question-Answering Systems
- Conversational AI
- Copilots
- Content Generation
- Retrieval-Augmented Generation (RAG)
- Sub-millisecond query latency: 0.1ms latency on million-scale vector datasets.
- Handles 15K+ QPS on vector queries and 12K+ QPS for full-text search on 33M documents.
See the Benchmark Report for detailed performance insights.
- Seamlessly combines dense embedding, sparse embedding, tensor (multi-vector), and full-text search.
- Advanced re-ranking options, including Reciprocal Rank Fusion (RRF), weighted sum, and ColBERT-style ranking.
- Supports diverse data types, including strings, numerics, vectors, tensors, and more.
- Built to handle structured and semi-structured data efficiently.
- Python SDK for intuitive integration.
- A lightweight, single-binary architecture ensures easy deployment.
- Compatible with both embedded mode and client-server mode.
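The Reciprocal Rank Fusion (RRF) re-ranking option mentioned above can be sketched in plain Python. This is an illustrative sketch of the standard RRF formula, not Infinity's internal implementation:

```python
# Minimal sketch of Reciprocal Rank Fusion (RRF), one of the
# re-ranking options listed above. Standard formula:
#   score(d) = sum over result lists of 1 / (k + rank(d))
# This is NOT Infinity's implementation -- just the textbook definition.

def rrf(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse a dense-vector ranking with a full-text ranking.
dense_hits = ["doc2", "doc1", "doc3"]
fulltext_hits = ["doc1", "doc3", "doc2"]
print(rrf([dense_hits, fulltext_hits]))  # -> ['doc1', 'doc2', 'doc3']
```

Documents ranked highly by several retrievers accumulate the largest fused score, which is why RRF is a robust default when combining dense, sparse, and full-text results.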
Infinity offers two modes of operation: embedded mode (for direct integration into Python applications) and client-server mode (for separate backend processes).
Install the embedded SDK:
pip install infinity-embedded-sdk==0.6.0.dev2
Use Infinity for dense vector search:
import infinity_embedded
# Connect to Infinity
infinity_object = infinity_embedded.connect("/absolute/path/to/save/to")
# Retrieve a database object
db_object = infinity_object.get_database("default_db")
# Create a table with multiple column types
table_object = db_object.create_table("my_table", {
"num": {"type": "integer"},
"body": {"type": "varchar"},
"vec": {"type": "vector, 4, float"}
})
# Insert data into the table
table_object.insert([
{"num": 1, "body": "unnecessary and harmful", "vec": [1.0, 1.2, 0.8, 0.9]},
{"num": 2, "body": "Office for Harmful Blooms", "vec": [4.0, 4.2, 4.3, 4.5]}
])
# Perform a dense vector search
res = table_object.output(["*"])\
.match_dense("vec", [3.0, 2.8, 2.7, 3.1], "float", "ip", 2)\
.to_pl()
print(res)
💡 Learn more in the Python API Reference.
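The `match_dense` call above uses `"ip"` (inner product) as its similarity metric, so rows with higher dot products against the query vector rank first. As a quick sanity check in plain Python (not the Infinity SDK), the second row's vector clearly scores higher against the query:

```python
# Plain-Python sanity check (not the Infinity SDK): with the "ip"
# (inner product) metric, higher dot products rank first.
def ip(a, b):
    return sum(x * y for x, y in zip(a, b))

query = [3.0, 2.8, 2.7, 3.1]
rows = {1: [1.0, 1.2, 0.8, 0.9], 2: [4.0, 4.2, 4.3, 4.5]}
scores = {num: ip(query, vec) for num, vec in rows.items()}
print(sorted(scores, key=scores.get, reverse=True))  # -> [2, 1]
```

So the search above returns the "Office for Harmful Blooms" row as the closer match.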
For larger-scale deployments, you can set up Infinity in client-server mode. See the Deploy Infinity Server guide for details.
Curious about what's next for Infinity? Check out the Roadmap 2025 to learn more about upcoming features and improvements.
Join the conversation and connect with us: