OpenSonarX

Free, open-source search engine built for AI agents. Returns full content with trust signals — so LLMs don't have to fetch and parse web pages themselves.

The Problem

Today, when an AI agent needs to answer a question from the web, it:

1. Calls a search API → gets 10 links with thin snippets (~250 tokens)
2. Fetches 3 pages → parses HTML, strips boilerplate (~6,500 tokens)
3. Sends all of it to the LLM → expensive, slow, full of noise

At 1M queries per month, this costs about $20,000 per month in LLM input tokens alone. Most of those tokens are navigation menus, cookie banners, and ads.

The Solution

OpenSonarX pre-crawls the web, extracts clean content, and returns only the sentences relevant to your query — ready for RAG synthesis.
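To make "returns only the sentences relevant to your query" concrete, here is a minimal Python sketch. It is not OpenSonarX's implementation: it scores sentences by plain keyword overlap, whereas the project describes embedding-based and BM25 retrieval. The function names, the stopword list, the `min_overlap` threshold, and the sample page text are all hypothetical.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "of", "and", "in", "on", "for", "how", "is", "our"}


def tokenize(text: str) -> set[str]:
    """Lowercase word set, keeping dotted names like 'next.js' as one token."""
    words = re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())
    return {w for w in words if w not in STOPWORDS}


def relevant_sentences(page_text: str, query: str, min_overlap: int = 2) -> list[str]:
    """Keep only sentences sharing at least `min_overlap` terms with the query."""
    query_terms = tokenize(query)
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", page_text.strip())
    return [s for s in sentences if len(tokenize(s) & query_terms) >= min_overlap]


# Sample page mixing boilerplate with useful content (made up for illustration).
page = (
    "Accept cookies to continue. "
    "Sign up for our newsletter! "
    "To deploy a Next.js app, push the repository to Vercel. "
    "Vercel detects Next.js and builds the app automatically."
)
result = relevant_sentences(page, "how to deploy a Next.js app to Vercel")
for sentence in result:
    print(sentence)
```

In this toy example the cookie and newsletter sentences are dropped and only the two deployment sentences remain, which is the kind of filtering that reduces the token count sent to the LLM.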

Approach	Median Tokens	Cost/1M Queries (Sonnet)
Search API + fetch 3 pages	6,448	$19,978
OpenSonarX	835	$2,480

About 8x fewer tokens (6,448 vs. 835 median) and about 8x lower cost. Same answer quality.

Benchmarked across 50 queries (tutorials, concepts, devops, AI/ML, product docs). Full results →
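The link between token counts and cost can be checked with back-of-the-envelope arithmetic. The sketch below assumes an input price of $3 per million tokens and one LLM call per query; both are assumptions for illustration, not figures stated in this README.

```python
# Assumed input price in USD per million tokens (not taken from this README).
PRICE_PER_MILLION_INPUT_TOKENS = 3.00


def input_cost_usd(tokens_per_query: int, queries: int = 1_000_000) -> float:
    """Input-token cost for `queries` queries at the assumed price."""
    return tokens_per_query * queries / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS


baseline = input_cost_usd(6_448)   # search API + fetch 3 pages (median tokens)
opensonarx = input_cost_usd(835)   # OpenSonarX (median tokens)
token_ratio = 6_448 / 835

print(f"baseline ${baseline:,.0f}, OpenSonarX ${opensonarx:,.0f}, ratio {token_ratio:.2f}x")
```

Under these assumptions the result lands close to, but not exactly on, the cost column above. One possible reason is that the published costs were computed from per-query totals instead of the median token count.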

Install

curl -fsSL https://opensonarx.com/install.sh | sh

Or build from source:

git clone https://github.com/bbiangul/opensonarx.git
cd opensonarx
cargo build --release --bin oi

Quick Start

# Search (works immediately — DuckDuckGo blended results, no API key needed)
oi search "how to deploy a Next.js app to Vercel"

# Seed your local index with content
oi seed sitemap https://docs.rs/sitemap.xml

# Run a P2P node with HTTP API
oi network node --port 4001 --http-port 8001

# Search through your node
oi search "rust async" --node http://localhost:8001

How It Works

Agent query → OpenSonarX
                ├── HNSW vector search (384-dim MiniLM embeddings)
                ├── BM25 keyword search (TF-IDF-style term weighting)
                ├── Reputation-weighted ranking
                └── Returns: clean Markdown + trust signals

1. Sentinel nodes crawl the web → extract content → chunk → embed → index
2. AI agents search → get ranked results with full content, not just links
3. Feedback signals from agents improve ranking over time
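This README does not say how the vector, keyword, and reputation signals are combined. One common way to merge two ranked lists is reciprocal rank fusion (RRF); the sketch below applies RRF and then scales each score by a reputation value in [0, 1]. It is an illustrative stand-in, not OpenSonarX's ranking code. The function `fuse`, the constant `k = 60`, the default reputation of 0.5, and the document IDs are all hypothetical.

```python
def fuse(
    vector_ranked: list[str],
    bm25_ranked: list[str],
    reputation: dict[str, float],
    k: int = 60,
) -> list[tuple[str, float]]:
    """Merge two ranked lists with RRF, then weight by source reputation."""
    scores: dict[str, float] = {}
    for ranked in (vector_ranked, bm25_ranked):
        for rank, doc_id in enumerate(ranked, start=1):
            # RRF: each list contributes 1 / (k + rank) for a document.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Unknown sources get a neutral reputation of 0.5 (an arbitrary choice).
    weighted = {d: s * reputation.get(d, 0.5) for d, s in scores.items()}
    return sorted(weighted.items(), key=lambda item: item[1], reverse=True)


# Made-up result lists, best match first.
vector_hits = ["docs-a", "blog-b", "forum-c"]
bm25_hits = ["blog-b", "docs-a", "spam-d"]
reputation = {"docs-a": 0.9, "blog-b": 0.6, "forum-c": 0.5, "spam-d": 0.1}

ranking = fuse(vector_hits, bm25_hits, reputation)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.4f}")
```

Here `docs-a` and `blog-b` tie on raw RRF score because each is first in one list and second in the other, so the reputation weight decides the order. A low-reputation source such as `spam-d` sinks to the bottom even though it appears in the keyword results.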

Architecture

crates/
├── oi-types     # Shared types, error codes, protobuf
├── oi-index     # HNSW + BM25 hybrid search engine
├── oi-embed     # MiniLM (ONNX, 384-dim) embeddings
├── oi-crawl     # Content distiller + crawl pipeline
├── oi-network   # libp2p P2P layer (Kademlia, gossipsub)
├── oi-sdk       # SDK core (wires index + embedder + engines)
├── oi-staking   # Solana Anchor program ($TRUTH token) [Phase 3]
├── oi-cli       # CLI binary (oi)
└── oi-facts     # Entity extraction (GLiNER ONNX)

Why OpenSonarX?

Feature	Traditional Search APIs	OpenSonarX
Output	Links + snippets	Full Markdown content
Token cost	~6,500 tokens/query	~835 tokens/query
Fetching pages	Agent must fetch & parse	Pre-crawled and indexed
Trust signals	None	Reputation score, feedback signals
Content quality	Raw HTML with ads/nav	Clean, query-relevant content
Cost	Paid API + LLM tokens	Free and open source
Infrastructure	Centralized	Decentralized P2P network

Roadmap

Phase	Status	Description
Phase 1	Done	Hybrid search engine (HNSW + BM25), content distiller, CLI
Phase 2	Done	P2P network (libp2p), blended search, sentinel crawler
Phase 3	Planned	$TRUTH token staking on Solana, dispute/jury system, stake-weighted ranking

Documentation

Full docs at docs.opensonarx.com

License

MIT

