Version 1.1 — February 2026
Authors: OpenSonarX Core Team
"Stake truth, burn spam, earn trust."
The proliferation of large language models (LLMs) and autonomous AI agents has created urgent demand for a search infrastructure layer that is accurate, spam-resistant, and economically aligned with content quality rather than advertising revenue. Existing search engines optimize for human click-through rates and ad placement; they are ill-suited to serve machine consumers that require factual, verifiable, and up-to-date information at API speed.
OpenSonarX is a decentralized search protocol in which publishers stake $TRUTH tokens to vouch for the quality of their content. Staked content is indexed in a hybrid vector-and-keyword search engine, distributed across a peer-to-peer network, and ranked by a multi-signal formula that rewards relevance, economic commitment, and community reputation. A system of sentinels, disputes, and commit-reveal juries enforces quality standards on-chain, slashing dishonest actors and burning tokens to maintain long-term deflation. Governance is fully on-chain, with time-locked proposals and anti-flash-loan protections.
This paper presents the protocol architecture, the economic model, the governance framework, the search algorithm, and the formal security properties of OpenSonarX.
- Introduction & Motivation
- Protocol Overview
- System Architecture
- Hybrid Search Engine — HNSW, BM25, Fusion, Embeddings, Reranking, SimHash, Diversity
- Crawl & Content Pipeline — Distillation, Quality Gate, Discovery, Frontier, Adapters, Spider, Daemon, Drift, Facts
- Peer-to-Peer Network — Transport, Gossip, Query Protocol, Fanout, Replication
- Staking & Economic Model
- Dispute Resolution & Slashing
- State Channels & Micropayments
- Governance (DAO)
- Token Supply & Emission Schedule
- Ranking Formula — Formal Specification
- Security Analysis
- Roadmap
- Conclusion
- Appendix A — Protocol Parameters
- Appendix B — Error Code Taxonomy
- Appendix C — Wire Protocol (Protobuf)
Modern web search was designed for humans browsing the web. Revenue flows from advertisers, not from the quality of information returned. This misalignment produces three systemic failures:
- Ad-driven ranking distortion. Search engines optimize for engagement and ad revenue, not factual accuracy. Results that generate clicks are promoted over results that provide correct answers.
- AI slop and SEO spam. The cost of producing low-quality, machine-generated content has collapsed. Search indexes are increasingly polluted with formulaic, keyword-stuffed pages that game ranking algorithms but provide no genuine informational value.
- Opaque, centralized gatekeeping. A small number of corporations control which content is discoverable. Publishers have no verifiable, permissionless mechanism to signal content quality or earn ranking on merit.
LLMs and AI agents are rapidly becoming the primary consumers of web information. Unlike human users, these machine consumers do not click ads, do not respond to engagement bait, and require structured, accurate, and citation-worthy content. They need infrastructure — not advertisements.
OpenSonarX introduces an economic primitive — stake-weighted search — to align incentives across publishers, curators, quality enforcers, and AI consumers:
- Publishers stake $TRUTH tokens on their domains, creating a verifiable economic bond. Quality content earns staking rewards; spam risks slashing.
- Sentinels monitor content quality, file disputes against bad actors, and earn protocol fees for enforcement.
- AI Agents pay for search results through state channels, with a portion of every payment burned to make spam economically irrational.
- Governance is on-chain, with all protocol parameters adjustable by token-weighted voting subject to timelocks and quorum requirements.
The core thesis: quality content is profitable; spam is unprofitable. The protocol enforces this through staking, slashing, burning, and decayed emissions that transition the network from subsidy-driven growth to a self-sustaining fee economy over approximately three years.
┌─────────────────────────────────────────────────────────────────┐
│ AI Agent / LLM │
│ Queries the network, pays via state channels │
└──────────────────────────┬──────────────────────────────────────┘
│ HTTPS / libp2p
▼
┌─────────────────────────────────────────────────────────────────┐
│ OpenSonarX Gateway │
│ REST API · Magic-Link Auth · Quota · Billing · Blended Search│
└──────────┬──────────────────────────────────┬───────────────────┘
│ │
┌─────▼──────┐ ┌──────▼──────┐
│ oi-sdk │ │ oi-network │
│ Core SDK │ │ libp2p P2P │
│ + Gateway │ │ Kad + Gossip│
│ + Auth │ │ + Fanout │
│ ┌─────────┐ │ └─────────────┘
│ │oi-index │ │
│ │HNSW+BM25│ │
│ │+Reranker│ │
│ │+SimHash │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │oi-embed │ │
│ │MiniLM │ │
│ │Arctic M/L│ │
│ │CLIP+Rnk │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │oi-crawl │ │
│ │Distiller│ │
│ │+Spider │ │
│ │+Daemon │ │
│ └─────────┘ │
│ ┌─────────┐ │
│ │oi-facts │ │
│ │GLiNER │ │
│ └─────────┘ │
└──────┬──────┘
│ Solana RPC
▼
┌─────────────────────────────────────────────────────────────────┐
│ Solana Blockchain │
│ $TRUTH Token · Staking Program · Disputes · Governance │
│ State Channels · Sentinel Registry · Emission Controller │
└─────────────────────────────────────────────────────────────────┘
Figure 1. High-level protocol architecture. AI agents query the gateway or P2P network directly. The SDK orchestrates hybrid search, embedding, entity extraction, and crawling. The gateway client provides REST access with magic-link authentication and quota management. All economic state (staking, disputes, governance, payments) is settled on Solana.
OpenSonarX is implemented as a Rust workspace with nine crates, each responsible for a single concern:
| Crate | Responsibility |
|---|---|
| oi-types | Shared types, traits, error codes (E1000–E6012). All other crates depend on this. |
| oi-index | HNSW (vector) + BM25 (keyword) hybrid search engine with multi-pass ranking, SimHash near-duplicate detection, cross-encoder reranking, and result diversity enforcement. |
| oi-embed | Embedding pipelines: MiniLM (Snowflake Arctic Embed S, 384-dim, INT8 ONNX), Snowflake Arctic Embed M and L (feature-gated), CLIP (512-dim, feature-gated), cross-encoder reranker (feature-gated), and batch inference. |
| oi-facts | Entity and fact extraction via GLiNER (ONNX, feature-gated). Extracts named entities and structured facts at crawl time for entity-boosted ranking and structured search. |
| oi-crawl | Content distillation pipeline: fetcher, HTML→Markdown extractor, quality gate (spam + slop detection), chunker, embedder. Includes a BFS spider, priority-based URL frontier, content drift detection, RSS/Atom/sitemap discovery engines, platform adapters (YouTube, Reddit, etc.), and a sentinel crawl daemon for continuous background indexing. |
| oi-network | libp2p networking: Kademlia DHT, gossipsub pub/sub, custom request-response query protocol, distributed query fanout with local+remote result merging. |
| oi-staking | Solana Anchor program: stake, unstake, disputes, jury voting, governance, state channels, emissions. |
| oi-sdk | SDK core that wires index + embedder + staking + crawl into a unified client. Includes a REST gateway client, magic-link authentication, and a query leaderboard. |
| oi-cli | Command-line binary (oi): search, stake, sentinel, seed (docs-rs, MDN, custom sitemaps), wallet, dispute, governance, feedback, network (P2P node with HTTP API). |
A search query traverses the following path:
Query ("rust async programming")
│
▼
[1] Embedding ─── MiniLM (384-dim, INT8 quantized)
│ Query prefix: "Represent this sentence for
│ searching relevant passages: "
▼
[2] Hybrid Search
│ ├── HNSW vector search (ef_search=50, pool=top_k×2)
│ └── BM25 keyword search (k1=1.2, b=0.75)
│
▼
[3] Score Fusion ─── Weighted sum: 0.7·semantic + 0.3·BM25
│ (or Reciprocal Rank Fusion, k=60)
▼
[4] Filter Pass ─── content_type, freshness, entities, site
│
▼
[5] Cross-Encoder Rerank (optional)
│ └── Reranker rescores top candidates using full query-document
│ attention (when a Reranker is configured)
│
▼
[6] Diversity Enforcement
│ └── Max 3 results per domain, max 2 per URL
│
▼
[7] Stake & Reputation Enrichment
│ ├── Batch lookup domain/entity stakes from Solana
│ ├── Wilson score from accumulated feedback
│ └── UGC platform passthrough = 0 (entity-only staking)
│
▼
[8] Final Ranking ─── score = base × stake_boost × reputation
│ × quality × freshness × dns
▼
[9] Response ─── Ranked results with scores, stake info,
content hashes, provenance metadata, and
per-stage timing breakdown
Figure 2. Query processing pipeline from embedding through multi-pass ranking.
Pure vector search captures semantic meaning but misses exact keyword matches. Pure BM25 captures lexical relevance but fails on synonyms and paraphrases. OpenSonarX combines both in a hybrid architecture that empirically outperforms either method alone.
Parameter sweep results on the BEIR benchmark showed that a 70/30 semantic-to-BM25 weighting achieved the best Recall@10 among tested configurations.
The vector index implements Hierarchical Navigable Small World (HNSW) graphs with the following parameters:
| Parameter | Value | Description |
|---|---|---|
| M | 16 | Maximum bidirectional connections per node per layer |
| M_max0 | 32 | Maximum connections on the ground layer (layer 0) |
| ef_construction | 200 | Beam width during index construction |
| ef_search | 50 | Beam width during query-time search |
| MAX_LEVEL | 16 | Maximum number of hierarchical layers |
| level_mult | 1/ln(M) | Probabilistic level assignment multiplier |
Level assignment for each new node follows a geometric distribution:

l = ⌊−ln(U) · m_L⌋,  U ~ Uniform(0, 1),  m_L = 1 / ln(M)

where M = 16, so the expected number of layers scales logarithmically with the corpus size.
The keyword index implements Okapi BM25 with standard parameters:

score(D, Q) = Σ_{qᵢ ∈ Q} IDF(qᵢ) · f(qᵢ, D) · (k₁ + 1) / (f(qᵢ, D) + k₁ · (1 − b + b · |D| / avgdl))

where f(qᵢ, D) is the frequency of term qᵢ in document D, |D| is the document length in tokens, avgdl is the average document length in the corpus, and IDF(qᵢ) is the inverse document frequency of qᵢ. Parameters: k₁ = 1.2, b = 0.75. Tokenization: lowercase, split on non-alphanumeric boundaries, filter tokens with length ≤ 1.
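The per-term scoring can be sketched as follows (an illustrative implementation of the standard Okapi BM25 term formula with the paper's k₁ and b; function names are not from the codebase):

```rust
// Okapi BM25 term scoring with the parameters stated above.
const K1: f64 = 1.2;
const B: f64 = 0.75;

/// Standard BM25 IDF with +0.5 smoothing; stays positive for df < n_docs.
fn idf(n_docs: f64, df: f64) -> f64 {
    ((n_docs - df + 0.5) / (df + 0.5) + 1.0).ln()
}

/// Score of a single query term against one document.
/// `tf` is the term frequency in the document, `doc_len` the document
/// length in tokens, `avg_len` the corpus average document length,
/// `n_docs` the corpus size, and `df` the term's document frequency.
fn bm25_term_score(tf: f64, doc_len: f64, avg_len: f64, n_docs: f64, df: f64) -> f64 {
    let norm = K1 * (1.0 - B + B * doc_len / avg_len);
    idf(n_docs, df) * tf * (K1 + 1.0) / (tf + norm)
}

fn main() {
    // A term appearing 3 times in an average-length document,
    // present in 10 of 1,000 documents.
    let s = bm25_term_score(3.0, 100.0, 100.0, 1000.0, 10.0);
    println!("bm25 term score = {s:.3}");
}
```

The full document score is the sum of this term score over all query terms present in the document.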
Two fusion methods are supported:

Weighted Sum (default):

s(d) = w_s · ŝ_sem(d) + w_b · ŝ_bm25(d)

where ŝ denotes min-max normalized scores, w_s = 0.7, w_b = 0.3.

Reciprocal Rank Fusion (RRF):

RRF(d) = Σ_L 1 / (k + r_L(d))

where k = 60 (smoothing constant) and r_L(d) denotes the rank position of d in each retrieval list L.
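Both fusion methods reduce to a few lines; the sketch below uses the stated constants (w_s = 0.7, w_b = 0.3, k = 60) and assumes scores are already min-max normalized:

```rust
const W_SEMANTIC: f64 = 0.7;
const W_BM25: f64 = 0.3;
const RRF_K: f64 = 60.0;

/// Weighted-sum fusion of normalized semantic and BM25 scores.
fn weighted_sum(sem_norm: f64, bm25_norm: f64) -> f64 {
    W_SEMANTIC * sem_norm + W_BM25 * bm25_norm
}

/// Reciprocal Rank Fusion: `ranks` holds the document's 1-based rank
/// position in each retrieval list it appears in.
fn rrf(ranks: &[usize]) -> f64 {
    ranks.iter().map(|&r| 1.0 / (RRF_K + r as f64)).sum()
}

fn main() {
    println!("fused = {:.3}", weighted_sum(0.9, 0.4));
    println!("rrf   = {:.4}", rrf(&[1, 3]));
}
```

RRF needs only rank positions, which makes it robust when the two retrievers produce scores on incomparable scales.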
| Property | Value |
|---|---|
| Model | Snowflake Arctic Embed S |
| Dimensions | 384 |
| Parameters | 33M |
| Quantization | INT8 (ONNX Runtime) |
| Pooling | CLS token |
| Max sequence length | 256 tokens |
| nDCG@10 (BEIR) | 51.98 |
| Query prefix | "Represent this sentence for searching relevant passages: " |
Additional Embedding Models (feature-gated):
| Model | Feature Flag | Dimensions | Use Case |
|---|---|---|---|
| Snowflake Arctic Embed M | `arctic-m` | 768 | Higher-quality retrieval for larger indexes |
| Snowflake Arctic Embed L | `arctic-l` | 1024 | Maximum retrieval quality |
| CLIP | `clip` | 512 (projected to 384) | Multimodal text + image unified search |
All ONNX models share a configurable thread pool (default: min(4, available_cores), overridable via OI_THREADS env var).
When a cross-encoder reranker is configured (feature flag reranker), the top candidates from score fusion are rescored using full query-document attention. Unlike bi-encoder embeddings (which encode query and document independently), the cross-encoder jointly attends to both, producing more accurate relevance scores at the cost of higher latency. Reranking is applied after fusion and filtering but before stake enrichment, and its execution time is tracked in the per-query timing breakdown.
At ingest time, the index performs two levels of duplicate detection:
- Exact deduplication: SHA-256 content hashes reject byte-identical documents.
- Near-duplicate detection: 64-bit SimHash fingerprints computed from character trigrams. Two documents with Hamming distance ≤ 8 (out of 64 bits) are considered near-duplicates and rejected. The `SimHashIndex` uses band-based Locality-Sensitive Hashing (LSH) for O(1) average-case duplicate lookups rather than an O(n) linear scan.
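The fingerprint and the Hamming-distance test can be sketched as follows (an illustrative SimHash over character trigrams; the crate's actual tokenization, hash function, and banding are not reproduced here):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// 64-bit SimHash over character trigrams (illustrative).
fn simhash(text: &str) -> u64 {
    let chars: Vec<char> = text.chars().collect();
    let mut weights = [0i64; 64];
    for tri in chars.windows(3) {
        let mut h = DefaultHasher::new();
        tri.hash(&mut h);
        let hv = h.finish();
        // Each trigram votes +1/-1 on every bit position.
        for (bit, w) in weights.iter_mut().enumerate() {
            if hv >> bit & 1 == 1 { *w += 1 } else { *w -= 1 }
        }
    }
    // Bit is set where the cumulative vote is positive.
    weights.iter().enumerate()
        .filter(|(_, &w)| w > 0)
        .fold(0u64, |acc, (bit, _)| acc | 1 << bit)
}

/// Near-duplicate test from the text: Hamming distance <= 8 of 64 bits.
fn is_near_duplicate(a: u64, b: u64) -> bool {
    (a ^ b).count_ones() <= 8
}

fn main() {
    let a = simhash("the quick brown fox jumps over the lazy dog");
    let b = simhash("a completely different sentence about token emission schedules");
    println!("distance = {}", (a ^ b).count_ones());
}
```

Band-based LSH then buckets fingerprints by 16-bit bands so that candidate near-duplicates can be found without scanning the whole index.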
To prevent a single source from dominating results, the index enforces per-query diversity limits:
- MAX_PER_DOMAIN = 3 — at most 3 results from any single domain.
- MAX_PER_URL = 2 — at most 2 results from any single URL (prevents one large page, e.g., release notes, from consuming all domain slots).
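Applied as a post-fusion filter, the two caps can be sketched like this (the `Hit` shape and field names are assumptions, not the crate's types):

```rust
use std::collections::HashMap;

/// One candidate result after score fusion (illustrative shape).
struct Hit { url: String, domain: String, score: f64 }

/// Enforce the per-query diversity caps from the text: at most 3 hits
/// per domain and at most 2 per URL, keeping the highest-scored hits.
fn enforce_diversity(mut hits: Vec<Hit>) -> Vec<Hit> {
    const MAX_PER_DOMAIN: usize = 3;
    const MAX_PER_URL: usize = 2;
    hits.sort_by(|a, b| b.score.total_cmp(&a.score));
    let mut per_domain: HashMap<String, usize> = HashMap::new();
    let mut per_url: HashMap<String, usize> = HashMap::new();
    hits.into_iter()
        .filter(|h| {
            let d = per_domain.entry(h.domain.clone()).or_insert(0);
            let u = per_url.entry(h.url.clone()).or_insert(0);
            if *d < MAX_PER_DOMAIN && *u < MAX_PER_URL {
                *d += 1;
                *u += 1;
                true
            } else {
                false
            }
        })
        .collect()
}

fn main() {
    let hits: Vec<Hit> = (0..5).map(|i| Hit {
        url: format!("https://example.com/p{i}"),
        domain: "example.com".into(),
        score: i as f64,
    }).collect();
    println!("kept {} of 5", enforce_diversity(hits).len());
}
```

Sorting first guarantees the caps always keep the best-scoring representatives of each domain and URL.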
URL Queue (BFS Frontier)
│
▼
[1] Fetcher ─── HTTP client, robots.txt compliance
│
▼
[2] Extractor ─── HTML → Markdown, date parsing, content type inference
│
▼
[3] Quality Gate ─── Spam detection + AI slop detection + length checks
│ Reject if spam_score > 0.7 or slop_score > 0.7
▼
[4] Hasher ─── SHA-256 of markdown content (deduplication, provenance)
│
▼
[5] Chunker ─── Token-based splitting: max_tokens=512, overlap=128
│
▼
[6] Embedder ─── MiniLM 384-dim INT8 vectors per chunk
│
▼
[7] Indexer ─── Insert into HNSW + BM25 hybrid index
Figure 3. Content distillation pipeline from URL to indexed, searchable chunks.
The quality gate implements three independent detectors that produce scores in [0, 1]. Content is rejected if any score exceeds 0.7.
Spam Detection (three averaged signals):

- Keyword density: fires when any single non-stopword exceeds 10% of all tokens; higher offending frequencies produce higher scores.
- Trigram repetition: fires when any trigram appears more than 3 times; heavier repetition produces higher scores.
- Capitalization ratio: fires when uppercase characters exceed 30% of the text.

AI Slop Detection (three averaged signals):

- Formulaic patterns: matches against a dictionary of 20 known LLM-generation markers ("in today's rapidly evolving", "game-changer", "paradigm shift", "synergy", etc.).
- Vocabulary diversity: the ratio of unique words to total words; low lexical diversity raises the score.
- Sentence length uniformity: the coefficient of variation of sentence lengths; unnaturally uniform sentences raise the score.

Quality Score: a composite score that rewards adequate length as a function of the word count w and penalizes the spam and slop signals. Content quality profiles impose minimum character lengths (Standard: 100, SocialPost: 20, VideoDescription: 30) and maximum link ratios (80%, 95%, 90% respectively).
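Two of the spam signals can be sketched as follows. The exact scoring functions are not specified above, so the shapes below are assumptions (they fire only past the stated thresholds and return the offending ratio); stopword filtering is omitted for brevity:

```rust
use std::collections::HashMap;

/// Capitalization signal: fires when uppercase letters exceed 30%
/// of alphabetic characters (threshold from the text; the returned
/// magnitude is an illustrative choice).
fn capitalization_signal(text: &str) -> f64 {
    let letters: Vec<char> = text.chars().filter(|c| c.is_alphabetic()).collect();
    if letters.is_empty() { return 0.0 }
    let upper = letters.iter().filter(|c| c.is_uppercase()).count() as f64;
    let ratio = upper / letters.len() as f64;
    if ratio > 0.30 { ratio } else { 0.0 }
}

/// Keyword-density signal: fires when any single word exceeds 10%
/// of all tokens (stopword filtering omitted in this sketch).
fn keyword_density_signal(text: &str) -> f64 {
    let words: Vec<String> = text.split_whitespace()
        .map(|w| w.to_lowercase()).collect();
    if words.is_empty() { return 0.0 }
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for w in &words { *counts.entry(w).or_insert(0) += 1 }
    let max_freq = *counts.values().max().unwrap() as f64 / words.len() as f64;
    if max_freq > 0.10 { max_freq } else { 0.0 }
}

fn main() {
    println!("caps    = {}", capitalization_signal("BUY NOW BUY NOW"));
    println!("density = {}", keyword_density_signal("buy buy buy now"));
}
```

The gate averages three such signals per detector and rejects the document when either average exceeds 0.7.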
The crawl pipeline supports three discovery methods for finding new URLs to crawl:
- RSS/Atom feeds — Parses both RSS 2.0 and Atom feeds to discover new content from subscribed sources. Feed entries include publication dates for freshness-aware scheduling.
- Sitemap parsing — Extracts URLs from `sitemap.xml` files, including `<lastmod>` timestamps for change detection.
- Link extraction — Follows same-domain HTML links discovered during page crawling.
Discovery source affects crawl priority: Feed URLs receive the highest priority (3×), followed by Sitemap (2×), then Link-discovered URLs (1×).
URLs are scheduled for crawling via a priority queue (the frontier) that orders URLs by a composite priority score incorporating:
- Domain stake — higher-staked domains are crawled first.
- Discovery source — feeds and sitemaps take precedence over links.
- Recency — recently added URLs are prioritized within the same priority tier.
The frontier enforces per-domain rate limits and is optionally backed by SQLite for persistence across restarts.
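A composite priority in this spirit can be sketched as follows. This is an illustrative formula, not the frontier's actual code: it log-scales the stake (so heavily staked domains lead without monopolizing the queue) and applies the 3×/2×/1× source multipliers; the recency tiebreak and rate limits are omitted:

```rust
/// Discovery source multipliers stated in the text: Feed 3x, Sitemap 2x, Link 1x.
#[derive(Clone, Copy)]
enum Source { Feed, Sitemap, Link }

/// Illustrative composite crawl priority (higher is crawled sooner).
fn priority(domain_stake: u64, source: Source) -> u64 {
    let mult = match source {
        Source::Feed => 3,
        Source::Sitemap => 2,
        Source::Link => 1,
    };
    // ~log2(stake): diminishing returns for very large stakes.
    let stake_term = (64 - domain_stake.leading_zeros()) as u64;
    stake_term * mult
}

fn main() {
    println!("feed    = {}", priority(1_000, Source::Feed));
    println!("sitemap = {}", priority(1_000, Source::Sitemap));
    println!("link    = {}", priority(1_000, Source::Link));
}
```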
For user-generated content (UGC) platforms, platform-specific adapters implement a PlatformAdapter trait that handles:
- Entity extraction — Mapping URLs to entity references (e.g., YouTube channels, subreddits).
- Content discovery — Fetching content via platform-specific RSS feeds, JSON APIs, or HTML scraping.
- Structured extraction — Producing a normalized `CrawlOutput` regardless of the source platform.
Adapters exist for YouTube, Reddit, and other major UGC platforms. UGC content receives zero domain-stake passthrough (only entity-level stakes earn rewards), as documented in Section 7.
The Spider performs one-shot site crawling via breadth-first search from seed URLs. Configurable parameters:
| Parameter | Default | Description |
|---|---|---|
| `max_depth` | 3 | Maximum link-follow depth from seed URLs |
| `max_pages` | 1,000 | Maximum pages to crawl per spider run |
| `concurrency` | 4 | Number of concurrent crawl workers |
| `respect_robots` | true | Whether to obey robots.txt |
| `same_domain_only` | true | Only follow links on the same domain |
The sentinel daemon is a continuous background process that discovers and indexes content from staked domains. It:
- Polls RSS/Atom feeds and homepage links on configurable intervals.
- Enforces per-domain and per-epoch crawl budgets (`max_urls_per_domain`, `max_urls_per_epoch`).
- Emits `CrawlEvent` messages (`ContentDiscovered`, `ContentUpdated`) for the network layer to broadcast via gossip.
The daemon detects when previously crawled content has changed:
- Re-fetches pages periodically and computes new SHA-256 hashes.
- Compares against stored hashes; if different, compares embedding vectors via cosine similarity.
- Pages with significant content drift are re-indexed and a `ContentUpdated` event is broadcast.
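The two-stage check (hash first, then embedding similarity) can be sketched as follows; the 0.9 drift threshold in the usage is an assumed value, not a protocol constant:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// A page is re-indexed only when its content hash changed AND the
/// embedding moved materially (similarity below `threshold`).
fn has_drifted(old_hash: &[u8; 32], new_hash: &[u8; 32],
               old_emb: &[f32], new_emb: &[f32], threshold: f32) -> bool {
    old_hash != new_hash && cosine(old_emb, new_emb) < threshold
}

fn main() {
    // Hash changed but meaning did not: no drift.
    let same = has_drifted(&[0u8; 32], &[1u8; 32], &[1.0, 0.0], &[1.0, 0.0], 0.9);
    println!("drifted = {same}");
}
```

The hash comparison is cheap and filters out unchanged pages; the embedding comparison avoids re-indexing pages whose bytes changed (timestamps, ads) but whose meaning did not.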
When the oi-facts crate is enabled (feature-gated behind onnx), the pipeline extracts named entities and structured facts from crawled content using a GLiNER model (ONNX Runtime). Extracted entities are:
- Normalized and deduplicated.
- Stored alongside chunk metadata for entity-boosted ranking (see Section 12).
- Used for structured search filters (e.g., filtering results by entity type or name).
| Layer | Technology |
|---|---|
| Transport | TCP |
| Encryption | Noise (XX handshake) |
| Multiplexing | Yamux |
| Identity | Ed25519 keypairs |
| Discovery | Kademlia DHT (memory-backed) |
| Pub/Sub | Gossipsub |
| Query | Custom request-response (/opensonarx/query/1.0.0) |
Nodes subscribe to three gossip topics for protocol coordination:
- `/opensonarx/heartbeat` — Node liveness, shard counts, query throughput, uptime percentage (30-day rolling).
- `/opensonarx/content-announce` — Sentinel announces new crawled content (URL, domain, SHA-256 hash, chunk count, shard ID).
- `/opensonarx/stake-events` — On-chain stake/unstake/slash events broadcast for local cache invalidation.
Distributed search uses the /opensonarx/query/1.0.0 request-response protocol:
┌──────────┐ QueryRequest (protobuf) ┌──────────┐
│ Client │ ──────────────────────────────► │ Node │
│ │ │ │
│ │ ◄────────────────────────────── │ │
└──────────┘ QueryResponse (protobuf) └──────────┘
+ StateChannelTicket
QueryRequest carries a 384-byte INT8 embedding, filters, and an optional signed state channel ticket for payment. The response includes ranked results with full provenance metadata (content hashes, stake info, extraction quality).
When a node receives a query it cannot fully answer from its local index (e.g., it holds only a subset of shards), the query is fanned out to remote peers:
- The node executes the query against its local HNSW+BM25 index.
- It simultaneously forwards the `QueryRequest` to peers known to hold relevant shards (discovered via `ShardAnnounce` gossip).
- Remote peers return their local `QueryResponse` results.
- The originating node merges local and remote results, deduplicates, and re-ranks the combined set before returning the final response.
The PendingFanout struct tracks in-flight distributed queries, including expected/received remote responses and a creation timestamp for timeout handling.
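A simplified view of the merge step (deduplicating by content hash, keeping the higher-scored copy; the tuple representation is an illustration, not the crate's result type):

```rust
use std::collections::HashSet;

/// Merge local and remote result lists, dedupe by content hash, and
/// re-rank by score before truncating to the requested result count.
fn merge_results(
    mut local: Vec<(String, f64)>,   // (content_hash, score)
    remote: Vec<(String, f64)>,
    top_k: usize,
) -> Vec<(String, f64)> {
    local.extend(remote);
    // Sort descending by score so the first copy of each hash wins.
    local.sort_by(|a, b| b.1.total_cmp(&a.1));
    let mut seen = HashSet::new();
    local.retain(|(hash, _)| seen.insert(hash.clone()));
    local.truncate(top_k);
    local
}

fn main() {
    let merged = merge_results(
        vec![("h1".to_string(), 0.9)],
        vec![("h1".to_string(), 0.5), ("h2".to_string(), 0.8)],
        10,
    );
    println!("merged {} results", merged.len());
}
```

In the real protocol the re-rank also re-applies diversity caps and stake enrichment over the combined candidate set.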
Large indexes are sharded across nodes. The ShardAnnounce gossip message advertises which shards a node holds, its vector count, and its last sync block. The ReplicationReq/ReplicationResp protocol enables bulk shard transfer with per-document chunks (content, INT8 embedding, metadata, content hash, sequence number). Each ingested chunk receives a monotonically increasing sequence number for incremental replication — peers can request only chunks newer than their last sync point.
The protocol defines three staking roles with distinct incentives and risk profiles:
Publishers stake $TRUTH on domains they control (e.g., example.com). This stake serves as an economic bond vouching for content quality.
- Earns: Pro-rata share of 70% of epoch emissions.
- Risk: Graduated slashing if the domain is disputed and found guilty.
- Constraint: Minimum stake enforced (`config.min_stake`). DNS verification available for enhanced ranking (+5% bonus).
- Lock: Cannot unstake during active disputes. 7-day cooldown after initiating unstake.
Curators stake on domains they do not own, acting as decentralized quality signals.
- Earns: Pro-rata share of staker emissions, capped at 15% APY (1500 bps) to prevent whale-gaming of the emission pool.
- Risk: Same slashing exposure as publishers on their staked domains.
- UGC Platforms: Curators receive zero passthrough on user-generated content platforms (YouTube, Reddit, X, etc.). Only entity-level stakes (e.g., individual channels) earn rewards.
Sentinels are quality enforcement agents. They stake into a global sentinel pool (not domain-specific) and earn rewards for monitoring the network.
- Earns: 30% of epoch emissions (pro-rata by sentinel stake) + 5% of state channel settlement fees + flat jury vote rewards.
- Powers: Can file disputes against domains, triggering commit-reveal jury voting.
- Trusted Reporters: Sentinels with stake ≥ 10× `jury_min_stake` receive a 50% discount on dispute bonds.
- Constraint: Minimum stake of `config.jury_min_stake`.
All staking state is managed by a Solana Anchor program. Key program-derived addresses (PDAs):
| PDA | Seeds | Purpose |
|---|---|---|
| ProtocolConfig | `["config"]` | Global protocol parameters, emission state, reward indices |
| StakeAccountV2 | `["stake", domain, staker]` | Per-staker-per-domain position |
| DomainRecord | `["domain", domain]` | Aggregate domain state (total staked, publisher/curator counts, blacklist flag) |
| SentinelAccount | `["sentinel", pubkey]` | Per-sentinel stake position |
| StateChannelAccount | `["channel", payer, payee]` | Bidirectional payment channel |
| DisputeAccount | `["dispute", domain, nonce]` | Active dispute state, jury votes, severity |
| GovernanceProposal | `["proposal", nonce]` | DAO proposal with vote tallies and timelock |
Rewards are distributed via a global reward index pattern (similar to Synthetix StakingRewards), using 10^18 fixed-point precision to avoid rounding errors. On each distribution, the global index advances:

R_per_token ← R_per_token + emission · 10^18 / total_staked

Each staker's claimable reward is:

claimable_i = stake_i · (R_per_token − debt_i) / 10^18

where debt_i is set to the current R_per_token at the time of staking or last claim. An analogous index exists for sentinels.
UpdateRewardIndex (permissionless, anyone can call)
│
│ Caps at: min(requested, distributable, epoch_budget, supply_headroom)
│
├──── 70% ────► Staker Reward Index
│ Divided proportionally by total_staked
│
└──── 30% ────► Sentinel Reward Index
+ fees Divided proportionally by total_sentinel_staked
Safety: If no stakers or sentinels exist, emissions are not counted against the supply cap — they remain available for later distribution.
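The index arithmetic above can be sketched in fixed-point integer math (a simplified model for illustration, not the Anchor program's actual code):

```rust
/// Synthetix-style global reward index with 1e18 fixed-point precision.
const PRECISION: u128 = 1_000_000_000_000_000_000;

struct Pool {
    reward_per_token: u128, // cumulative, scaled by PRECISION
    total_staked: u128,
}

struct Position {
    stake: u128,
    debt: u128, // reward_per_token snapshot at stake time / last claim
}

impl Pool {
    /// Distribute `emission` tokens across all stakers pro-rata by
    /// advancing the global index. Skipped when nobody is staked, so
    /// the emission is not counted against the supply cap.
    fn distribute(&mut self, emission: u128) {
        if self.total_staked > 0 {
            self.reward_per_token += emission * PRECISION / self.total_staked;
        }
    }
}

impl Position {
    /// claimable = stake * (R_per_token - debt) / precision
    fn claimable(&self, pool: &Pool) -> u128 {
        self.stake * (pool.reward_per_token - self.debt) / PRECISION
    }
}

fn main() {
    let mut pool = Pool { reward_per_token: 0, total_staked: 100 };
    pool.distribute(70); // the 70% staker share of one emission
    let pos = Position { stake: 40, debt: 0 };
    println!("claimable = {}", pos.claimable(&pool));
}
```

The key property is O(1) cost per distribution regardless of the number of stakers: each staker settles lazily against the global index on claim.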
┌─────────────┐
│ Sentinel │
│ files │──── Posts dispute_bond as collateral
│ dispute │ (50% discount for trusted reporters)
└──────┬──────┘
│
▼
┌─────────────────────────────────┐
│ Commit Phase │
│ Jurors submit hash(vote|salt) │
│ Eligibility: stake ≥ min + │
│ 7-day stake age │
└──────┬──────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ Reveal Phase │
│ Jurors reveal vote + salt │
│ Must match committed hash │
└──────┬──────────────────────────┘
│
▼
┌─────────────────────────────────┐
│ Resolution │
│ Quorum check: total jury │
│ weight ≥ min_jury_weight │
│ (default: 1,000 $TRUTH) │
└──────┬──────────────────────────┘
│
├── GUILTY ──────────────────────────────────────┐
│ │
│ Graduated Slashing: │
│ Low severity: 50% × slash_pct │
│ Medium severity: 100% × slash_pct │
│ High severity: 150% × slash_pct │
│ │
│ Slashed tokens: │
│ 50% burned (deflationary) │
│ 50% to protocol revenue vault │
│ │
│ Reporter: full bond returned │
│ │
└── INNOCENT ────────────────────────────────────┐
│
Reporter loses bond (deters spam disputes) │
Bond stays in vault as protocol revenue │
Domain stakers unaffected │
Figure 4. Dispute resolution flow with commit-reveal jury voting.
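The guilty branch of Figure 4 reduces to simple basis-point arithmetic. A sketch, using the severity multipliers and the 50/50 burn/vault split from the figure (`slash_bps` is the base slash rate in basis points; names are illustrative):

```rust
/// Severity multipliers from Figure 4: Low 50%, Medium 100%, High 150%.
enum Severity { Low, Medium, High }

/// Returns (burned, to_vault) for a guilty verdict, given the staked
/// amount and the base slash rate in basis points.
fn slash(staked: u64, slash_bps: u64, severity: Severity) -> (u64, u64) {
    let mult_bps = match severity {
        Severity::Low => 5_000,     // 50%  of base rate
        Severity::Medium => 10_000, // 100% of base rate
        Severity::High => 15_000,   // 150% of base rate
    };
    let slashed = (staked * slash_bps / 10_000 * mult_bps / 10_000).min(staked);
    let burned = slashed / 2;   // 50% burned (deflationary)
    (burned, slashed - burned)  // 50% to protocol revenue vault
}

fn main() {
    // 10,000 staked, 10% base slash rate, high severity.
    let (burned, vault) = slash(10_000, 1_000, Severity::High);
    println!("burned = {burned}, vault = {vault}");
}
```

The `min(staked)` clamp reflects that a high-severity multiplier can never slash more than the position actually holds.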
Jury rewards are designed to be neutral — jurors earn a flat fee (jury_vote_reward, minted from supply) per winning vote, regardless of the verdict outcome. This eliminates profit motive from predatory slashing.
- Per-epoch jury reward cap: `max_jury_rewards_per_epoch` (default: 50,000 $TRUTH). This prevents jury farming through fabricated disputes.
Stakers cannot unstake during an active dispute against their domain (governance_lock_until set to dispute deadline). This prevents front-running slashing by withdrawing early.
AI agents pay for search results through bidirectional state channels on Solana, enabling high-throughput micropayments without per-query on-chain transactions.
AI Agent (payer) ◄──── state channel ────► Publisher Node (payee)
OpenChannel: Payer deposits $TRUTH into channel PDA
Queries: Off-chain signed tickets (monotonic nonce, amount, expiry)
SettleChannel: Either party submits final ticket on-chain
On settlement, the channel payment is split:
| Allocation | Percentage | Purpose |
|---|---|---|
| Burned | 10% (configurable via `burn_pct`) | Deflationary pressure; makes spam unprofitable |
| Sentinel fee | 5% (configurable via `sentinel_fee_pct`) | Accumulated into sentinel reward index |
| Payee | Remainder (85%) | Publisher/node operator revenue |
Why burn? AI agents pay for search results. Burning a portion of every payment ensures that spam operators spend more than they earn — the cost of staking plus the burn on settlements exceeds any revenue from serving low-quality content.
message StateChannelTicket {
bytes payer_pubkey = 1; // 32 bytes Ed25519
bytes payee_pubkey = 2; // 32 bytes Ed25519
uint64 amount_lamports = 3; // cumulative spend
uint64 nonce = 4; // monotonically increasing
bytes signature = 5; // Ed25519 over fields 1-4
uint64 expiry_slot = 6; // Solana slot deadline
}

Replay protection is enforced by the monotonic nonce — the on-chain program only accepts tickets with a nonce strictly greater than the last settled nonce.
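A minimal model of ticket acceptance and the settlement split follows (signature verification over fields 1-4 is elided; the struct and function names are illustrative, not the on-chain program's API):

```rust
/// Off-chain payment ticket, mirroring the protobuf fields above
/// (pubkeys and signature omitted in this sketch).
struct Ticket {
    amount_lamports: u64, // cumulative spend
    nonce: u64,           // monotonically increasing
    expiry_slot: u64,     // Solana slot deadline
}

/// Accept only tickets with a strictly greater nonce that have not
/// expired. Signature checks over the signed fields are elided.
fn accept(last_settled_nonce: u64, current_slot: u64, t: &Ticket) -> bool {
    t.nonce > last_settled_nonce && current_slot <= t.expiry_slot
}

/// Settlement split from the table above: burn, sentinel fee, payee.
fn settle(amount: u64, burn_pct: u64, sentinel_fee_pct: u64) -> (u64, u64, u64) {
    let burned = amount * burn_pct / 100;
    let fee = amount * sentinel_fee_pct / 100;
    (burned, fee, amount - burned - fee)
}

fn main() {
    let t = Ticket { amount_lamports: 1_000, nonce: 6, expiry_slot: 200 };
    if accept(5, 100, &t) {
        let (burn, fee, payee) = settle(t.amount_lamports, 10, 5);
        println!("burn = {burn}, fee = {fee}, payee = {payee}");
    }
}
```

The strict nonce inequality is what makes stale tickets worthless: a payee always settles with the highest-nonce (highest cumulative amount) ticket it holds.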
[1] Create Proposal
│ Proposer deposits proposal_deposit (100 $TRUTH default)
│ Specifies: param_key, param_value, description_hash
│
▼
[2] Voting Period (24 hours)
│ Vote weight = staked amount
│ Eligibility: 7-day stake age (anti-flash-loan)
│ Voting locks ALL positions (per-wallet lock)
│
▼
[3] Execution Timelock (24 hours default, min 1 hour)
│ Rage-quit window: stakers can exit before changes take effect
│
▼
[4] Execution
│ Parameter updated on-chain
│ Proposal deposit refunded to proposer
│
[OR]
│
▼
[4'] Failure / Cancellation
Deposit forfeited (anti-spam)
Figure 5. Governance proposal lifecycle.
All critical protocol parameters are adjustable through governance, subject to bounded ranges that prevent zeroing safety mechanisms:
| Parameter | Default | Min | Max | Description |
|---|---|---|---|---|
| `burn_pct` | 10% | 0% | 100% | Channel settlement burn rate |
| `sentinel_fee_pct` | 5% | 0% | 50% | Channel fee to sentinels |
| `slash_pct` | configurable | 5% | 100% | Base slash rate (graduated by severity) |
| `staker_emission_pct` | 70% | 0% | 100% | Staker share of emissions |
| `curator_yield_cap_bps` | 1500 | 0 | 10000 | 15% APY cap for curators |
| `execution_delay_secs` | 86,400 | 3,600 | 2,592,000 | Timelock: 24h default (min 1h, max 30d) |
| `proposal_deposit` | 100 | 0 | 1B | Anti-spam deposit |
| `min_jury_weight` | 1,000 | 0 | 10B | Minimum jury quorum |
| `max_total_supply` | 1,000,000,000 | ≥ total_minted | — | Hard supply cap |
| `jury_vote_reward` | 1,000 | 0 | 1B | Flat reward per winning juror |
| `max_jury_rewards_per_epoch` | 50,000 | 0 | 1B | Per-epoch jury reward budget |
| `min_epoch_duration` | 604,800 | 0 | 31,536,000 | 7 days between epoch advances |
- 7-day stake age for voting — prevents flash-loan governance attacks where an attacker borrows tokens, votes, and returns them in the same block.
- 24-hour execution timelock (minimum 1 hour, cannot be zeroed) — gives the community time to react to malicious proposals and exit positions.
- Per-wallet governance lock — voting locks all staking positions until the vote period ends, preventing vote-then-dump strategies.
- Proposal deposit (100 $TRUTH, forfeited on cancellation) — prevents spam proposals.
- Parameter floors — critical safety parameters (slash_pct ≥ 5%, execution_delay ≥ 1h) cannot be zeroed through governance.
$TRUTH has a hard maximum supply of 1,000,000,000 tokens (1 billion), enforced on every mint instruction. This cap is itself governable (can be raised or lowered, but never below total_minted).
Emissions follow a geometric decay of 1.5% per epoch (minimum 7-day epochs):

E_n = E₀ · (1 − δ)^n

where E₀ = 10,000,000 $TRUTH and δ = 0.015 (1.5% decay per epoch).
| Epoch | Time | Max Emission per Epoch |
|---|---|---|
| 0 | Week 0 | 10,000,000 |
| 1 | Week 1 | 9,850,000 |
| 52 | ~Year 1 | ~4,557,000 |
| 104 | ~Year 2 | ~2,077,000 |
| 156 | ~Year 3 | ~946,000 |
| 208 | ~Year 4 | ~431,000 |
| ∞ | — | → 0 |
The total emitted supply after n epochs (assuming full distribution each epoch):

S_n = Σ_{i=0}^{n} E₀ · (1 − δ)^i = E₀ · (1 − (1 − δ)^{n+1}) / δ

Theoretical maximum cumulative emission (n → ∞):

S_∞ = E₀ / δ = 10,000,000 / 0.015 ≈ 666,666,667 $TRUTH

This leaves 333,333,333 $TRUTH of supply headroom under the 1B cap for treasury, grants, and future governance decisions.
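The schedule in the table above can be reproduced numerically from the two constants; the values match the table to rounding:

```rust
/// Geometric emission decay: E_n = E0 * (1 - delta)^n.
const E0: f64 = 10_000_000.0; // initial per-epoch emission, $TRUTH
const DELTA: f64 = 0.015;     // 1.5% decay per epoch

fn epoch_emission(n: u32) -> f64 {
    E0 * (1.0 - DELTA).powi(n as i32)
}

/// Cumulative emission over epochs 0..=n (closed-form geometric sum),
/// converging to E0 / DELTA ≈ 666,666,667 as n grows.
fn cumulative(n: u32) -> f64 {
    E0 * (1.0 - (1.0 - DELTA).powi(n as i32 + 1)) / DELTA
}

fn main() {
    for n in [0u32, 1, 52, 104, 156, 208] {
        println!("epoch {n:>3}: {:>12.0}", epoch_emission(n));
    }
    println!("limit: {:.0}", E0 / DELTA);
}
```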
Two mechanisms create deflationary pressure that counteracts emissions:
- Channel settlement burn: 10% of every AI agent payment is burned permanently.
- Slash burn: 50% of slashed tokens from guilty dispute verdicts are burned.
The protocol transitions from emission-driven (early) to fee-driven (mature) economics over approximately 3 years:
Phase 1 (Year 0-1): High emissions attract stakers → builds index quality
Low burn rate (10%) attracts publishers
5% sentinel fee bootstraps quality enforcement
Phase 2 (Year 1-2): Decaying emissions create scarcity → token appreciates
Governance can raise burn/sentinel fees as network grows
Phase 3 (Year 2-3): Channel fees sustain sentinels → quality maintained
Emissions become marginal
Phase 4 (Year 3+): Zero-emission, fully fee-driven economy
Ongoing burns maintain deflationary pressure
The final ranking score for a search result d against query q is computed as:

score(d, q) = base(d, q) × B_stake(d) × R(d) × Q(d) × F(d) × DNS(d)

where base is the fused relevance score, B_stake the stake boost, R the reputation factor, Q the quality factor, F the freshness factor, and DNS the DNS-verification bonus, each defined below.

Base relevance:

base(d, q) = w_s · ŝ_sem(d, q) + w_b · ŝ_bm25(d, q)

with w_s = 0.7, w_b = 0.3, and the hat denoting min-max normalization within the candidate set.
Stake boost: grows logarithmically with the effective stake bonded to the result's domain and entities, capped at B_stake ≤ 3.0. The logarithmic scaling ensures diminishing returns — doubling stake provides only a marginal ranking boost, discouraging whales from dominating results purely through capital.
Reputation factor: derived from the Wilson score lower bound of accumulated feedback signals (positive/negative), or from stake-derived reputation as a fallback.
Quality factor: derived from the quality gate's q_score together with the spam and slop scores s_spam, s_slop ∈ [0, 1]. High-quality, non-spam, non-slop content achieves Q ≈ 1.2; spammy content can be driven to Q ≈ 0.
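The multiplicative combination and the 3.0 stake-boost cap can be sketched as follows, together with a standard Wilson lower bound for the reputation factor (the z = 1.96 value corresponds to 95% confidence and is an assumed choice; the individual factor formulas are simplified stand-ins, not the protocol's exact definitions):

```rust
/// score = base * stake_boost * reputation * quality * freshness * dns,
/// with the stake boost capped at 3.0 as stated in the text.
fn final_score(base: f64, stake_boost: f64, reputation: f64,
               quality: f64, freshness: f64, dns: f64) -> f64 {
    base * stake_boost.min(3.0) * reputation * quality * freshness * dns
}

/// Wilson score lower bound on the positive-feedback proportion,
/// used here as an illustrative reputation factor. `pos` positive
/// signals out of `total`; z = 1.96 for a 95% confidence interval.
fn wilson_lower_bound(pos: f64, total: f64) -> f64 {
    if total == 0.0 { return 0.0 }
    let z = 1.96_f64;
    let p = pos / total;
    let denom = 1.0 + z * z / total;
    let center = p + z * z / (2.0 * total);
    let margin = z * ((p * (1.0 - p) + z * z / (4.0 * total)) / total).sqrt();
    (center - margin) / denom
}

fn main() {
    let rep = wilson_lower_bound(90.0, 100.0);
    println!("score = {:.3}", final_score(0.8, 2.5, rep, 1.2, 1.0, 1.05));
}
```

The Wilson bound is what makes the reputation factor robust to small samples: 9 of 10 positive signals scores well below 90 of 100, so new domains cannot buy reputation with a handful of self-reviews.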
| Threat | Mitigation |
|---|---|
| Spam flooding | Minimum stake requirement + quality gate (spam/slop detection) + slashing risk |
| Stake-and-dump | 7-day unstake cooldown + dispute lock (cannot unstake during active dispute) |
| Flash-loan governance | 7-day stake age requirement for voting eligibility |
| Malicious proposals | 24-hour execution timelock + parameter floors + proposal deposit |
| Vote manipulation | Commit-reveal jury voting prevents last-minute vote swinging |
| Predatory slashing | Flat jury rewards (no profit from verdict outcome) + dispute bond at risk |
| Jury farming | Per-epoch jury reward cap (max_jury_rewards_per_epoch) |
| Single-juror verdicts | Minimum jury quorum (min_jury_weight = 1,000 $TRUTH) |
| Whale emission gaming | Curator yield cap (15% APY) + UGC passthrough = 0 |
| Phantom rewards | Supply headroom cap prevents accumulating unmintable reward debt |
| Payment spam | Channel settlement burn (10%) makes spam queries unprofitable |
| Sybil attacks | Stake-weighted participation; cost of attack scales linearly with capital |
- Spam unprofitability: For any spammer staking S tokens, the expected loss from slashing (probability p × graduated rate × S) plus settlement burns exceeds the expected revenue from serving low-quality results.
- Sentinel incentive compatibility: Sentinels earn flat fees regardless of verdict, removing incentive for frivolous or predatory disputes. False reporting costs the dispute bond.
- Governance safety: The execution timelock ensures that even if a malicious proposal passes, affected stakers have time to exit. Parameter floors prevent disabling critical safety mechanisms.
- Deflationary convergence: As emissions decay to zero, the protocol converges to a steady state where burns from channel settlements and slashing provide ongoing deflationary pressure, while sentinel fees sustain quality enforcement.
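The spam-unprofitability property can be illustrated with a back-of-the-envelope expected-value check. Every number below except the 10% `burn_pct` default is an assumed scenario parameter, not a protocol constant:

```python
# Illustrative expected-value check for the spam-unprofitability property.
# Detection probability, slash rate, revenue, and settlement volume are
# assumptions for this sketch; only the 10% burn is the protocol default.
def spammer_ev(stake: float, p_detect: float, slash_rate: float,
               revenue: float, settlement_volume: float,
               burn_pct: float = 0.10) -> float:
    """Expected profit of a spammer over one dispute window."""
    expected_slash = p_detect * slash_rate * stake
    burn_cost = burn_pct * settlement_volume
    return revenue - expected_slash - burn_cost

# Assumed scenario: 10,000 staked, 60% detection odds, 50% graduated slash,
# 1,500 revenue from low-quality traffic, 5,000 settled through channels.
ev = spammer_ev(stake=10_000, p_detect=0.6, slash_rate=0.5,
                revenue=1_500, settlement_volume=5_000)
print(ev)  # -2000.0 — the attack loses money in expectation
```

The property holds whenever `p_detect × slash_rate × stake + burns > revenue`; raising the minimum stake or detection probability widens the margin.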
| Phase | Milestone | Status | Description |
|---|---|---|---|
| Phase 0 | Local Stack | Complete | In-memory gateway, mock staking, hybrid search (HNSW+BM25), CLI, content distillation pipeline, quality gate. |
| Phase 1 | Embedding & Retrieval | Complete | MiniLM INT8 embeddings, batch inference, SimHash dedup, cross-encoder reranking, Arctic M/L model support, CLIP multimodal (feature-gated). |
| Phase 2 | Crawl Infrastructure | Complete | BFS spider, sentinel crawl daemon, URL frontier with stake-weighted priority, RSS/Atom/sitemap discovery, platform adapters, content drift detection, SQLite persistence. |
| Phase 3 | Solana Program | Complete | Anchor staking program with stake/unstake, disputes, commit-reveal jury voting, governance, state channels, emissions, sentinel registry. All PDAs and instructions implemented. |
| Phase 4 | P2P Network | Complete | libp2p mesh (Kademlia + gossipsub), distributed query fanout, shard replication with incremental sync, content/stake/heartbeat gossip. |
| Phase 5 | SDK & Gateway | Complete | REST gateway client, magic-link authentication, query leaderboard, content seeding (docs-rs, MDN, custom sitemaps), entity/fact extraction (GLiNER). |
| Phase 6 | Mainnet Beta | Planned | $TRUTH token launch, emission schedule activation, sentinel onboarding, production deployment. |
| Phase 7 | Fee Economy | Planned | Transition from emission-driven to fee-driven sustainability. Community governance activation. |
OpenSonarX introduces a new paradigm for web search — one designed for machines rather than humans, and for truth rather than advertising. By requiring publishers to stake economic value, enforcing quality through decentralized sentinel networks, and aligning all participants through carefully designed token mechanics, the protocol creates a search layer where accuracy is profitable and spam is punished.
The 1.5% weekly emission decay provides a ~3-year runway to bootstrap network effects, while the burn-on-settlement and slash-and-burn mechanisms ensure long-term deflationary pressure. Governance is fully on-chain, with multiple layers of protection against flash-loan attacks, malicious proposals, and parameter manipulation.
As LLMs and AI agents become the dominant consumers of web information, the need for a search infrastructure layer that is accurate, verifiable, and economically aligned with quality has never been greater. OpenSonarX is that infrastructure.
| Parameter | Default | Range | Notes |
|---|---|---|---|
| max_total_supply | 1,000,000,000 | ≥ total_minted | Hard cap on mintable $TRUTH |
| epoch_max_emission (E₀) | 10,000,000 | — | Initial epoch emission |
| emission_decay (δ) | 1.5% | — | Per-epoch geometric decay |
| min_epoch_duration | 604,800s (7d) | 0–365d | Minimum time between epoch advances |
| burn_pct | 10% | 0–100% | Channel settlement burn |
| sentinel_fee_pct | 5% | 0–50% | Channel fee to sentinels |
| slash_pct | configurable | 5–100% | Base slash rate |
| staker_emission_pct | 70% | 0–100% | Staker share of emissions |
| curator_yield_cap_bps | 1500 | 0–10000 | 15% APY cap for curators |
| cooldown_secs | 604,800 (7d) | — | Unstake cooldown period |
| execution_delay_secs | 86,400 (24h) | 3,600–2,592,000 | Governance timelock |
| proposal_deposit | 100 | 0–1B | Anti-spam deposit |
| min_jury_weight | 1,000 | 0–10B | Minimum jury quorum |
| jury_vote_reward | 1,000 | 0–1B | Flat reward per winning juror |
| max_jury_rewards_per_epoch | 50,000 | 0–1B | Per-epoch jury reward cap |
| semantic_weight (w_s) | 0.7 | — | Hybrid search: semantic weight |
| bm25_weight (w_b) | 0.3 | — | Hybrid search: BM25 weight |
| ef_search | 50 | — | HNSW query beam width |
| M | 16 | — | HNSW max connections per layer |
| ef_construction | 200 | — | HNSW build beam width |
| k1 | 1.2 | — | BM25 term frequency saturation |
| b | 0.75 | — | BM25 document length normalization |
| max_tokens | 512 | — | Chunk size (tokens) |
| overlap | 128 | — | Chunk overlap (tokens) |
| pool_multiplier | 2 | — | HNSW candidate pool: top_k × this |
| MAX_PER_DOMAIN | 3 | — | Max results per domain per query |
| MAX_PER_URL | 2 | — | Max results per URL per query |
| simhash_threshold | 8 | — | Hamming distance for near-duplicate detection |
| spider_max_depth | 3 | — | BFS spider max link-follow depth |
| spider_max_pages | 1,000 | — | BFS spider max pages per crawl |
| spider_concurrency | 4 | — | BFS spider concurrent workers |
| OI_THREADS | min(4, cores) | — | ONNX intra-op thread count (env var) |
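The emission parameters above pin down the whole schedule: epoch n emits E₀ × (1 − δ)ⁿ, so the asymptotic total is the geometric-series sum E₀ / δ. A quick sanity check against the hard cap and the "~3-year runway" claim:

```python
# Sanity-check the emission schedule: E0 = 10M per weekly epoch, decaying
# 1.5% per epoch. Epoch n emits E0 * (1 - delta)**n, so the asymptotic
# total is E0 / delta, comfortably under the 1B max_total_supply cap.
E0 = 10_000_000
delta = 0.015

total_asymptotic = E0 / delta
emitted_3y = E0 * (1 - (1 - delta) ** 156) / delta  # 156 weekly epochs ≈ 3 years

print(f"{total_asymptotic:,.0f}")               # 666,666,667
print(f"{emitted_3y / total_asymptotic:.1%}")   # ~90% of emissions land in 3 years
```

Roughly 90% of all emissions are distributed within the first three years, which is what makes the bootstrap window finite by construction rather than by governance decision.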
| Range | Category | Codes |
|---|---|---|
| E1000–E1004 | Query | No results, timeout, shard unavailable, partial results, invalid filter |
| E2000–E2003 | Payment | Insufficient balance, ticket rejected, ticket expired, settlement failed |
| E3000–E3009 | Staking | Domain not verified, already staked, cooldown active, below minimum, blacklisted, commit/reveal/voting errors |
| E4000–E4003 | Dispute | Already active, bond insufficient, reporter cooldown, jury not eligible |
| E5000–E5002 | Network | No peers, DHT lookup failed, node unreachable |
| E6000–E6012 | Auth/Gateway | Invalid wallet/key/signature, quota exceeded, session/magic-link expired, email unverified, free tier exhausted, Stripe/fiat bond errors |
Retryable errors: E1001 (timeout), E1002 (shard unavailable), E2001–E2002 (ticket), E2003 (settlement), E5000–E5002 (network), E6007 (free tier exhausted).
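A client-side sketch of how the retryable set might be consumed. The error codes and their classification come from the taxonomy above; the `ProtocolError` type and the jittered-backoff policy are illustrative choices:

```python
import random
import time

# Retryable codes per Appendix B; everything else should surface immediately.
RETRYABLE = {"E1001", "E1002", "E2001", "E2002", "E2003",
             "E5000", "E5001", "E5002", "E6007"}

class ProtocolError(Exception):
    """Hypothetical client-side error carrying an Appendix B code."""
    def __init__(self, code: str, message: str = ""):
        super().__init__(f"{code}: {message}")
        self.code = code

def with_retries(op, max_attempts: int = 4, base_delay: float = 0.25):
    """Retry `op` with jittered exponential backoff on retryable codes."""
    for attempt in range(max_attempts):
        try:
            return op()
        except ProtocolError as e:
            if e.code not in RETRYABLE or attempt == max_attempts - 1:
                raise  # non-retryable (e.g. E3004 blacklisted) or out of attempts
            time.sleep(base_delay * (2 ** attempt) * random.uniform(0.5, 1.5))
```

A query hitting E1001 (timeout) is retried up to three more times; an E4001 (bond insufficient) propagates on the first attempt.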
The OpenSonarX wire protocol uses Protocol Buffers (protobuf) via prost for all P2P communication. Three protocol streams are defined:
| Protocol | Path | Direction | Purpose |
|---|---|---|---|
| Query | /opensonarx/query/1.0.0 | Request-Response | Distributed search queries |
| Feedback | /opensonarx/feedback/1.0.0 | Request-Response | Relevance feedback signals |
| Gossip | /opensonarx/gossip/1.0.0 | Pub/Sub | Content announcements, heartbeats, stake events, shard metadata |
Key message types:
- QueryRequest: request_id (UUID), embedding (384 bytes INT8), query text, filters, limit, state channel ticket, nonce.
- QueryResponse: request_id, metadata (latency, shard, node version), ranked results with full provenance.
- SearchResult: result_id, rank, title, URL, snippet (200 chars), content markdown, scores (semantic, BM25, final), stake info (amount, USD, type, reputation), utility info (agent score, feedback counts), freshness info (crawled/published timestamps, content hash), extraction quality, source attribution.
- StateChannelTicket: payer/payee pubkeys, cumulative amount, monotonic nonce, Ed25519 signature, expiry slot.
- ContentAnnouncement: sentinel_id, URL, domain, content SHA-256, crawled_at, chunk count, shard_id.
- Heartbeat: node_id, timestamp, shard count, vectors stored, queries served, average latency, uptime percentage.
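Two of these messages, sketched in proto3 from the field lists above. Field numbers and exact scalar types are assumptions for illustration; the `.proto` files in the repository are authoritative:

```protobuf
// Illustrative reconstruction of two wire messages from the field lists
// above. Field numbers and types are assumptions, not the shipped schema.
syntax = "proto3";
package opensonarx.v1;

message QueryRequest {
  string request_id = 1;          // UUID
  bytes  embedding = 2;           // 384-byte INT8 query embedding
  string query_text = 3;
  repeated string filters = 4;
  uint32 limit = 5;
  StateChannelTicket ticket = 6;  // micropayment authorization
  uint64 nonce = 7;
}

message StateChannelTicket {
  bytes  payer = 1;               // Ed25519 public key
  bytes  payee = 2;
  uint64 cumulative_amount = 3;   // monotonically increasing total
  uint64 nonce = 4;
  bytes  signature = 5;           // Ed25519 signature over the ticket body
  uint64 expiry_slot = 6;
}
```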
OpenSonarX is open source under the MIT License. Repository: https://github.com/bbiangul/opensonarx