Embedded, SQLite-flavored search engine with a single on-disk index and an ergonomic Rust API, CLI, and optional C FFI.
Crates
- `searchlite-core`: indexing, storage, and retrieval (BM25 + block-level maxima, boolean/phrase matching, filters, optional vectors/GPU rerank stubs).
- `searchlite-cli`: CLI for init/add/commit/search/inspect/compact.
- `searchlite-ffi`: optional C ABI (enable with the `ffi` feature).
- `searchlite-wasm`: experimental wasm bindings with an IndexedDB-backed `Storage` implementation (threaded wasm needs `wasm-bindgen-rayon`; you must configure COOP/COEP yourself).
Core capabilities
- Single-writer, multi-reader index backed by a WAL and atomic manifest updates.
- BM25 scoring (`k1=0.9`, `b=0.4` by default) with phrase matching and configurable multi-field highlighting.
- Block-level max scores per term (WAND/BMW pruning) for faster exact top-k.
- Filesystem-backed by default; toggle to an in-memory index for ephemeral workloads.
- Stored/fast fields for filters and snippets; optional `vectors`, `gpu`, `zstd`, and `ffi` feature flags.
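The default BM25 parameters can be made concrete with a small sketch. This is the standard Okapi BM25 per-term formula with `k1=0.9`, `b=0.4`; the exact IDF variant searchlite uses internally is an assumption here:

```python
import math

def bm25(tf, df, doc_len, avg_doc_len, n_docs, k1=0.9, b=0.4):
    """Okapi BM25 contribution of one term in one document."""
    # Rarer terms (small df) get a larger inverse document frequency.
    idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
    # Term frequency saturates; b controls document-length normalization.
    norm = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len))
    return idf * norm

# A term occurring twice in an average-length doc, appearing in 10 of 1000 docs:
score = bm25(tf=2, df=10, doc_len=100, avg_doc_len=100.0, n_docs=1000)
```

A document's score for a query is the sum of these per-term contributions; the per-term maxima of this quantity are what the block-level WAND/BMW pruning bounds.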
Durability
- Segment files, docstores, manifests, and vector indexes are fsync’d on write; the WAL is truncated only after those files are flushed and the manifest is synced.
- Crash window: if the process dies after the manifest is persisted but before the WAL is truncated, WAL replay will reapply the last batch, creating an extra generation for the same docs (no data loss; compaction cleans it up).
- Manifest writes use atomic rename plus directory fsync; in-memory storage skips fsync as it is meant for ephemeral use.
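The rename-plus-fsync pattern above can be sketched in a few lines (Python for brevity; searchlite implements this in Rust, and the file name `manifest.json` is illustrative):

```python
import json
import os
import tempfile

def write_manifest_atomically(dir_path: str, manifest: dict) -> None:
    """Write a manifest via temp file + fsync + atomic rename + directory fsync,
    so readers only ever observe a complete old or complete new manifest."""
    fd, tmp = tempfile.mkstemp(dir=dir_path, prefix="manifest.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(manifest, f)
            f.flush()
            os.fsync(f.fileno())  # file contents are durable before the swap
        os.rename(tmp, os.path.join(dir_path, "manifest.json"))  # atomic swap
        dfd = os.open(dir_path, os.O_RDONLY)
        try:
            os.fsync(dfd)  # make the rename itself durable
        finally:
            os.close(dfd)
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

Because the rename is atomic, a crash at any point leaves either the previous manifest or the new one on disk, never a half-written file.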
Prebuilt binaries are published on every GitHub release:
curl -fsSL https://searchlite.dev/install | sh

- Set `SEARCHLITE_VERSION` to pin a tag (e.g., `v0.4.0`), `SEARCHLITE_INSTALL_DIR` to override the destination, and `SEARCHLITE_BIN_NAME` to change the installed name (defaults to `searchlite`).
- Supported targets match release artifacts: `x86_64`/`aarch64` for Linux/macOS, and Windows via Git Bash/WSL (downloads the `.zip` and installs `searchlite.exe`).
- The script downloads from GitHub releases (`searchlite-cli-<target>.{tar.gz,zip}`) and falls back to `~/.local/bin` if `/usr/local/bin` is not writable.
Easiest way to try searchlite: run the published container, mounting a local data directory and exposing the HTTP API on port 8080.
docker run --rm -p 8080:8080 -v "$PWD:/data" ghcr.io/davidkelley/searchlite:latest http --index /data --bind 0.0.0.0:8080

- Rust toolchain is pinned to 1.92.0 (rust-toolchain.toml); install rustfmt/clippy if missing.
- Build everything with `cargo build --all --all-features` (or `just build`).
- Code quality: `cargo fmt --all`, `cargo clippy --all --all-features -- -D warnings`.
- The CLI runs directly from the workspace: `cargo run -p searchlite-cli -- <subcommand> <index> ...` (e.g., `cargo run -p searchlite-cli -- init /tmp/idx schema.json`).
- Tests: `cargo test --all --all-features` (or `just test`).
- Benches (Criterion): `cargo bench -p searchlite-core` (or `just bench`).
- Smoketest the CLI by running a small end-to-end flow (see examples below).
Schema lives in `schema.json` (example below). Text fields control analysis pipelines and storage; keyword/numeric fields support filters and fast-field access.
{
"doc_id_field": "_id",
"analyzers": [
{
"name": "english",
"tokenizer": "default",
"filters": [{ "stopwords": "en" }, { "stemmer": "english" }]
},
{
"name": "title_prefix",
"tokenizer": "default",
"filters": [{ "edge_ngram": { "min": 1, "max": 5 } }]
}
],
"text_fields": [
{ "name": "body", "analyzer": "english", "stored": true, "indexed": true },
{
"name": "title",
"analyzer": "title_prefix",
"search_analyzer": "english",
"stored": true,
"indexed": true
}
],
"keyword_fields": [
{ "name": "lang", "stored": true, "indexed": true, "fast": true }
],
"numeric_fields": [{ "name": "year", "i64": true, "fast": true }],
"nested_fields": [
{
"name": "comment",
"fields": [
{
"type": "keyword",
"name": "author",
"stored": true,
"indexed": true,
"fast": true
}
]
}
],
"vector_fields": []
}

If you omit `analyzers`, Searchlite injects the built-in `default` analyzer (ASCII lowercase + alphanumeric tokenization), and `tokenizer` stays a supported alias for `analyzer`. Available tokenizers: `default`, `unicode` (NFKC + case-folded words), and `whitespace`. Token filters include `lowercase`, `stopwords` (built-in `en` or an explicit list), `stemmer` (English), `synonyms` (from/to lists expanded at the same position), and `edge_ngram` (min/max).
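As an illustration of the `edge_ngram` filter, this sketch emits the leading prefixes the `title_prefix` analyzer above would index for one token (surrounding tokenization details are an assumption):

```python
def edge_ngrams(token: str, min_gram: int, max_gram: int) -> list[str]:
    """Emit leading prefixes of a token, from min_gram to max_gram characters."""
    top = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, top + 1)]

# With the title_prefix filter above ({"min": 1, "max": 5}):
edge_ngrams("rust", 1, 5)  # prefixes up to the whole token
```

Indexing these prefixes is what lets a search-time query like `ru` match `rust` without any query-side expansion.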
`stored` fields are returned when `--return-stored`/`return_stored` is enabled. `fast` fields are memory-mapped for filters; numeric ranges and keyword predicates are expressed via the JSON filter AST (see examples below). Nested objects are flattened into dotted field names (e.g., `comment.author`); you can either filter on the dotted path directly or wrap a clause with the `Nested` filter in the JSON API.
Nested filters are evaluated per object, and stored nested values preserve their original structure while omitting unstored fields.
Every document must include a string primary key under doc_id_field (defaults to _id). Skip listing that id in your text_fields/keyword_fields/numeric_fields; it is stored automatically, returned on hits, and used for upsert/delete semantics.
Set an index location once:
INDEX=/tmp/searchlite_idx

Use `cargo run -p searchlite-cli -- <command> ...` to invoke the CLI. Each command maps to a lifecycle step:
- `init <index> <schema>`: creates a new index directory and writes the schema manifest so the index is ready to accept documents.
- `add <index> <doc.jsonl>`: upserts newline-delimited JSON documents (keyed by `doc_id_field`) into the writer buffer; changes are not visible to readers until you run `commit`.
- `update <index> <doc.jsonl>`: alias for `add` to emphasize upsert semantics.
- `delete <index> <ids.txt>`: queues deletions by id (one id per line, matching `doc_id_field`), applied on `commit`.
- `commit <index>`: flushes buffered documents, writes new segment files, and updates the manifest so searches can see the newly added data.
- `search <index> [options]`: executes a query, returning JSON hits (and optional aggregations) using either CLI flags or a full request payload.
- `inspect <index>`: prints the current manifest and segment metadata to help debug index contents and state.
- `compact <index>`: merges segments to reduce fragmentation and improve search performance.
Documents without the required id (doc_id_field) will be rejected. Upserts are effective on commit; deletes hide older documents immediately after commit and are dropped on the next compaction.
- Create an index from a schema:
cargo run -p searchlite-cli -- init "$INDEX" schema.json

- Add a single document (newline-delimited JSON):
cat > /tmp/one.jsonl <<'EOF'
{"_id":"doc-1","body":"Rust is a systems programming language","lang":"en","year":2024}
EOF
cargo run -p searchlite-cli -- add "$INDEX" /tmp/one.jsonl
cargo run -p searchlite-cli -- commit "$INDEX"

- Add multiple documents (uses the included sample):
cargo run -p searchlite-cli -- add "$INDEX" docs.jsonl
cargo run -p searchlite-cli -- commit "$INDEX"

- Query the index (structured query + filters, stored fields, snippets):
cat > /tmp/request.json <<'EOF'
{
"query": {
"type": "query_string",
"query": "rust language",
"fields": ["body","title"]
},
"filter": {
"And": [
{ "KeywordEq": { "field": "lang", "value": "en" } },
{ "I64Range": { "field": "year", "min": 2020, "max": 2025 } }
]
},
"limit": 5,
"return_stored": true,
"highlight_field": "body"
}
EOF
cargo run -p searchlite-cli -- search "$INDEX" --request /tmp/request.json

- Typo-tolerant search (fuzzy matching):
cat > /tmp/request.json <<'EOF'
{
"query": {
"type": "query_string",
"query": "body:rusk"
},
"fuzzy": { "max_edits": 1, "prefix_length": 1, "max_expansions": 20, "min_length": 3 },
"limit": 5,
"return_stored": true
}
EOF
cargo run -p searchlite-cli -- search "$INDEX" --request /tmp/request.json

Aggregations use Elasticsearch-style JSON and require fast fields on the target keyword/numeric columns. Results are emitted as a single JSON blob containing hits and aggregations.
cat > /tmp/aggs.json <<'EOF'
{
"langs": { "type": "terms", "field": "lang", "size": 5 },
"views_stats": { "type": "stats", "field": "year" }
}
EOF
cargo run -p searchlite-cli -- search \
--index "$INDEX" \
--q "rust" \
--limit 0 \
--aggs-file /tmp/aggs.json

If you prefer inline JSON, pass `--aggs '{"langs":{"type":"terms","field":"lang"}}'`.
- Provide a `sort` array in the search request (or via `--sort "field:order,other_field:asc"` in the CLI). Each entry looks like `{"field":"year","order":"desc"}`; `_score` is also allowed.
- Sort targets must be fast keyword or numeric fields; the default order is ascending (descending for `_score`).
- Multi-valued fields use the minimum value for ascending sorts and the maximum for descending sorts; documents missing the field are placed last.
- Ordering is stable and tiebroken by segment/doc id so cursor pagination works reliably.
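The multi-valued and missing-field rules above amount to building a composite sort key per document; a Python sketch (the field layout is illustrative):

```python
def sort_key(values, order, seg_id, doc_id):
    """Per-document sort key: min value for ascending, max for descending;
    docs missing the field sort last; (seg_id, doc_id) is the stable tiebreak."""
    missing = not values
    if order == "asc":
        primary = min(values) if values else float("inf")
    else:
        # Negate so that larger values sort first under ascending tuple order.
        primary = -(max(values) if values else float("-inf"))
    return (missing, primary, seg_id, doc_id)

docs = [
    {"id": 1, "year": [2020, 2024]},
    {"id": 2, "year": [2025]},
    {"id": 3, "year": []},  # missing field -> placed last
]
ranked = sorted(docs, key=lambda d: sort_key(d["year"], "desc", 0, d["id"]))
```

Descending on `year` ranks doc 2 (max 2025) ahead of doc 1 (max 2024), with the missing-field doc last.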
This HTTP service provides no authentication, authorization, or rate limiting. Do not expose it directly to untrusted networks; front it with your own proxy or API gateway that enforces access control and rate limits.
Run the bundled HTTP server for a single index (available directly from the CLI):
searchlite http --index /tmp/searchlite_idx --bind 0.0.0.0:8080
# Or via cargo without installing first:
cargo run -p searchlite-cli -- http --index /tmp/searchlite_idx --bind 0.0.0.0:8080
# Env-style config is supported too:
SEARCHLITE_INDEX_PATH=/tmp/searchlite_idx \
SEARCHLITE_BIND_ADDR=0.0.0.0:8080 \
SEARCHLITE_MAX_BODY_BYTES=$((50*1024*1024)) \
cargo run -p searchlite-cli -- http --refresh-on-commit

The standalone searchlite-http binary remains available; both entrypoints share the same flags.
Flags/env:
- `--index`/`SEARCHLITE_INDEX_PATH`: directory for the single index served by this instance.
- `--bind`/`SEARCHLITE_BIND_ADDR`: listen address (default `127.0.0.1:8080`).
- `--require-existing-index`: fail fast at startup if the manifest is missing.
- `--max-body-bytes`, `--max-concurrency`, `--request-timeout-secs`, `--shutdown-grace-secs`, `--refresh-on-commit`: resource limits and shutdown behavior.
All errors return {"error":{"type":"...","reason":"..."}}. No auth or rate limiting is provided; front it with your own proxy. Writes issued via /add, /bulk, or /delete are queued in the writer and become durable/searchable only after calling /commit (optionally followed by /refresh depending on your staleness needs). The full API surface is documented in openapi.yaml.
- Init (fails if the index already exists):
curl -XPOST http://localhost:8080/init \
-H 'Content-Type: application/json' \
--data-binary @schema.json

- Stream writes:
curl -XPOST http://localhost:8080/add \
-H 'Content-Type: application/x-ndjson' \
--data-binary @docs.ndjson
curl -XPOST http://localhost:8080/commit

- JSON bulk ingest:
curl -XPOST http://localhost:8080/bulk \
-H 'Content-Type: application/json' \
-d '{"docs":[{"_id":"1","body":"Rust search"},{"_id":"2","body":"More docs"}]}'

- Search + highlight + collapse + aggregations:
cat > /tmp/search.json <<'EOF'
{
"query": { "type": "query_string", "query": "rust" },
"limit": 5,
"return_stored": true,
"highlight_field": "body",
"collapse": { "field": "lang" },
"aggs": { "langs": { "type": "terms", "field": "lang", "size": 5 } },
"suggest": {
"complete": { "type": "completion", "field": "body", "prefix": "ru", "size": 3 }
}
}
EOF
curl -XPOST http://localhost:8080/search \
-H 'Content-Type: application/json' \
--data-binary @/tmp/search.json

- Vector-only query (when built with `--features vectors`):
curl -XPOST http://localhost:8080/search \
-H 'Content-Type: application/json' \
-d '{"query":{"type":"vector","field":"embedding","vector":[1.0,0.0],"k":5,"alpha":0.0},"limit":5,"return_stored":true}'

- Multiple vector clauses (blends candidates across fields/queries):
curl -XPOST http://localhost:8080/search \
-H 'Content-Type: application/json' \
-d '{"query":{"type":"bool","should":[{"type":"vector","field":"vec_a","vector":[1,0,0],"alpha":0.0,"k":20},{"type":"vector","field":"vec_b","vector":[0,1,0],"alpha":0.0,"k":20}]},"candidate_size":100,"limit":20,"return_stored":true}'

- Numeric boosts: `{"type":"rank_feature","field":"popularity","modifier":"sqrt"}` and a guarded script score: `{"type":"script_score","query":{"type":"match_all"},"script":"_score + popularity * weight","params":{"weight":0.1}}`.
- Maintenance endpoints:
curl -XPOST http://localhost:8080/refresh # lightweight reader reload
curl -XPOST http://localhost:8080/compact # merge segments
curl -XGET http://localhost:8080/inspect # manifest + segments
curl -XGET http://localhost:8080/stats # doc/segment counts

- Field requirements: `terms`/`significant_terms`/`rare_terms` need a fast keyword field; `range`/`histogram`/`date_histogram`/`percentiles`/`percentile_ranks` need fast numeric fields (date histograms accept numeric millis or RFC3339 strings stored as fast numerics); `cardinality` works with fast keyword or numeric fields; `top_hits` has no field requirement but returns stored fields/snippets when enabled.
- Metric semantics: `stats`/`extended_stats` aggregate over all field values; multi-valued fields contribute each entry (bucket `doc_count` stays per-document while `count` is per-value). `value_count` counts values (plus `missing` fills) rather than documents-with-values. `cardinality` hashes values (exact for small sets) with optional `precision_threshold` and `missing`; `percentiles` default to `[1,5,25,50,75,95,99]` unless `percents` is provided and use a bounded t-digest estimator (exact mode for small buckets); `percentile_ranks` reports the percent of values at or below each supplied `values` entry.
- Bucket options: `terms` supports `size`, `shard_size`, `min_doc_count`, and nested `aggs`; `significant_terms` adds `background_filter` + `size`/`min_doc_count`; `rare_terms` favors low-frequency keys with `max_doc_count`; `range`/`date_range` accept `key`, `from`, `to`, `keyed`; `histogram` supports `interval`, `offset`, `min_doc_count`, `extended_bounds`, `hard_bounds`, `missing`; `date_histogram` supports `calendar_interval` (day/week/month/quarter/year) or `fixed_interval` (e.g., `1d`, `12h`), optional `offset`, `min_doc_count`, `extended_bounds`, `hard_bounds`, `missing`; `filter` wraps any filter AST node into a single bucket with sub-aggs; `composite` paginates deterministic buckets across multiple sources (terms/histogram) and returns `after_key` for the next page. Bucket aggs accept optional `sampling` with either `size` or `probability` plus a deterministic `seed`.
- Pipeline options: `bucket_sort` reorders/truncates buckets via `sort`/`from`/`size`; `avg_bucket` and `sum_bucket` read a `buckets_path` (e.g., `"histogram.stats.avg"`) from the parent bucket tree; `derivative` and `moving_avg` operate on histogram/date_histogram buckets with `gap_policy` (skip/insert_zeros) and optional `unit`/`predict`; `bucket_script` evaluates a lightweight arithmetic expression over one or more `buckets_path` values (including `_count` for bucket doc counts).
- Top hits: `{"type":"top_hits","size":N,"from":M,"fields":["field1",...],"highlight_field":"body"}` returns sorted hits per bucket with `total` and optional snippets.
- Aggregations run over all matched documents (not just the top-k); with `--limit 0` the search skips hit ranking and only returns `aggregations` (cursors are not supported with `--limit 0`). Aggregations with `sampling` return approximate counts and mark responses with `sampled: true`.
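The per-document `doc_count` semantics of a `terms` bucket can be sketched as a counting pass over fast-field values (bucket tie-ordering here is by count only, an assumption; `shard_size` and nested aggs are omitted):

```python
from collections import Counter

def terms_agg(docs, field, size=10, min_doc_count=1):
    """Build terms buckets over a fast keyword field; a multi-valued
    field contributes one doc_count per document per distinct value."""
    counts = Counter()
    for doc in docs:
        values = doc.get(field, [])
        if isinstance(values, str):
            values = [values]
        for v in set(values):  # each distinct value counted once per document
            counts[v] += 1
    buckets = [
        {"key": k, "doc_count": c}
        for k, c in counts.most_common()
        if c >= min_doc_count
    ]
    return {"buckets": buckets[:size]}
```

Note that a document with `lang: ["en", "fr"]` increments both the `en` and `fr` buckets, but each bucket's `doc_count` still counts documents, not values.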
cat > /tmp/aggs-advanced.json <<'EOF'
{
"rust_only": {
"type": "filter",
"filter": { "KeywordEq": { "field": "lang", "value": "rust" } },
"aggs": { "tags": { "type": "terms", "field": "tag", "size": 3 } }
},
"by_lang_year": {
"type": "composite",
"size": 2,
"sources": [
{ "type": "terms", "name": "lang", "field": "lang" },
{ "type": "histogram", "name": "year", "field": "year", "interval": 5 }
],
"aggs": {
"latency_p95": { "type": "percentiles", "field": "latency_ms", "percents": [50, 95] },
"bucket_order": { "type": "bucket_sort", "sort": [ { "_count": "desc" } ], "size": 2 }
}
},
"by_tag": {
"type": "terms",
"field": "tag",
"aggs": {
"score_stats": { "type": "stats", "field": "score" },
"sorted": { "type": "bucket_sort", "sort": [ { "score_stats.avg": "desc" } ], "size": 3 },
"avg_scores": { "type": "avg_bucket", "buckets_path": "score_stats.avg" }
}
},
"unique_authors": { "type": "cardinality", "field": "author" },
"slow_ranks": { "type": "percentile_ranks", "field": "latency_ms", "values": [100, 500] }
}
EOF
cargo run -p searchlite-cli -- search "$INDEX" --q "rust" --limit 0 --aggs-file /tmp/aggs-advanced.json

`composite` returns an `after_key` so you can page deterministically across buckets by sending that object back as `after` in the next request. Pipeline aggregations operate on the current bucket tree: `bucket_sort` reorders/truncates buckets, while `avg_bucket`/`sum_bucket` read metrics via dot-separated `buckets_path` selectors.
cat > /tmp/aggs-sampling.json <<'EOF'
{
"sig_tags": {
"type": "significant_terms",
"field": "tag",
"size": 5,
"background_filter": { "KeywordEq": { "field": "lang", "value": "en" } }
},
"rare_langs": {
"type": "rare_terms",
"field": "lang",
"max_doc_count": 1,
"sampling": { "probability": 0.35, "seed": 42 }
},
"by_day": {
"type": "date_histogram",
"field": "timestamp_ms",
"fixed_interval": "1d",
"sampling": { "size": 5000, "seed": 7 },
"aggs": {
"latency": { "type": "stats", "field": "latency_ms" },
"trend": { "type": "derivative", "buckets_path": "latency.avg", "gap_policy": "skip", "unit": 86400000 },
"smooth": { "type": "moving_avg", "buckets_path": "latency.avg", "window": 3, "predict": 1 },
"efficiency": {
"type": "bucket_script",
"buckets_path": { "avg": "latency.avg", "trend": "trend.value" },
"script": "avg / (trend + 1)"
}
}
}
}
EOF
cargo run -p searchlite-cli -- search "$INDEX" --q "rust" --limit 0 --aggs-file /tmp/aggs-sampling.json

Responses include `sampled: true` when sampling is active; counts/percentiles become approximate while ordering remains deterministic for a fixed seed. Percentiles and percentile ranks use a bounded t-digest estimator and switch to exact calculations for small buckets.
Collapse results by a fast keyword field to return one top hit per group, with optional inner_hits to see more from each group. The overall hit count still reflects all matching documents; responses also include total_groups.
{
"query": "rust systems",
"sort": [
{ "field": "published_at", "order": "desc" },
{ "field": "_score", "order": "desc" }
],
"collapse": {
"field": "author",
"inner_hits": {
"size": 3,
"from": 0,
"sort": [{ "field": "_score", "order": "desc" }]
}
},
"limit": 10,
"return_stored": true
}

The first hit per group follows the request sort and is stable within the group. `inner_hits` returns additional hits per group (sorted independently if you supply `sort` on `inner_hits`).
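The keep-first-hit-per-group behavior can be sketched over an already-sorted hit list (field names are illustrative; real inner_hits support their own sort):

```python
def collapse(hits, field, inner_size=0):
    """Keep the first hit per group (hits arrive in request sort order);
    optionally retain up to inner_size additional hits per group."""
    groups = {}
    out = []
    for hit in hits:
        key = hit[field]
        if key not in groups:
            # First (best-ranked) hit for this group becomes the collapsed hit.
            groups[key] = {"hit": hit, "inner_hits": []}
            out.append(groups[key])
        elif len(groups[key]["inner_hits"]) < inner_size:
            groups[key]["inner_hits"].append(hit)
    return out, len(groups)  # collapsed hits + total_groups
```

The overall hit total still reflects all matching documents; only the returned list is collapsed, which is why the response carries a separate `total_groups`.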
Multi-field highlighting is configurable per field with custom tags and fragment sizes. Hits include a highlights map when configured, while the legacy highlight_field still emits a single snippet for backward compatibility.
{
"query": "rust systems",
"highlight": {
"fields": {
"body": {
"pre_tag": "<em>",
"post_tag": "</em>",
"fragment_size": 120,
"number_of_fragments": 2
},
"title": {
"pre_tag": "<b>",
"post_tag": "</b>",
"fragment_size": 60,
"number_of_fragments": 1
}
}
},
"return_stored": true
}

Highlighting uses the field's search analyzer (including synonyms and edge n-grams) to find matches, is phrase-aware, and centers fragments around the first match.
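Centering a fragment around the first match can be sketched as follows (single-term, substring matching only; searchlite's real highlighter is analyzer- and phrase-aware):

```python
def highlight(text, term, fragment_size=120, pre="<em>", post="</em>"):
    """Return one ~fragment_size-char fragment centered on the first
    case-insensitive match of term, with the match wrapped in tags."""
    low = text.lower()
    pos = low.find(term.lower())
    if pos < 0:
        return None  # no snippet when the term does not occur
    # Center the window on the middle of the match, clamped to the text.
    start = max(0, pos + len(term) // 2 - fragment_size // 2)
    end = min(len(text), start + fragment_size)
    frag = text[start:end]
    m = frag.lower().find(term.lower())
    return frag[:m] + pre + frag[m:m + len(term)] + post + frag[m + len(term):]
```

`fragment_size` and the `pre`/`post` tags correspond to the per-field options in the `highlight` request block above.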
The `query_string` node (and legacy string queries) supports `field:term`, phrases in quotes (`"field:exact phrase"`), and negation with a leading `-term`.
- You can also pass the full search payload as JSON (the same shape used by the HTTP service):
cat > /tmp/search_request.json <<'EOF'
{
"query": {
"type": "query_string",
"query": "rust language",
"fields": null
},
"filter": {
"And": [
{ "KeywordEq": { "field": "lang", "value": "en" } },
{ "I64Range": { "field": "year", "min": 2020, "max": 2025 } }
]
},
"limit": 5,
"sort": [
{ "field": "year", "order": "desc" }
],
"execution": "wand",
"bmw_block_size": null,
"return_stored": true,
"highlight_field": "body"
}
EOF
cargo run -p searchlite-cli -- search "$INDEX" --request /tmp/search_request.json

Use `--request-stdin` to read the payload from standard input. When a JSON request is supplied, individual CLI flags (like `--q`, `--filter`, etc.) are ignored.
Legacy support: query may still be a string; structured query nodes remain the preferred shape.
Beyond the basics (query/filter/sort/limit), the request supports:
- `aggs`: map of aggregation specs. New options include filter buckets (`{"type":"filter","filter":{...},"aggs":{...}}`), composite buckets (`{"type":"composite","sources":[{"type":"terms","name":"lang","field":"lang"},{"type":"histogram","name":"year","field":"year","interval":5}],"size":10,"after":{...},"aggs":{...}}`), and metric/pipeline aggs (`cardinality`, `percentiles`, `percentile_ranks`, `bucket_sort`, `avg_bucket`, `sum_bucket`). Pipeline `buckets_path` values walk the parent bucket tree with dot-separated paths like `"by_tag.score_stats.avg"`.
- `collapse`: `{ "field": "author", "inner_hits": { "size": 3, "from": 0, "sort": [{ "field": "_score", "order": "desc" }] } }` groups results by a fast keyword field and keeps the top hit per group; responses include `total_groups` plus optional `inner_hits` per group.
- `highlight`: `{ "fields": { "body": { "pre_tag": "<em>", "post_tag": "</em>", "fragment_size": 160, "number_of_fragments": 2 } } }` adds a `highlights` map to hits. The legacy `highlight_field` string still returns a single snippet when you only need a default highlight.
- A reference JSON Schema for search requests lives at `search-request.schema.json` in the repo root to help clients validate payloads.
The structured query AST now supports dictionary-driven expansion without changing analyzer behavior:
{ "query": { "type": "prefix", "field": "title", "value": "rus", "max_expansions": 50 } }
{ "query": { "type": "wildcard", "field": "title", "value": "r*st", "max_expansions": 100 } }
{ "query": { "type": "regex", "field": "title", "value": "r(ust|uby)", "max_expansions": 100 } }

Each node analyzes the input with the field's search analyzer, expands against the segment term dictionary (capped by `max_expansions` per segment), and ORs the resulting terms for scoring. `boost` is accepted on each node to influence scoring weight.
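The expand-then-OR step can be sketched as a regex scan over a segment's sorted term dictionary, capped by `max_expansions` (the on-disk dictionary layout and scan order are assumptions):

```python
import re

def expand_wildcard(pattern, term_dict, max_expansions=100):
    """Translate a wildcard pattern (* and ?) to a regex, scan the sorted
    term dictionary, and collect up to max_expansions matching terms."""
    rx = re.compile(
        "".join(
            ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
            for ch in pattern
        )
        + r"\Z"
    )
    out = []
    for term in sorted(term_dict):  # dictionary order makes the cap deterministic
        if rx.match(term):
            out.append(term)
            if len(out) >= max_expansions:
                break
    return out

expand_wildcard("r*st", {"rust", "rest", "roast", "ruby"})
```

The returned terms would then be scored as an OR of ordinary term queries, which is why a low `max_expansions` can silently drop rare matches.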
Search requests can include an optional suggest map for completion-style term suggestions:
{
"query": { "type": "match_all" },
"limit": 0,
"return_stored": false,
"suggest": {
"title_suggest": {
"type": "completion",
"field": "title",
"prefix": "ru",
"size": 5,
"fuzzy": {
"max_edits": 1,
"prefix_length": 1,
"max_expansions": 20,
"min_length": 2
}
}
}
}

The response embeds deterministic suggestions keyed by name:
"suggest": {
"title_suggest": {
"options": [
{ "text": "rust", "score": 42.0, "doc_freq": 3 },
{ "text": "ruby", "score": 21.0, "doc_freq": 1 }
]
}
}

Text fields can opt into automatic edge n-gram indexing with `search_as_you_type` while keeping the search analyzer unchanged:
{
"name": "title",
"analyzer": "default",
"stored": true,
"indexed": true,
"search_as_you_type": { "min_gram": 1, "max_gram": 10 }
}

Queries over that field (including `query_string` and `prefix`) will match partial prefixes such as "ru" against "rustacean". You can also roll your own by defining an analyzer with an `edge_ngram` filter and wiring it as the index analyzer while keeping a normal search analyzer.
- `multi_match` supports `best_fields` (max score with optional `tie_breaker`), `most_fields` (sum across fields), and `cross_fields` (treat fields as one blended field). `fields` accepts `FieldSpec` entries so you can boost fields without string parsing (e.g., `{"field":"title","boost":2.0}`), and `minimum_should_match` accepts counts or percentages.
- `dis_max` picks the best-scoring child query and blends the rest via `tie_breaker`.
- `phrase` now accepts `slop` (positions of wiggle room) for near-match phrases.
Example multi-field query with boosts:
{
"query": {
"type": "multi_match",
"query": "rust search",
"match_type": "best_fields",
"fields": [{ "field": "title", "boost": 2.0 }, { "field": "body" }],
"operator": "or",
"tie_breaker": 0.2,
"minimum_should_match": "75%"
},
"limit": 5
}

Blend multiple queries with `dis_max`:
{
"query": {
"type": "dis_max",
"tie_breaker": 0.4,
"queries": [
{ "type": "term", "field": "title", "value": "rust" },
{ "type": "term", "field": "body", "value": "rust" }
]
}
}

Phrase slop example (allows one gap between terms):
{
"query": {
"type": "phrase",
"field": "body",
"terms": ["rust", "search"],
"slop": 1
}
}

- `constant_score` wraps a filter-only query and returns a fixed score (default 1.0) when the filter matches.
- `function_score` lets you blend deterministic functions with the base BM25 score. Functions can be filtered, combined with `score_mode`, and merged with the base score via `boost_mode`.
- `decay` functions (exp/gauss/linear) are available inside `function_score` for distance-based scoring on numeric fast fields; configure `origin`, `scale`, optional `offset`, and `decay` (default 0.5).
{
"query": {
"type": "function_score",
"query": { "type": "match_all" },
"functions": [
{
"type": "weight",
"weight": 2.0,
"filter": { "KeywordEq": { "field": "lang", "value": "en" } }
},
{
"type": "decay",
"field": "age_days",
"origin": 0,
"scale": 30,
"offset": 0,
"decay": 0.5,
"function": "linear"
},
{
"type": "field_value_factor",
"field": "popularity",
"factor": 0.25,
"modifier": "log1p",
"missing": 0.0
}
],
"score_mode": "sum",
"boost_mode": "sum",
"max_boost": 5.0,
"min_score": 0.5
},
"limit": 5
}

Constant score example:
{
"query": {
"type": "constant_score",
"filter": { "KeywordEq": { "field": "lang", "value": "en" } },
"boost": 2.5
}
}

Rescore the top window after the initial rank (ordering outside the window is unchanged):
{
"query": { "type": "query_string", "query": "rust search" },
"limit": 10,
"rescore": {
"window_size": 50,
"query": {
"type": "phrase",
"field": "body",
"terms": ["rust", "search"],
"slop": 1
},
"score_mode": "total"
}
}

Debugging aids:
- `explain: true` returns per-hit score breakdowns, including function contributions and any rescore adjustments.
- `profile: true` attaches execution stats (`candidates_examined`, `scored_docs`, postings advances) and timing buckets (`search_ms`, `rescore_ms`).
- Both flags are off by default to avoid overhead.
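The `decay` functions used in `function_score` above follow the familiar Elasticsearch-style curves (score 1.0 within `offset` of `origin`, dropping to `decay` at distance `scale`); whether searchlite matches these exact shapes is an assumption:

```python
import math

def decay_score(value, origin, scale, offset=0.0, decay=0.5, function="gauss"):
    """Distance-based decay: 1.0 within offset of origin, `decay` at
    distance offset+scale, falling linearly/exponentially/gaussian beyond."""
    dist = max(0.0, abs(value - origin) - offset)
    if function == "linear":
        return max(0.0, 1.0 - dist * (1.0 - decay) / scale)
    if function == "exp":
        return math.exp(math.log(decay) / scale * dist)
    # gauss: choose sigma so the curve passes through `decay` at `scale`
    sigma2 = -(scale ** 2) / (2.0 * math.log(decay))
    return math.exp(-(dist ** 2) / (2.0 * sigma2))
```

With the `age_days` example above (`origin=0`, `scale=30`, `decay=0.5`), a 30-day-old document scores 0.5 under all three curves; they differ only in how fast they fall off before and after that point.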
Filters operate on fast fields (fast: true in the schema). Keyword filters are case-insensitive; numeric ranges are inclusive. Nested filters bind to the same nested object (parent/child lineage).
{
"filter": { "KeywordEq": { "field": "lang", "value": "en" } }
}

{
"filter": { "KeywordIn": { "field": "lang", "values": ["en", "fr"] } }
}

{
"filter": {
"And": [
{ "I64Range": { "field": "year", "min": 2018, "max": 2024 } },
{ "F64Range": { "field": "score", "min": 0.25, "max": 0.9 } }
]
}
}

If your document has `tags: ["rust", "search"]` and `tags` is a fast keyword field:
{
"filter": { "KeywordEq": { "field": "tags", "value": "rust" } }
}

Any value in the multi-valued column can satisfy the clause.
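The any-value semantics (plus the case-insensitivity of keyword filters noted below) can be sketched as:

```python
def keyword_eq(doc_values, value):
    """KeywordEq over a multi-valued fast column: the clause passes if
    any stored value matches, compared case-insensitively."""
    want = value.lower()
    return any(v.lower() == want for v in doc_values)

keyword_eq(["rust", "search"], "RUST")  # matches via the first value
```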
Schema excerpt:
{
"nested_fields": [
{
"name": "comment",
"fields": [
{
"type": "keyword",
"name": "author",
"fast": true,
"stored": true,
"indexed": true
},
{
"type": "keyword",
"name": "tag",
"fast": true,
"stored": true,
"indexed": true
}
]
}
]
}

Filter: match documents with any comment whose author is alice and tag is rust:
{
"filter": {
"And": [
{
"Nested": {
"path": "comment",
"filter": {
"KeywordEq": { "field": "author", "value": "alice" }
}
}
},
{
"Nested": {
"path": "comment",
"filter": {
"KeywordEq": { "field": "tag", "value": "rust" }
}
}
}
]
}
}

Because nested filters are scoped to the same object, this only passes when a single comment object has both author=alice and tag=rust.
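The same-object constraint is what distinguishes nested filters from flat dotted-path filters; a sketch of the per-object evaluation (the predicate shape is illustrative):

```python
def nested_matches(doc, path, predicates):
    """A same-object Nested filter passes only if a single object under
    `path` satisfies every predicate, not if the matches are spread
    across different objects."""
    return any(
        all(obj.get(field) == value for field, value in predicates.items())
        for obj in doc.get(path, [])
    )

# Matches are spread across two comments, so the filter must NOT pass:
split = {"comment": [
    {"author": "alice", "tag": "search"},
    {"author": "bob", "tag": "rust"},
]}
nested_matches(split, "comment", {"author": "alice", "tag": "rust"})
```

A flat filter on the dotted paths `comment.author` and `comment.tag` would accept the `split` document above, because each clause can be satisfied by a different comment.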
Schema excerpt:
{
"nested_fields": [
{
"name": "comment",
"fields": [
{
"type": "keyword",
"name": "author",
"fast": true,
"stored": true,
"indexed": true
},
{
"type": "object",
"name": "reply",
"fields": [
{
"type": "keyword",
"name": "tag",
"fast": true,
"stored": true,
"indexed": true
}
]
}
]
}
]
}

Filter: require a comment with author=bob that has a reply tagged y (parent-child binding is enforced):
{
"filter": {
"And": [
{
"Nested": {
"path": "comment",
"filter": {
"KeywordEq": { "field": "author", "value": "bob" }
}
}
},
{
"Nested": {
"path": "comment",
"filter": {
"Nested": {
"path": "reply",
"filter": {
"KeywordEq": { "field": "tag", "value": "y" }
}
}
}
}
}
]
}
}

The inner `Nested` is evaluated only against replies belonging to the same comment object that satisfies the outer `Nested`.
Filter on nested numeric properties alongside keywords:
{
"filter": {
"And": [
{
"Nested": {
"path": "review",
"filter": { "KeywordEq": { "field": "user", "value": "alice" } }
}
},
{
"Nested": {
"path": "review",
"filter": { "I64Range": { "field": "rating", "min": 5, "max": 8 } }
}
},
{
"Nested": {
"path": "review",
"filter": { "F64Range": { "field": "score", "min": 0.7, "max": 0.8 } }
}
}
]
}
}

All three clauses must match the same review object.
Combine a top-level numeric range with nested filters:
{
"filter": {
"And": [
{ "I64Range": { "field": "year", "min": 2020, "max": 2025 } },
{
"Nested": {
"path": "comment",
"filter": {
"KeywordEq": { "field": "author", "value": "alice" }
}
}
}
]
}
}

- Mark filterable fields with `"fast": true` in the schema.
- For nested filters, wrap child clauses in `Nested` blocks; use additional nested blocks for deeper levels.
- Stored nested fields preserve structure; unstored fields are omitted in responses.
- Inspect or compact:

cargo run -p searchlite-cli -- inspect "$INDEX"
cargo run -p searchlite-cli -- compact "$INDEX"

use searchlite_core::api::{
builder::IndexBuilder, Index, Filter,
types::{
Aggregation, Document, ExecutionStrategy, IndexOptions, KeywordField, NumericField, Schema,
QueryNode, SearchRequest, SortOrder, SortSpec, StorageType,
},
};
use std::{collections::BTreeMap, path::PathBuf};
let path = PathBuf::from("./example_idx");
let mut schema = Schema::default_text_body();
schema.keyword_fields.push(KeywordField { name: "lang".into(), stored: true, indexed: true, fast: true });
schema.numeric_fields.push(NumericField { name: "year".into(), i64: true, fast: true, stored: true });
let opts = IndexOptions {
path: path.clone(),
create_if_missing: true,
enable_positions: true,
bm25_k1: 0.9,
bm25_b: 0.4,
storage: StorageType::Filesystem,
#[cfg(feature = "vectors")]
vector_defaults: None,
};
// Create or open the index.
let idx = IndexBuilder::create(&path, schema, opts.clone())?;
// Insert one document.
let mut writer = idx.writer()?;
let doc = Document { fields: [
("_id".to_string(), serde_json::json!("doc-1")),
("body".to_string(), serde_json::json!("Rust is fast and reliable")),
("lang".to_string(), serde_json::json!("en")),
("year".to_string(), serde_json::json!(2024)),
].into_iter().collect() };
writer.add_document(&doc)?;
// Insert multiple documents in one batch.
let more_docs = vec![
Document { fields: [("_id".to_string(), serde_json::json!("doc-2")), ("body".to_string(), serde_json::json!("SQLite vibes for search")), ("lang".to_string(), serde_json::json!("en")), ("year".to_string(), serde_json::json!(2023))].into_iter().collect() },
Document { fields: [("_id".to_string(), serde_json::json!("doc-3")), ("body".to_string(), serde_json::json!("Embedded search engine demo")), ("lang".to_string(), serde_json::json!("en")), ("year".to_string(), serde_json::json!(2022))].into_iter().collect() },
];
for d in more_docs.iter() {
writer.add_document(d)?;
}
writer.commit()?; // Flush WAL into a segment
// Search the index.
let reader = idx.reader()?;
let results = reader.search(&SearchRequest {
query: QueryNode::QueryString {
query: "rust engine".into(),
fields: None,
boost: None,
}
.into(),
fields: None,
filter: Some(Filter::I64Range { field: "year".into(), min: 2020, max: 2025 }),
limit: 5,
sort: vec![SortSpec { field: "year".into(), order: Some(SortOrder::Desc) }],
cursor: None,
execution: ExecutionStrategy::Wand,
bmw_block_size: None,
fuzzy: None,
return_stored: true,
highlight_field: Some("body".into()),
aggs: [(
"langs".to_string(),
Aggregation::Terms(Box::new(searchlite_core::api::types::TermsAggregation {
field: "lang".into(),
size: Some(3),
shard_size: None,
min_doc_count: None,
missing: None,
aggs: BTreeMap::new(),
})),
)]
.into_iter()
.collect(),
#[cfg(feature = "vectors")]
vector_query: None,
#[cfg(feature = "vectors")]
vector_filter: None,
})?;
for hit in results.hits {
println!("doc {} score {:.3} fields {:?}", hit.doc_id, hit.score, hit.fields);
}

Search responses include a `next_cursor` when additional hits remain.
- JSON/SDK: send that value in the `cursor` field to fetch the next page without computing offsets.
- CLI: `cargo run -p searchlite-cli -- search "$INDEX" --q "rust" --limit 5 --cursor "$NEXT_CURSOR"`.
- FFI: pass the cursor string to the `cursor` argument.
- Cursors are opaque and bounded (up to ~50k returned hits) to avoid unbounded memory use; very deep pagination returns an error instead of over-consuming resources.
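The cursor contract can be sketched with a small self-contained toy pager (plain Rust, not the searchlite API): the caller treats the cursor as an opaque token and echoes it back, and paging past a hard bound errors out instead of buffering more hits. The offset-encoding cursor here is purely illustrative.

```rust
// Toy sketch of the opaque-cursor contract (not the searchlite API).
const MAX_PAGED_HITS: usize = 50_000;

struct Page {
    hits: Vec<u32>,
    next_cursor: Option<String>,
}

fn search_page(all_hits: &[u32], limit: usize, cursor: Option<&str>) -> Result<Page, String> {
    // Decode the opaque cursor; this toy version simply encodes an offset.
    let offset: usize = match cursor {
        Some(c) => c.parse().map_err(|_| "invalid cursor".to_string())?,
        None => 0,
    };
    if offset >= MAX_PAGED_HITS {
        return Err("cursor exceeds pagination bound".to_string());
    }
    let end = (offset + limit).min(all_hits.len());
    let next_cursor = if end < all_hits.len() { Some(end.to_string()) } else { None };
    Ok(Page { hits: all_hits[offset..end].to_vec(), next_cursor })
}

fn main() -> Result<(), String> {
    let hits: Vec<u32> = (0..12).collect();
    let mut cursor: Option<String> = None;
    // Keep fetching pages until no next_cursor is returned.
    loop {
        let page = search_page(&hits, 5, cursor.as_deref())?;
        println!("page: {:?}", page.hits);
        match page.next_cursor {
            Some(c) => cursor = Some(c),
            None => break,
        }
    }
    Ok(())
}
```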
`Index::open(opts)` opens an existing index; `Index::compact()` rewrites all segments into one. WAL-backed writers queue documents until `commit` is called; `rollback` drops uncommitted changes.
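The queue-until-commit contract can be modeled with a toy, self-contained sketch (not the real WAL-backed writer): added documents stay invisible to readers until commit, and rollback drops everything queued since the last commit.

```rust
// Toy model of the queue-until-commit contract (not the real WAL writer).
#[derive(Default)]
struct ToyIndex {
    committed: Vec<String>, // what readers see
    pending: Vec<String>,   // queued by the single writer
}

impl ToyIndex {
    fn add_document(&mut self, doc: &str) {
        self.pending.push(doc.to_string());
    }
    fn commit(&mut self) {
        // In searchlite this is where the WAL is flushed into a segment.
        self.committed.append(&mut self.pending);
    }
    fn rollback(&mut self) {
        self.pending.clear();
    }
    fn searchable(&self) -> &[String] {
        &self.committed
    }
}

fn main() {
    let mut idx = ToyIndex::default();
    idx.add_document("doc-1");
    assert!(idx.searchable().is_empty()); // not visible before commit
    idx.commit();
    assert_eq!(idx.searchable().len(), 1);
    idx.add_document("doc-2");
    idx.rollback(); // drops the uncommitted doc-2
    assert_eq!(idx.searchable().len(), 1);
}
```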
- `execution`: choose `"bm25"` (full evaluation), `"wand"` (exact WAND pruning), or `"bmw"` (block-max WAND). Default is `wand`.
- `bmw_block_size`: optional block size when using BMW pruning.
The CLI exposes `--execution` and `--bmw-block-size` on `search`. A small synthetic benchmark that compares the strategies lives in `searchlite-core/examples/pruning.rs` (`cargo run -p searchlite-core --example pruning`).
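The intuition behind block-max pruning can be shown with a toy, self-contained sketch (not searchlite's WAND/BMW implementation): each fixed-size block of a term's postings records its maximum score, and a whole block is skipped when that maximum cannot beat the current threshold.

```rust
// Toy block-max pruning sketch (not searchlite's WAND/BMW implementation).
fn block_maxima(scores: &[f32], block_size: usize) -> Vec<f32> {
    scores
        .chunks(block_size)
        .map(|b| b.iter().cloned().fold(f32::MIN, f32::max))
        .collect()
}

/// Returns (number of docs actually scored, best score found).
fn scan_with_pruning(scores: &[f32], block_size: usize, threshold: f32) -> (usize, f32) {
    let maxima = block_maxima(scores, block_size);
    let mut scored = 0;
    let mut best = threshold;
    for (i, block) in scores.chunks(block_size).enumerate() {
        if maxima[i] <= best {
            continue; // whole block is provably below the current threshold
        }
        for &s in block {
            scored += 1;
            if s > best {
                best = s; // threshold rises as better candidates are found
            }
        }
    }
    (scored, best)
}

fn main() {
    let scores = [0.2, 0.1, 0.3, 0.9, 0.8, 0.7, 0.1, 0.2, 0.1];
    // Only the middle block (max 0.9) can beat the 0.5 threshold,
    // so just 3 of 9 docs are scored.
    let (scored, best) = scan_with_pruning(&scores, 3, 0.5);
    println!("scored {scored} docs, best {best}");
}
```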
- Define vector fields in the schema (requires the `vectors` feature):

```json
{
  "vector_fields": [{ "name": "embedding", "dim": 2, "metric": "Cosine" }]
}
```

- Vector-only: set the query node to a vector clause. Missing vectors are skipped.
```json
{
  "query": {
    "type": "vector",
    "field": "embedding",
    "vector": [1.0, 0.0],
    "k": 3,
    "alpha": 0.0
  }
}
```

- Hybrid: combine a text query with a legacy `vector_query` tuple to blend BM25 and vector similarity. `alpha=1.0` uses BM25 only; `alpha=0.0` uses vector similarity only.
```json
{
  "query": {
    "type": "query_string",
    "query": "rust",
    "boost": null,
    "fields": null
  },
  "limit": 5,
  "vector_query": ["embedding", [1.0, 0.0], 0.25]
}
```

- `vector_query` also accepts an object form to tune ANN parameters: `{ "field": "embedding", "vector": [1.0, 0.0], "alpha": 0.25, "k": 20, "candidate_size": 40, "ef_search": 40 }`. The tuple form still works for compatibility.
- Per-field HNSW settings can be set on `vector_fields` via `hnsw: { "m": 16, "ef_construction": 64 }`; defaults are used when omitted.
- Tunables (vector queries):
  - `k`: number of neighbors to retrieve (defaults to `limit`).
  - `candidate_size`: optional oversampling during ANN search.
  - `ef_search`: ANN beam width (defaults to a sensible value if omitted).
  - `vector_filter`: optional filter applied during vector candidate selection.
- Responses include `vector_score` when vector search runs; `_score` is the blended score.
- Cosine vectors are normalized automatically; vectors with the wrong dimension are rejected.
- ANN is approximate (HNSW); raise `candidate_size`/`ef_search` for higher recall.
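The normalization and blending rules can be written out in a small self-contained sketch (not the library's internals). One assumption is labeled explicitly: `alpha` is read as a linear interpolation between BM25 and vector similarity, which is consistent with `alpha=1.0` meaning BM25 only and `alpha=0.0` meaning vector only, but the exact blending formula is not specified here.

```rust
// Self-contained sketch of the vector-scoring rules (not searchlite internals).
// Assumption: `alpha` linearly interpolates between BM25 and vector similarity.

/// L2-normalize a vector, rejecting wrong dimensions (as the index does).
fn normalize(v: &[f32], dim: usize) -> Result<Vec<f32>, String> {
    if v.len() != dim {
        return Err(format!("expected dim {dim}, got {}", v.len()));
    }
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return Err("zero vector".to_string());
    }
    Ok(v.iter().map(|x| x / norm).collect())
}

/// Dot product of two L2-normalized vectors equals cosine similarity.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn blended_score(alpha: f32, bm25: f32, vector_score: f32) -> f32 {
    alpha * bm25 + (1.0 - alpha) * vector_score
}

fn main() -> Result<(), String> {
    let stored = normalize(&[3.0, 4.0], 2)?; // becomes [0.6, 0.8]
    let query = normalize(&[1.0, 0.0], 2)?;
    let vector_score = cosine(&query, &stored); // 0.6
    println!("blended: {}", blended_score(0.25, 2.0, vector_score));
    // Wrong-dimension vectors are rejected rather than silently padded.
    assert!(normalize(&[1.0, 0.0, 0.0], 2).is_err());
    Ok(())
}
```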
For ephemeral or test-heavy scenarios, set `storage: StorageType::InMemory` in `IndexOptions`. The API and search behavior stay the same, but no files are created on disk. (The CLI currently uses filesystem storage only.)
- Install `wasm-pack` (e.g., `brew install wasm-pack` or `cargo install wasm-pack`) before building.
- Threaded wasm needs atomics/bulk-memory and build-std; `searchlite-wasm/rust-toolchain.toml` pins a nightly with `rust-src` and the wasm target, and `searchlite-wasm/.cargo/config.toml` sets the required rustflags/build-std. Rustup will fetch the nightly toolchain automatically when you build this crate.
- Default build for browsers and module workers (ESM): `wasm-pack build searchlite-wasm --target web --release`.
- Classic workers / `importScripts` need a separate build: `wasm-pack build searchlite-wasm --target no-modules --release` (or `--target bundler` if you want a bundler to wrap it).
- Threaded build (requires COOP/COEP + SharedArrayBuffer): `wasm-pack build searchlite-wasm --target web --release -- --features threads`.
- Browser window + module workers: use the `--target web` build; this is a single build that works in both environments.
- Classic web worker / service worker (no modules): use `--target no-modules` (or `--target bundler`) because `importScripts` cannot load ES modules.
- Threads: build with `--features threads` and serve with COOP/COEP headers; this is a separate build and is not available in service workers.
- For the default (non-threaded) demo build, serve the `searchlite-wasm` crate directory over HTTP (any static file server without special headers is fine, e.g., `cd searchlite-wasm && npx http-server -c-1 --cors -p 8080`).
- For the threaded build (`--features threads`), serve with COOP/COEP headers so `SharedArrayBuffer` works (e.g., `cd searchlite-wasm && npx http-server -c-1 --cors -p 8080 -H "Cross-Origin-Opener-Policy: same-origin" -H "Cross-Origin-Embedder-Policy: require-corp"`).
- Open `http://localhost:8080/index.html`. The bundled page imports `pkg/searchlite_wasm.js`, initializes the module, and provides a lightweight schema/upload/search demo in the browser.
- Instantiate from JS with `await Searchlite.init("demo-db", JSON.stringify(schema), "indexeddb")` (default) or `"memory"` for ephemeral indexes. `init` reopens existing indexes with the same name and validates schemas; mismatches return an error.
- Prefer `add_documents([...])` for bulk ingest and call `commit()` to flush everything to the manifest.
- Ingest APIs now queue documents; call `commit` (`searchlite_commit` in FFI, `commit()` in WASM) to make them searchable.
- Searches default to `return_stored: false`; set it explicitly (or pass `true` as the third argument to the WASM `search` helper) when you need stored fields.
- Use the full request helpers for advanced queries: `searchlite_search_request` (FFI) and `search_request_value`/`search_request` (WASM).
- See `docs/bindings.md` for a quick reference of binding behaviors.
Build the FFI crate to generate a shared library and header for C or other language bindings.
```sh
# Release build (macOS dylib, Linux .so, Windows .dll + Rust rlib)
cargo build -p searchlite-ffi --release --features ffi

# Enable optional capabilities on the library:
# cargo build -p searchlite-ffi --release --features "ffi,vectors,zstd"
```

Artifacts land in `target/release` (e.g., `libsearchlite_ffi.dylib` or `libsearchlite_ffi.so`) and the C header is at `searchlite-ffi/searchlite.h`.
- `vectors`: store/query vector fields; search requests can blend BM25 with vector similarity.
- `gpu`: stub GPU reranker hooks.
- `zstd`: compress stored fields.
- `ffi`: build the C FFI surface (`searchlite-ffi` crate, also exposed on the CLI).
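When using the library from another crate, these flags compose in `Cargo.toml` as Cargo features. A sketch (the version number is illustrative, not a pinned requirement):

```toml
[dependencies]
# Feature names match the flags above; version shown is illustrative.
searchlite-core = { version = "0.4", features = ["vectors", "zstd"] }
```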