dimamik · dimamik · Feb 5, 2026 · Feb 4, 2026 · Feb 4, 2026 · Feb 4, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -1,3 +1,32 @@
+# v0.6.0
+
+## New 🔥
+
+**BM25 Full-Text Search** is now available via the new `Torus.bm25/5` macro!
+
+[BM25](https://en.wikipedia.org/wiki/Okapi_BM25) is a modern ranking algorithm that generally provides superior relevance scoring compared to traditional TF-IDF (used by `full_text/5`). This integration uses the [pg_textsearch](https://github.com/timescale/pg_textsearch) extension by Timescale.
+
+Key features:
+
+- State-of-the-art BM25 ranking with configurable index parameters (k1, b)
+- Blazingly fast top-k queries via Block-Max WAND optimization (`Torus.bm25/5` + `limit`)
+- Simple syntax: `Post |> Torus.bm25([p], p.body, "search term") |> limit(10)`
+- Score selection with `:score_key` and post-filtering with `:score_threshold`
+- Language/stemming configured at index creation via `text_config`
+
+Requirements:
+
+- PostgreSQL 17+
+- pg_textsearch extension installed
+- BM25 index on the search column (with `text_config` for language)
+
+See the [BM25 Search Guide](https://dimamik.com/posts/bm25_search) for detailed setup instructions and examples.
+
+**When to use BM25 vs full_text:**
+
+- Use `bm25/5` for fast single-column search with modern relevance ranking
+- Use `full_text/5` for multi-column search with weights or when using stored tsvector columns
+
 # v0.5.3
 
 ## Fixes
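
Note on the `k1` and `b` index parameters mentioned in this entry: the sketch below shows how they enter the textbook Okapi BM25 term score. It is a minimal illustration in plain Elixir, not pg_textsearch's implementation; the module name `BM25Sketch`, its function names, and the choice of IDF variant are assumptions made here for illustration. The scores it produces are positive, whereas the `<@>` operator used by `Torus.bm25/5` reports negative scores where lower is better.

```elixir
defmodule BM25Sketch do
  @moduledoc """
  Textbook Okapi BM25 term score, for illustration only.
  Hypothetical module: this is not how pg_textsearch is implemented.
  """

  # Smoothed IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)); never negative.
  #   doc_count      - number of documents in the corpus (N)
  #   docs_with_term - number of documents containing the term (n)
  def idf(doc_count, docs_with_term) do
    :math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))
  end

  # Score contribution of one term in one document.
  #   tf          - occurrences of the term in the document
  #   doc_len     - document length in tokens
  #   avg_doc_len - average document length in the corpus
  #   idf_value   - inverse document frequency of the term
  #   :k1         - term frequency saturation (assumed default 1.2)
  #   :b          - length normalization, 0.0 disables it (assumed default 0.75)
  def term_score(tf, doc_len, avg_doc_len, idf_value, opts \\ []) do
    k1 = Keyword.get(opts, :k1, 1.2)
    b = Keyword.get(opts, :b, 0.75)

    # Documents longer than average get norm > 1, which lowers the score.
    norm = 1 - b + b * doc_len / avg_doc_len

    idf_value * (tf * (k1 + 1)) / (tf + k1 * norm)
  end
end
```

With these definitions, `BM25Sketch.term_score(2, 50, 100, 1.0)` is larger than `BM25Sketch.term_score(2, 200, 100, 1.0)` because the longer document is penalized; passing `b: 0.0` to both calls makes the two scores equal.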

diff --git a/README.md b/README.md
@@ -38,7 +38,7 @@ Post
 
 See [`full_text/5`](https://hexdocs.pm/torus/Torus.html#full_text/5) for more details.
 
-## 6 types of search:
+## 7 types of search:
 
 1. **Pattern matching**: Searches for a specific pattern in a string.
 
@@ -58,12 +58,13 @@ See [`full_text/5`](https://hexdocs.pm/torus/Torus.html#full_text/5) for more de
 1. **Similarity:** Searches for records that closely match the input text using trigram distance.
 
    ```elixir
-   iex> insert_posts!(["Hogwarts Secrets", "Quidditch Fever", "Hogwart’s Secret"])
-   ...> Post
-   ...> |> Torus.similarity([p], [p.title], "hoggwarrds")
-   ...> |> limit(2)
-   ...> |> select([p], p.title)
-   ...> |> Repo.all()
+   insert_posts!(["Hogwarts Secrets", "Quidditch Fever", "Hogwart’s Secret"])
+
+   Post
+   |> Torus.similarity([p], [p.title], "hoggwarrds")
+   |> limit(2)
+   |> select([p], p.title)
+   |> Repo.all()
    ["Hogwarts Secrets", "Hogwart’s Secret"]
    ```
 
@@ -74,20 +75,39 @@ See [`full_text/5`](https://hexdocs.pm/torus/Torus.html#full_text/5) for more de
 1. **Full text**: Uses term-document matrix vectors for, enabling efficient querying and ranking based on term frequency. Supports prefix search and is great for large datasets to quickly return relevant results. See [PostgreSQL Full Text Search](https://www.postgresql.org/docs/current/textsearch.html) for internal implementation details.
 
    ```elixir
-   iex> insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
-   ...> insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
-   ...> insert_post!(title: "Completely unrelated", body: "No magic here!")
-   ...> Post
-   ...> |> Torus.full_text([p], [p.title, p.body], "uncov hogwar")
-   ...> |> select([p], p.title)
-   ...> |> Repo.all()
+   insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
+   insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
+   insert_post!(title: "Completely unrelated", body: "No magic here!")
+
+   Post
+   |> Torus.full_text([p], [p.title, p.body], "uncov hogwar")
+   |> select([p], p.title)
+   |> Repo.all()
    ["Diagon Bombshell"]
    ```
 
-   Use it when you don’t care about spelling, the documents are long, or if you need to order the results by rank.
+   Use it when you don't care about spelling, the documents are long, you need multi-column search with weights, or you need to order the results by rank.
 
    See [`full_text/5`](https://hexdocs.pm/torus/Torus.html#full_text/5) for more details.
 
+1. **BM25 full text**: Ranks matches with the BM25 algorithm via the [pg_textsearch](https://github.com/timescale/pg_textsearch) extension. BM25 generally ranks results better than the traditional built-in TF-IDF full-text search and is optimized for top-k queries.
+
+   ```elixir
+   insert_post!(title: "Hogwarts Shocker", body: "A spell disrupts the Quidditch Cup.")
+   insert_post!(title: "Diagon Bombshell", body: "Secrets uncovered in the heart of Hogwarts.")
+   insert_post!(title: "Completely unrelated", body: "No magic here!")
+
+   Post
+   |> Torus.bm25([p], p.body, "secrets hogwarts")
+   |> select([p], p.title)
+   |> Repo.all()
+   ["Diagon Bombshell"]
+   ```
+
+   Use it when you need state-of-the-art relevance ranking for single-column search, especially with LIMIT clauses. Requires PostgreSQL 17+.
+
+   See [`bm25/5`](https://hexdocs.pm/torus/Torus.html#bm25/5) and the [BM25 Search Guide](https://dimamik.com/posts/bm25_search) for detailed setup instructions and examples.
+
 1. **Semantic Search**: Understands the contextual meaning of queries to match and retrieve related content utilizing natural language processing. Read more about semantic search in [Semantic search with Torus guide](/guides/semantic_search.md).
 
    ```elixir
@@ -131,7 +151,7 @@ Torus offers a few helpers to debug, explain, and analyze your queries before us
 
 ## Torus support
 
-For now, Torus supports pattern match, similarity, full-text, and semantic search, with plans to expand support further. These docs will be updated with more examples on which search type to choose and how to make them more performant (by adding indexes or using specific functions).
+For now, Torus supports pattern match, similarity, full-text (TF-IDF and BM25), and semantic search, with plans to expand support further. These docs will be updated with more examples on which search type to choose and how to make them more performant (by adding indexes or using specific functions).
 
 <!-- MDOC -->
 

diff --git a/lib/torus.ex b/lib/torus.ex
@@ -366,6 +366,157 @@ defmodule Torus do
     Torus.Search.FullText.to_tsquery(column, query_text, opts)
   end
 
+  @doc group: "Full text"
+  @doc """
+  BM25 ranked full-text search using the [pg_textsearch](https://github.com/timescale/pg_textsearch) extension.
+
+  BM25 is a modern ranking function that generally provides better relevance than traditional
+  TF-IDF (used by `full_text/5`). It's particularly effective for top-k queries with LIMIT clauses
+  due to Block-Max WAND optimization.
+
+  For detailed usage examples, performance tips, and a migration guide, see the [BM25 Search guide](https://dimamik.com/posts/bm25_search).
+
+  > #### Requirements {: .warning}
+  >
+  > - Requires the `pg_textsearch` extension to be installed
+  > - PostgreSQL 17+ only
+  > - Requires a BM25 index on the search column
+  > - **Single column only** - unlike `full_text/5`, BM25 indexes work on one column at a time
+  > - **Language is set at index creation** - use `text_config` in the index `WITH` clause
+  >
+  > ```elixir
+  > defmodule YourApp.Repo.Migrations.CreatePgTextsearchExtension do
+  >   use Ecto.Migration
+  >
+  >   def change do
+  >     execute "CREATE EXTENSION IF NOT EXISTS pg_textsearch", "DROP EXTENSION IF EXISTS pg_textsearch"
+  >
+  >     # Create BM25 index with language configuration
+  >     execute \"\"\"
+  >     CREATE INDEX posts_body_bm25_idx ON posts
+  >     USING bm25(body) WITH (text_config='english')
+  >     \"\"\", "DROP INDEX posts_body_bm25_idx"
+  >   end
+  > end
+  > ```
+
+  ## Options
+
+    * `:order` - Ordering of results. Note that the `<@>` operator from pg_textsearch returns **negative BM25 scores** (lower is better):
+      - `:asc` (default) - orders by score ascending (best matches first)
+      - `:desc` - orders by score descending (worst matches first)
+      - `:none` - no ordering applied
+    * `:index_name` - Explicit index name. Required when using `score_threshold`.
+    * `:score_key` - Atom key to select the BM25 score into the result map.
+      - `:none` (default) - score is not selected
+      - `atom` - selects score as this key (use with `select_merge/3`)
+    * `:score_threshold` - Post-filter results by BM25 score (applied after ORDER BY).
+      Since scores are negative and lower is better, use negative thresholds (e.g., `-3.0`
+      keeps only results with score < -3.0, i.e., scores like -4.0, -5.0 which are better matches).
+      May return fewer results than LIMIT.
+    * `:pre_filter` - Whether to exclude non-matching rows.
+      - `false` (default) - no pre-filtering
+      - `true` - adds a `WHERE score < 0` clause to exclude non-matches
+
+  ## Examples
+
+  Basic search - returns top 10 most relevant posts:
+
+      Post
+      |> Torus.bm25([p], p.body, "database search")
+      |> limit(10)
+      |> select([p], p.body)
+      |> Repo.all()
+
+  With score selection (call `select/3` before `bm25/5`, because the score is added via `select_merge/3`):
+
+      Post
+      |> select([p], %{body: p.body})
+      |> Torus.bm25([p], p.body, "database", score_key: :relevance)
+      |> limit(5)
+      |> Repo.all()
+      # => [%{body: "...", relevance: -2.5}, ...]
+
+  With WHERE clause pre-filtering:
+
+      Post
+      |> where([p], p.category_id == 123)
+      |> Torus.bm25([p], p.body, "database")
+      |> limit(10)
+      |> Repo.all()
+
+  With score threshold (post-filtering, may return fewer than LIMIT, `index_name` is required):
+
+      Post
+      |> Torus.bm25([p], p.body, "database", score_threshold: -5.0, index_name: "posts_body_bm25_idx")
+      |> limit(10)
+      |> Repo.all()
+
+  ## When to use `bm25/5` vs `full_text/5`
+
+  **Use `bm25/5` when:**
+  - You need better relevance ranking than TF-IDF
+  - You need faster search with large datasets
+  - You have large result sets with LIMIT (top-k queries)
+  - Single column search is sufficient
+  - You're on PostgreSQL 17+
+
+  **Use `full_text/5` when:**
+  - You need multi-column search with different weights per column
+  - You want to use stored tsvector columns
+  - You're on PostgreSQL < 17
+  - You need the `concat` filter type
+
+  ## Multi-column search workaround
+
+  Since BM25 indexes work on single columns, you can create a generated column:
+
+  ```sql
+  ALTER TABLE posts
+  ADD COLUMN searchable_text TEXT
+  GENERATED ALWAYS AS (coalesce(title, '') || ' ' || coalesce(body, '')) STORED;
+
+  CREATE INDEX posts_searchable_bm25_idx
+  ON posts USING bm25(searchable_text)
+  WITH (text_config='english');
+  ```
+
+  Then search the generated column:
+
+  ```elixir
+  Post
+  |> Torus.bm25([p], p.searchable_text, "search term")
+  |> limit(10)
+  |> Repo.all()
+  ```
+
+  ## Index options
+
+  BM25 indexes support these parameters in the `WITH` clause:
+
+  - `text_config` - PostgreSQL text search configuration (required). This determines
+    the language/stemming rules. Available configs: `'english'`, `'french'`, `'german'`,
+    `'simple'` (no stemming), etc. Run `SELECT cfgname FROM pg_ts_config;` to list all.
+  - `k1` - Term frequency saturation (default: 1.2, range: 0.1-10.0)
+  - `b` - Length normalization (default: 0.75, range: 0.0-1.0)
+
+  ```sql
+  CREATE INDEX custom_idx ON documents
+  USING bm25(content)
+  WITH (text_config='english', k1=1.5, b=0.8);
+  ```
+
+  ## Performance tips
+
+  - BM25 is most efficient with `ORDER BY + LIMIT` (enables Block-Max WAND optimization)
+  - For filtered searches, create a separate B-tree index on the filter column
+  - Pre-filtering works best when the filter is selective (<10% of rows)
+  - Post-filtering with `score_threshold` may return fewer results than LIMIT
+  """
+  defmacro bm25(query, bindings, qualifier, term, opts \\ []) do
+    Torus.Search.BM25.bm25(query, bindings, qualifier, term, opts)
+  end
+
   @doc group: "Pattern matching"
   @doc """
   The substring function with three parameters provides extraction of a substring
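
Note on `:score_threshold`: because scores are negative and lower is better, a threshold of `-3.0` keeps only rows scoring below `-3.0`, so a query can return fewer rows than its `limit`. The sketch below mimics that behavior on an in-memory list of `{title, score}` tuples. `ScoreThresholdSketch` and `top_k/3` are hypothetical names used only here; the real filtering, ordering, and limiting happen in SQL, not in Elixir.

```elixir
defmodule ScoreThresholdSketch do
  # Hypothetical helper: keeps rows whose score is strictly below the
  # threshold, orders them best-first (ascending, since lower is better),
  # and then applies the limit.
  def top_k(rows, threshold, limit) do
    rows
    |> Enum.filter(fn {_title, score} -> score < threshold end)
    |> Enum.sort_by(fn {_title, score} -> score end, :asc)
    |> Enum.take(limit)
  end
end

rows = [{"a", -5.2}, {"b", -4.1}, {"c", -2.5}, {"d", -0.7}]

# Only two rows pass the -3.0 threshold, even though the limit is 3.
ScoreThresholdSketch.top_k(rows, -3.0, 3)
# => [{"a", -5.2}, {"b", -4.1}]
```

A stricter (more negative) threshold shrinks the result further; with `-6.0` the same input yields an empty list.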

diff --git a/lib/torus/search/bm25.ex b/lib/torus/search/bm25.ex
@@ -0,0 +1,122 @@
+defmodule Torus.Search.BM25 do
+  @moduledoc false
+  import Torus.Search.Common
+  import Ecto.Query, warn: false
+
+  @order_types ~w[asc desc none]a
+
+  def bm25(query, bindings, qualifier, term, opts \\ []) do
+    order = get_arg!(opts, :order, :asc, @order_types)
+    index_name = Keyword.get(opts, :index_name, nil)
+    pre_filter = get_arg!(opts, :pre_filter, false, [true, false])
+    score_key = Keyword.get(opts, :score_key, :none)
+    score_threshold = Keyword.get(opts, :score_threshold, nil)
+
+    raise_if(
+      score_key != :none and not is_atom(score_key),
+      "The `score_key` option must be an atom or :none."
+    )
+
+    raise_if(
+      score_threshold != nil and index_name == nil,
+      "The `index_name` option is required when using `score_threshold`."
+    )
+
+    # Build the BM25 query fragments
+    # When index_name is provided, use to_bm25query(?, ?) for explicit index specification
+    # Otherwise pass the term as a plain text parameter (?) to let PostgreSQL auto-detect the index
+    {bm25query_fragment, bm25query_params} =
+      if index_name do
+        {
+          "to_bm25query(?, ?)",
+          [term, index_name]
+        }
+      else
+        {
+          "?",
+          [term]
+        }
+      end
+
+    # Score fragment for ordering and selection
+    score_fragment_string = "? <@> #{bm25query_fragment}"
+
+    # Build score fragment AST
+    score_fragment =
+      quote do
+        fragment(
+          unquote(score_fragment_string),
+          unquote(qualifier),
+          unquote_splicing(
+            Enum.map(bm25query_params, fn param ->
+              quote do: ^unquote(param)
+            end)
+          )
+        )
+      end
+
+    # Build order fragment if needed
+    order_fragment =
+      if order != :none do
+        asc_desc = if order == :desc, do: :desc, else: :asc
+
+        quote do
+          [{unquote(asc_desc), unquote(score_fragment)}]
+        end
+      end
+
+    # BM25 scores are negative (lower = better), so "better than threshold" means score < threshold
+    # (e.g., -5.0 is better than -2.0, so threshold -3.0 keeps scores < -3.0 like -4.0, -5.0)
+    threshold_fragment_string = "? <@> #{bm25query_fragment} < ?"
+
+    # Pre-filtering by match (excludes non-matches)
+    # Non-matches have score = 0, matches have score < 0
+    pre_filter_fragment_string = "? <@> #{bm25query_fragment} < 0"
+
+    # Build the query
+    quote do
+      unquote(query)
+      |> apply_if(unquote(pre_filter), fn q ->
+        where(
+          q,
+          [unquote_splicing(bindings)],
+          fragment(
+            unquote(pre_filter_fragment_string),
+            unquote(qualifier),
+            unquote_splicing(
+              Enum.map(bm25query_params, fn param ->
+                quote do: ^unquote(param)
+              end)
+            )
+          )
+        )
+      end)
+      |> apply_if(unquote(score_threshold) != nil, fn q ->
+        where(
+          q,
+          [unquote_splicing(bindings)],
+          fragment(
+            unquote(threshold_fragment_string),
+            unquote(qualifier),
+            unquote_splicing(
+              Enum.map(bm25query_params, fn param ->
+                quote do: ^unquote(param)
+              end)
+            ),
+            ^unquote(score_threshold)
+          )
+        )
+      end)
+      |> apply_if(unquote(order) != :none, fn q ->
+        order_by(q, [unquote_splicing(bindings)], unquote(order_fragment))
+      end)
+      |> apply_if(unquote(score_key) != :none, fn q ->
+        select_merge(
+          q,
+          [unquote_splicing(bindings)],
+          %{unquote(score_key) => unquote(score_fragment)}
+        )
+      end)
+    end
+  end
+end
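
The two fragment shapes the macro chooses between can be seen in isolation with a plain function. This is a simplified stand-in, not the module's code: `FragmentSketch` and its functions are hypothetical, and they return strings and parameter lists instead of quoted Ecto expressions.

```elixir
defmodule FragmentSketch do
  # Returns {sql_fragment, params}. The first "?" stands for the searched
  # column; the remaining placeholders are filled from params, in order.
  def score_fragment(term, index_name \\ nil) do
    {query_sql, params} =
      if index_name do
        # Explicit index: wrap the term in to_bm25query(term, index_name).
        {"to_bm25query(?, ?)", [term, index_name]}
      else
        # No index name: pass the term on its own.
        {"?", [term]}
      end

    {"? <@> " <> query_sql, params}
  end

  # Appends the "score < threshold" comparison used for score_threshold.
  def threshold_fragment(term, index_name, threshold) do
    {sql, params} = score_fragment(term, index_name)
    {sql <> " < ?", params ++ [threshold]}
  end
end

FragmentSketch.score_fragment("database")
# => {"? <@> ?", ["database"]}

FragmentSketch.threshold_fragment("database", "posts_body_bm25_idx", -3.0)
# => {"? <@> to_bm25query(?, ?) < ?", ["database", "posts_body_bm25_idx", -3.0]}
```

Keeping the fragment string and its parameters together as one value makes it harder for the placeholder count and the parameter list to drift apart when a new branch is added.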