Improve query plan for unique indexes by dwwoelfel · Pull Request #2369 · instantdb/instant

dwwoelfel · 2026-03-13T23:57:17Z

Two things here:

First, we were incorrectly counting some columns that we couldn't join on in the join cost. For example, take a query like {age: 10}, where age is typed: the generated SQL filters on triples_extract_number_value(value) = 10. If we were using the ave_with_e_index, we couldn't count on filtering the v in the join condition, because the v in ave_with_e_index is a JSON value instead of a number.
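A minimal sketch of the check described above, not the code from this PR. The `-sketch` function names, the pattern shape (`:data-type`), and the `:raw-json` mode keyword are assumptions made for illustration; only the index name and the "typed-extract" idea come from the discussion.

```clojure
;; Hypothetical stand-ins for the planner helpers; real signatures may differ.
(defn value-access-mode-sketch
  "Returns :typed-extract when the pattern compares through a typed extractor
   (e.g. triples_extract_number_value(value)), otherwise :raw-json."
  [{:keys [data-type]}]
  (if data-type :typed-extract :raw-json))

(defn can-match-on-index-column-sketch?
  "True when `col` of `index-name` can be counted as a lookup/join column.
   Assumption: ave_with_e_index stores :v as JSON, so a typed comparison
   cannot use that column and it must not lower the join cost."
  [pattern index-name col]
  (not (and (= col :v)
            (= index-name :ave_with_e_index)
            (= :typed-extract (value-access-mode-sketch pattern)))))

;; {age: 10} with a typed `age` cannot match on :v of ave_with_e_index:
(can-match-on-index-column-sketch? {:data-type :number} :ave_with_e_index :v)
;; an untyped pattern still can:
(can-match-on-index-column-sketch? {} :ave_with_e_index :v)
```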

Second, we should be more aggressive about using unique indexes. When we model the cost, we calculate the index lookup cost and the filtering cost. If the index lookup uses the unique columns, then we know the filter step will see at most index-lookup-cost rows.
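To make the cap concrete, here is a small worked sketch. It mirrors the reduction used in `debug-cost-breakdown`, but the function names, the path shape, and all numbers are made up for illustration and are not measurements from Instant.

```clojure
(defn lookup-cost-sketch
  "Multiplies per-column costs along the index path and stops at the first
   unique column, since nothing after it can add rows."
  [unique-cols path]
  (reduce (fn [acc {:keys [cost col]}]
            (let [next-cost (* acc cost)]
              (if (contains? unique-cols col)
                (reduced next-cost)
                next-cost)))
          1
          path))

(defn filter-cost-sketch
  "Caps the filter cost at the lookup cost when the path hits a unique column."
  [unique-cols path unbounded-filter-cost]
  (if (some #(contains? unique-cols (:col %)) path)
    (min (lookup-cost-sketch unique-cols path) unbounded-filter-cost)
    unbounded-filter-cost))

;; Hypothetical estimates: :a and :v are selective, :e is not.
(def example-path [{:col :a :cost 1} {:col :v :cost 1} {:col :e :cost 500}])

(lookup-cost-sketch #{:v} example-path)      ;; stops at :v
(filter-cost-sketch #{:v} example-path 2000) ;; capped by the lookup cost
(filter-cost-sketch #{} example-path 2000)   ;; no unique column, no cap
```

With these numbers the unique lookup costs 1 and caps the filter at 1, whereas without a unique column the lookup costs 500 and the filter stays at 2000.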

Just in case this causes a regression, the new code is all behind a new-index-cost toggle.

Also adds a bunch more debug info when *debug* is set to true.

coderabbitai · 2026-03-13T23:57:45Z

📝 Walkthrough

Adds a feature-flagged index cost model and related helpers to query planning: SQL operator mapping, value-access-mode and index applicability checks, a new multiplicative join-cost path, detailed debug cost breakdowns, plus a test verifying index selection with date-based sketches.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Index Cost Model Enhancements `server/src/instant/db/datalog.clj` | Introduced feature-flag-gated index cost model and new public APIs: `op->sql-op`, `use-new-index-cost?`, `value-access-mode`, `can-match-on-index-column?`, `debug-realistic-prefix`, `debug-cost-breakdown`. Refactored path-cost logic to route between old/new implementations and replaced additive join-cost with a multiplicative model; added debug annotations and cost metrics. |
| Test Coverage `server/test/instant/db/datalog_test.clj` | Added `best-index-ignores-index-columns-it-cant-use` test using `instant.db.attr-sketch` and `java.time.Instant` to validate best-index selection when index columns are inapplicable. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Improve query plan for unique indexes' directly relates to the main objective of the changeset: optimizing query planning for unique indexes and fixing join-cost counting issues. |
| Description check | ✅ Passed | The description clearly explains both key changes: fixing join-cost counting for columns that cannot be joined on, and improving cost modeling for unique indexes. It also mentions the feature-flag safety mechanism. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

server/test/instant/db/datalog_test.clj (1)
49-69: LGTM - Good test coverage for new index selection logic.

The test correctly validates that when an index column can't be used (typed-extract mode with mismatched data types), the planner falls back to triples_pkey. The setup with a date sketch and $gt comparator properly exercises the value-access-mode and can-match-on-index-column? code paths.

Consider adding a brief comment explaining why :triples_pkey is the expected result (e.g., "ave_with_e_index stores value as JSON, not as a typed date, so it can't be used for typed comparisons").
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/test/instant/db/datalog_test.clj` around lines 49 - 69, Add a brief
comment in the test best-index-ignores-index-columns-it-cant-use explaining why
:triples_pkey is expected (e.g., that the :ave index stores values as JSON/not
as typed dates so typed comparisons via
value-access-mode/can-match-on-index-column? fail), referencing the planner path
(d/best-index -> can-match-on-index-column? / value-access-mode) and the named-p
index (:idx [:map {:idx-key :ave}]) so future readers understand the fallback to
:triples_pkey.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/src/instant/db/datalog.clj`:
- Around line 1329-1339: The code currently calls op->sql-op on a possibly nil
op before dispatching on func, which will throw if op is nil; move the
op->sql-op call inside the :$comparator branch (where sql-op is actually used)
so that you only call (op->sql-op op) when func is :$comparator; keep the
existing logic that uses (extract-value-fn data-type sql-op) in that branch and
leave other branches (:$not, :$isNull, :$entityIdStartsWith, :raw) unchanged so
they do not evaluate op->sql-op when op is nil.
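The suggested change can be sketched as follows. The original code at lines 1329-1339 is not shown on this page, so the function names, the returned maps, and the operator table are all hypothetical; the only point is that the operator lookup happens inside the `:$comparator` branch.

```clojure
(defn op->sql-op-sketch
  "Hypothetical operator table. `case` without a default throws
   IllegalArgumentException for unknown input, including nil."
  [op]
  (case op
    :$gt ">"
    :$gte ">="
    :$lt "<"
    :$lte "<="))

(defn value-condition-sketch
  "Dispatches on func and resolves the SQL operator only where it is used,
   so patterns with a nil :op never reach op->sql-op-sketch."
  [{:keys [func op data-type]}]
  (case func
    :$comparator {:kind :comparator
                  :sql-op (op->sql-op-sketch op)
                  :data-type data-type}
    :$isNull {:kind :is-null}
    :$not {:kind :not}
    {:kind :raw}))

(value-condition-sketch {:func :$comparator :op :$gt :data-type :date})
(value-condition-sketch {:func :$isNull}) ;; :op is nil, no exception
```

Had the lookup been bound in an outer `let`, the second call would throw before the dispatch ever ran.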


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cd1d3a6b-a98c-4fec-a4cf-fa1035b5b4e6

📥 Commits

Reviewing files that changed from the base of the PR and between e9cadbe and 918a95e.

📒 Files selected for processing (2)

server/src/instant/db/datalog.clj
server/test/instant/db/datalog_test.clj

server/src/instant/db/datalog.clj

dwwoelfel · 2026-03-14T00:21:27Z

I'm going to mark this as a draft--I might have a better way.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

server/src/instant/db/datalog.clj (1)
1257-1276: ⚠️ Potential issue | 🟠 Major

Keep the fallback join-cost formula additive.

Line 1270 now multiplies :join-remaining, so turning :new-index-cost off no longer restores the previous join-cost behavior. That also disagrees with Lines 1396-1397, which still report the additive variant as join-cost-old.
🐛 Proposed fix
-        join-cost (reduce * 1 (vals (:join-remaining costs)))
+        join-cost (reduce + 1 (vals (:join-remaining costs)))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/src/instant/db/datalog.clj` around lines 1257 - 1276, The fallback
join-cost computation in path-cost-with-joins-old currently multiplies values
from :join-remaining (reduce *), which makes toggling :new-index-cost not revert
to the old additive behavior and conflicts with the reported join-cost-old;
change the join-cost calculation in path-cost-with-joins-old to sum the values
in :join-remaining (use reduce + with initial 1, as in the proposed fix and in
join-cost-old) so it matches the additive fallback and the join-cost-old
reporting elsewhere.
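For a sense of how far the two formulas diverge, here is a small sketch using the same reductions as `join-cost-old` and `join-cost-new` in `debug-cost-breakdown`. The function names and the `:join-remaining` numbers are made up for illustration.

```clojure
(defn join-cost-additive
  "Old model: 1 plus the sum of the remaining join estimates."
  [join-remaining]
  (reduce + 1 (vals join-remaining)))

(defn join-cost-multiplicative
  "New model: product of the remaining join estimates."
  [join-remaining]
  (reduce * 1 (vals join-remaining)))

(defn join-cost-sketch
  "Picks the formula based on the feature flag, so turning the flag off
   restores the additive behavior."
  [new-index-cost? join-remaining]
  (if new-index-cost?
    (join-cost-multiplicative join-remaining)
    (join-cost-additive join-remaining)))

;; Hypothetical estimates for two columns that cannot be joined on.
(def example-join-remaining {:e 10 :v 20})

(join-cost-sketch false example-join-remaining) ;; 1 + 10 + 20
(join-cost-sketch true example-join-remaining)  ;; 10 * 20
```

Both formulas return 1 when `:join-remaining` is empty, so they only disagree once at least one join column is left over.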

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/src/instant/db/datalog.clj`:
- Around line 1385-1418: In debug-cost-breakdown, make the reported filter cost
align with the active indexing-mode: either branch the computed :filter-cost on
use-new-index-cost? (so when the flag is false, compute :filter-cost the same
way as when :score uses path-cost-with-joins-old) or emit both :filter-cost-old
and :filter-cost-new fields; update the calculation that currently uses
unique-hit? to produce the alternate value for the old code path (referencing
unique-hit?, index-lookup-cost, unbounded-filter-cost, path-cost-with-joins and
path-cost-with-joins-old) and ensure :score and the chosen filter-cost reflect
the same use-new-index-cost? branch.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 52646f36-e205-46c1-88ae-f0669037240d

📥 Commits

Reviewing files that changed from the base of the PR and between 918a95e and b8adba8.

📒 Files selected for processing (1)

server/src/instant/db/datalog.clj

coderabbitai · 2026-03-14T00:24:28Z

server/src/instant/db/datalog.clj

+(defn- debug-cost-breakdown
+  [named-p idx-config costs path-cost-with-joins]
+  (let [unique-cols (or (:unique-cols idx-config) #{})
+        unique-hit? (some #(contains? unique-cols (:col %)) (:path costs))
+        index-lookup-cost (reduce (fn [acc {:keys [cost col]}]
+                                    (let [next-cost (* acc cost)]
+                                      (if (contains? unique-cols col)
+                                        (reduced next-cost)
+                                        next-cost)))
+                                  1
+                                  (:path costs))
+        join-cost-old (reduce + 1 (vals (:join-remaining costs)))
+        join-cost-new (reduce * 1 (vals (:join-remaining costs)))
+        unbounded-filter-cost (reduce + 0 (vals (select-keys (:known-remaining costs)
+                                                             (:filter-components costs))))
+        filter-cost (if unique-hit?
+                      (min index-lookup-cost unbounded-filter-cost)
+                      unbounded-filter-cost)
+        realistic-prefix (debug-realistic-prefix named-p idx-config
+                                                 (:known-components costs))]
+    {:lookup-cost index-lookup-cost
+     :filter-cost filter-cost
+     :unbounded-filter-cost unbounded-filter-cost
+     :unique-hit? unique-hit?
+     :join-cost-old join-cost-old
+     :join-cost-new join-cost-new
+     :score path-cost-with-joins
+     :value-access-mode (value-access-mode named-p)
+     :optimistic-path (mapv :col (:path costs))
+     :realistic-prefix (:path realistic-prefix)
+     :blocked-on (:blocked-on realistic-prefix)
+     :known-remaining (:known-remaining costs)
+     :join-remaining (:join-remaining costs)
+     :filter-remaining (:filter-remaining costs)}))


⚠️ Potential issue | 🟡 Minor

Make the debug filter cost reflect the active model.

When :new-index-cost is false, :score comes from path-cost-with-joins-old, but filter-cost here is still capped with unique-hit?. The breakdown can therefore disagree with the score in the flag-off path you're trying to compare. Either branch this field on use-new-index-cost? or emit separate :filter-cost-old / :filter-cost-new values.
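One way the second option could look, as a sketch only. It assumes the old model used the uncapped filter cost, which is implied by the comment above but not shown on this page; the function name and the input map shape are hypothetical.

```clojure
(defn filter-cost-fields-sketch
  "Reports both filter-cost variants and picks the one that matches the
   active model, so the breakdown agrees with :score."
  [{:keys [new-index-cost? unique-hit? index-lookup-cost unbounded-filter-cost]}]
  (let [;; Assumption: the old model never capped the filter cost.
        filter-cost-old unbounded-filter-cost
        filter-cost-new (if unique-hit?
                          (min index-lookup-cost unbounded-filter-cost)
                          unbounded-filter-cost)]
    {:filter-cost-old filter-cost-old
     :filter-cost-new filter-cost-new
     :filter-cost (if new-index-cost? filter-cost-new filter-cost-old)}))

;; Flag off: the reported :filter-cost follows the old model.
(filter-cost-fields-sketch {:new-index-cost? false
                            :unique-hit? true
                            :index-lookup-cost 1
                            :unbounded-filter-cost 2000})
```

Emitting both fields keeps the comparison between models visible in the debug output, while `:filter-cost` stays tied to whichever path produced the score.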


improve query plan for unique fields

918a95e

coderabbitai bot reviewed Mar 14, 2026

review fix

b8adba8

dwwoelfel marked this pull request as draft March 14, 2026 00:21

coderabbitai bot reviewed Mar 14, 2026

Improve query plan for unique indexes #2369
dwwoelfel wants to merge 2 commits into main from better-query-plan
