Improve query plan for unique indexes #2369

Draft
dwwoelfel wants to merge 2 commits into main from better-query-plan

@dwwoelfel
Contributor

Two things here:

First, we were incorrectly counting some columns in the join cost that we couldn't actually join on. For example, suppose we had a query like {age: 10}, where age is typed. Our query will filter on triples_extract_number_value(value) = 10. If we were using the ave_with_e_index, then we couldn't count on filtering the v in the join condition, because the v in the ave_with_e_index is a json value instead of a number.
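
A minimal sketch of that rule, not the code in this PR. The function name `usable-index-cols`, the `:json` mode keyword, and the `[:a :v :e]` column order are assumptions made up for illustration; only the idea (a typed-extract query can't match on a JSON `v` column) comes from the description above.

```clojure
(defn usable-index-cols
  "Returns the index columns that may be counted in the join cost.
   Hypothetical rule: when the query filters through a typed extract
   function, an index whose v column holds raw JSON cannot match on :v."
  [index-cols access-mode]
  (if (= access-mode :typed-extract)
    (vec (remove #{:v} index-cols))
    (vec index-cols)))

;; Typed query such as {age: 10}: the JSON v column is not counted.
(usable-index-cols [:a :v :e] :typed-extract) ;; => [:a :e]

;; Untyped comparison against the stored JSON value: v still counts.
(usable-index-cols [:a :v :e] :json)          ;; => [:a :v :e]
```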

Second, we should be more aggressive about using unique indexes. When we model the cost, we calculate the index lookup cost and the filtering cost. If the index lookup uses a unique column, then the filter step sees at most index-lookup-cost rows, so the filter cost can be capped at the lookup cost.
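
A small worked example of that cap, assuming each step of the index path carries a `:col` and a `:cost` multiplier. The function names and the sample numbers are hypothetical; the shape follows the `debug-cost-breakdown` snippet quoted later in this review.

```clojure
(defn lookup-cost
  "Multiplies step costs along the index path, stopping at the first
   unique column, since a unique match bounds the rows returned."
  [unique-cols path]
  (reduce (fn [acc {:keys [cost col]}]
            (let [next-cost (* acc cost)]
              (if (contains? unique-cols col)
                (reduced next-cost)
                next-cost)))
          1
          path))

(defn capped-filter-cost
  "Caps the filter cost at the lookup cost when the path hits a unique column."
  [unique-cols path unbounded-filter-cost]
  (let [unique-hit? (boolean (some #(contains? unique-cols (:col %)) path))]
    (if unique-hit?
      (min (lookup-cost unique-cols path) unbounded-filter-cost)
      unbounded-filter-cost)))

;; Sample path with made-up costs.
(def example-path [{:col :a :cost 1} {:col :v :cost 1}])

(capped-filter-cost #{:v} example-path 5000) ;; => 1
(capped-filter-cost #{} example-path 5000)   ;; => 5000
```

With a unique column in the path, the 5000-row filter estimate collapses to the single row the lookup can return; without one, the estimate is left alone.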

Just in case this causes a regression, the new code is all behind a new-index-cost toggle.

Also adds a bunch more debug info when *debug* is set to true.

@coderabbitai
Contributor

coderabbitai bot commented Mar 13, 2026

📝 Walkthrough

Adds a feature-flagged index cost model and related helpers to query planning: SQL operator mapping, value-access-mode and index applicability checks, a new multiplicative join-cost path, detailed debug cost breakdowns, plus a test verifying index selection with date-based sketches.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Index Cost Model Enhancements: server/src/instant/db/datalog.clj | Introduced feature-flag-gated index cost model and new public APIs: op->sql-op, use-new-index-cost?, value-access-mode, can-match-on-index-column?, debug-realistic-prefix, debug-cost-breakdown. Refactored path-cost logic to route between old/new implementations and replaced additive join-cost with a multiplicative model; added debug annotations and cost metrics. |
| Test Coverage: server/test/instant/db/datalog_test.clj | Added best-index-ignores-index-columns-it-cant-use test using instant.db.attr-sketch and java.time.Instant to validate best-index selection when index columns are inapplicable. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'Improve query plan for unique indexes' directly relates to the main objective of the changeset: optimizing query planning for unique indexes and fixing join-cost counting issues. |
| Description check | ✅ Passed | The description clearly explains both key changes: fixing join-cost counting for columns that cannot be joined on, and improving cost modeling for unique indexes. It also mentions the feature-flag safety mechanism. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
server/test/instant/db/datalog_test.clj (1)

49-69: LGTM - Good test coverage for new index selection logic.

The test correctly validates that when an index column can't be used (typed-extract mode with mismatched data types), the planner falls back to triples_pkey. The setup with a date sketch and $gt comparator properly exercises the value-access-mode and can-match-on-index-column? code paths.

Consider adding a brief comment explaining why :triples_pkey is the expected result (e.g., "ave_with_e_index stores value as JSON, not as a typed date, so it can't be used for typed comparisons").

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/test/instant/db/datalog_test.clj` around lines 49 - 69, Add a brief
comment in the test best-index-ignores-index-columns-it-cant-use explaining why
:triples_pkey is expected (e.g., that the :ave index stores values as JSON/not
as typed dates so typed comparisons via
value-access-mode/can-match-on-index-column? fail), referencing the planner path
(d/best-index -> can-match-on-index-column? / value-access-mode) and the named-p
index (:idx [:map {:idx-key :ave}]) so future readers understand the fallback to
:triples_pkey.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/src/instant/db/datalog.clj`:
- Around line 1329-1339: The code currently calls op->sql-op on a possibly nil
op before dispatching on func, which will throw if op is nil; move the
op->sql-op call inside the :$comparator branch (where sql-op is actually used)
so that you only call (op->sql-op op) when func is :$comparator; keep the
existing logic that uses (extract-value-fn data-type sql-op) in that branch and
leave other branches (:$not, :$isNull, :$entityIdStartsWith, :raw) unchanged so
they do not evaluate op->sql-op when op is nil.
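
The suggested restructuring can be sketched as follows. This is an illustration only: the body of `op->sql-op`, the operator set, and the `value-condition` name and return shapes are assumptions, and the sketch takes the review's word that `op->sql-op` throws when given nil.

```clojure
(defn op->sql-op
  "Hypothetical mapping from comparator keyword to SQL operator.
   `case` without a default throws on unknown ops, including nil."
  [op]
  (case op
    :$gt ">"
    :$gte ">="
    :$lt "<"
    :$lte "<="))

(defn value-condition
  "Dispatches on func and only resolves the SQL operator in the
   :$comparator branch, where op is guaranteed to be meaningful."
  [func op]
  (case func
    :$comparator [:cmp (op->sql-op op)]
    :$isNull     [:is-null]
    [:raw]))

(value-condition :$isNull nil)      ;; => [:is-null]
(value-condition :$comparator :$gt) ;; => [:cmp ">"]
```

Because the lookup happens inside the branch, a nil op never reaches `op->sql-op` for `:$isNull` or the other non-comparator cases.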


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cd1d3a6b-a98c-4fec-a4cf-fa1035b5b4e6

📥 Commits

Reviewing files that changed from the base of the PR and between e9cadbe and 918a95e.

📒 Files selected for processing (2)
  • server/src/instant/db/datalog.clj
  • server/test/instant/db/datalog_test.clj

@dwwoelfel dwwoelfel marked this pull request as draft March 14, 2026 00:21
@dwwoelfel
Contributor Author

I'm going to mark this as a draft--I might have a better way.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/src/instant/db/datalog.clj (1)

1257-1276: ⚠️ Potential issue | 🟠 Major

Keep the fallback join-cost formula additive.

Line 1270 now multiplies :join-remaining, so turning :new-index-cost off no longer restores the previous join-cost behavior. That also disagrees with Lines 1396-1397, which still report the additive variant as join-cost-old.

🐛 Proposed fix
-        join-cost (reduce * 1 (vals (:join-remaining costs)))
+        join-cost (reduce + 1 (vals (:join-remaining costs)))
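
To see how far the two formulas diverge, here is a toy comparison. The `:join-remaining` map and its keys and values are made up for illustration; only the two `reduce` expressions come from the PR.

```clojure
;; Hypothetical remaining join costs for two columns.
(def join-remaining {:e 10 :a 20})

;; Old, additive model: 1 + 10 + 20
(def join-cost-old (reduce + 1 (vals join-remaining))) ;; => 31

;; New, multiplicative model: 1 * 10 * 20
(def join-cost-new (reduce * 1 (vals join-remaining))) ;; => 200
```

The gap grows quickly with more columns, which is why the flag-off path needs the additive form to reproduce the previous plans.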
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/src/instant/db/datalog.clj` around lines 1257 - 1276, The fallback
join-cost computation in path-cost-with-joins-old currently multiplies values
from :join-remaining (reduce *), which makes toggling :new-index-cost not revert
to the old additive behavior and conflicts with the reported join-cost-old;
change the join-cost calculation in path-cost-with-joins-old to sum the values
in :join-remaining (use reduce + with initial 1, as in the proposed fix) so it
matches the additive fallback and the join-cost-old reporting elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/src/instant/db/datalog.clj`:
- Around line 1385-1418: In debug-cost-breakdown, make the reported filter cost
align with the active indexing-mode: either branch the computed :filter-cost on
use-new-index-cost? (so when the flag is false, compute :filter-cost the same
way as when :score uses path-cost-with-joins-old) or emit both :filter-cost-old
and :filter-cost-new fields; update the calculation that currently uses
unique-hit? to produce the alternate value for the old code path (referencing
unique-hit?, index-lookup-cost, unbounded-filter-cost, path-cost-with-joins and
path-cost-with-joins-old) and ensure :score and the chosen filter-cost reflect
the same use-new-index-cost? branch.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 52646f36-e205-46c1-88ae-f0669037240d

📥 Commits

Reviewing files that changed from the base of the PR and between 918a95e and b8adba8.

📒 Files selected for processing (1)
  • server/src/instant/db/datalog.clj

Comment on lines +1385 to +1418
(defn- debug-cost-breakdown
  [named-p idx-config costs path-cost-with-joins]
  (let [unique-cols (or (:unique-cols idx-config) #{})
        unique-hit? (some #(contains? unique-cols (:col %)) (:path costs))
        index-lookup-cost (reduce (fn [acc {:keys [cost col]}]
                                    (let [next-cost (* acc cost)]
                                      (if (contains? unique-cols col)
                                        (reduced next-cost)
                                        next-cost)))
                                  1
                                  (:path costs))
        join-cost-old (reduce + 1 (vals (:join-remaining costs)))
        join-cost-new (reduce * 1 (vals (:join-remaining costs)))
        unbounded-filter-cost (reduce + 0 (vals (select-keys (:known-remaining costs)
                                                             (:filter-components costs))))
        filter-cost (if unique-hit?
                      (min index-lookup-cost unbounded-filter-cost)
                      unbounded-filter-cost)
        realistic-prefix (debug-realistic-prefix named-p idx-config
                                                 (:known-components costs))]
    {:lookup-cost index-lookup-cost
     :filter-cost filter-cost
     :unbounded-filter-cost unbounded-filter-cost
     :unique-hit? unique-hit?
     :join-cost-old join-cost-old
     :join-cost-new join-cost-new
     :score path-cost-with-joins
     :value-access-mode (value-access-mode named-p)
     :optimistic-path (mapv :col (:path costs))
     :realistic-prefix (:path realistic-prefix)
     :blocked-on (:blocked-on realistic-prefix)
     :known-remaining (:known-remaining costs)
     :join-remaining (:join-remaining costs)
     :filter-remaining (:filter-remaining costs)}))
Contributor

⚠️ Potential issue | 🟡 Minor

Make the debug filter cost reflect the active model.

When :new-index-cost is false, :score comes from path-cost-with-joins-old, but filter-cost here is still capped with unique-hit?. The breakdown can therefore disagree with the score in the flag-off path you're trying to compare. Either branch this field on use-new-index-cost? or emit separate :filter-cost-old / :filter-cost-new values.
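
One way the second option could look, as a sketch rather than a patch. The `filter-costs` helper is hypothetical, and it assumes the old model used the uncapped filter cost, which this thread implies but does not show.

```clojure
(defn filter-costs
  "Reports the filter cost under both models so the debug output can be
   compared against :score regardless of the :new-index-cost flag.
   Assumption: the old model never capped the filter cost."
  [{:keys [unique-hit? index-lookup-cost unbounded-filter-cost]}]
  {:filter-cost-old unbounded-filter-cost
   :filter-cost-new (if unique-hit?
                      (min index-lookup-cost unbounded-filter-cost)
                      unbounded-filter-cost)})

(filter-costs {:unique-hit? true
               :index-lookup-cost 1
               :unbounded-filter-cost 5000})
;; => {:filter-cost-old 5000, :filter-cost-new 1}
```

The resulting map could be merged into the breakdown in place of the single :filter-cost key.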

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@server/src/instant/db/datalog.clj` around lines 1385 - 1418, In
debug-cost-breakdown, make the reported filter cost align with the active
indexing-mode: either branch the computed :filter-cost on use-new-index-cost?
(so when the flag is false, compute :filter-cost the same way as when :score
uses path-cost-with-joins-old) or emit both :filter-cost-old and
:filter-cost-new fields; update the calculation that currently uses unique-hit?
to produce the alternate value for the old code path (referencing unique-hit?,
index-lookup-cost, unbounded-filter-cost, path-cost-with-joins and
path-cost-with-joins-old) and ensure :score and the chosen filter-cost reflect
the same use-new-index-cost? branch.
