Conversation
📝 WalkthroughWalkthroughAdds a feature-flagged index cost model and related helpers to query planning: SQL operator mapping, value-access-mode and index applicability checks, a new multiplicative join-cost path, detailed debug cost breakdowns, plus a test verifying index selection with date-based sketches. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
server/test/instant/db/datalog_test.clj (1)
49-69: LGTM - Good test coverage for new index selection logic.The test correctly validates that when an index column can't be used (typed-extract mode with mismatched data types), the planner falls back to
triples_pkey. The setup with a date sketch and$gtcomparator properly exercises thevalue-access-modeandcan-match-on-index-column?code paths.Consider adding a brief comment explaining why
:triples_pkeyis the expected result (e.g., "ave_with_e_index stores value as JSON, not as a typed date, so it can't be used for typed comparisons").🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/test/instant/db/datalog_test.clj` around lines 49 - 69, Add a brief comment in the test best-index-ignores-index-columns-it-cant-use explaining why :triples_pkey is expected (e.g., that the :ave index stores values as JSON/not as typed dates so typed comparisons via value-access-mode/can-match-on-index-column? fail), referencing the planner path (d/best-index -> can-match-on-index-column? / value-access-mode) and the named-p index (:idx [:map {:idx-key :ave}]) so future readers understand the fallback to :triples_pkey.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/src/instant/db/datalog.clj`:
- Around line 1329-1339: The code currently calls op->sql-op on a possibly nil
op before dispatching on func, which will throw if op is nil; move the
op->sql-op call inside the :$comparator branch (where sql-op is actually used)
so that you only call (op->sql-op op) when func is :$comparator; keep the
existing logic that uses (extract-value-fn data-type sql-op) in that branch and
leave other branches (:$not, :$isNull, :$entityIdStartsWith, :raw) unchanged so
they do not evaluate op->sql-op when op is nil.
---
Nitpick comments:
In `@server/test/instant/db/datalog_test.clj`:
- Around line 49-69: Add a brief comment in the test
best-index-ignores-index-columns-it-cant-use explaining why :triples_pkey is
expected (e.g., that the :ave index stores values as JSON/not as typed dates so
typed comparisons via value-access-mode/can-match-on-index-column? fail),
referencing the planner path (d/best-index -> can-match-on-index-column? /
value-access-mode) and the named-p index (:idx [:map {:idx-key :ave}]) so future
readers understand the fallback to :triples_pkey.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: cd1d3a6b-a98c-4fec-a4cf-fa1035b5b4e6
📒 Files selected for processing (2)
server/src/instant/db/datalog.cljserver/test/instant/db/datalog_test.clj
|
I'm going to mark this as a draft--I might have a better way. |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server/src/instant/db/datalog.clj (1)
1257-1276:⚠️ Potential issue | 🟠 MajorKeep the fallback join-cost formula additive.
Line 1270 now multiplies
:join-remaining, so turning:new-index-costoff no longer restores the previous join-cost behavior. That also disagrees with Lines 1396-1397, which still report the additive variant asjoin-cost-old.🐛 Proposed fix
- join-cost (reduce * 1 (vals (:join-remaining costs))) + join-cost (reduce + 1 (vals (:join-remaining costs)))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@server/src/instant/db/datalog.clj` around lines 1257 - 1276, The fallback join-cost computation in path-cost-with-joins-old currently multiplies values from :join-remaining (reduce *), which makes toggling :new-index-cost not revert to the old additive behavior and conflicts with the reported join-cost-old; change the join-cost calculation in path-cost-with-joins-old to sum the values in :join-remaining (use reduce + with initial 0) so it matches the additive fallback and the join-cost-old reporting elsewhere.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/src/instant/db/datalog.clj`:
- Around line 1385-1418: In debug-cost-breakdown, make the reported filter cost
align with the active indexing-mode: either branch the computed :filter-cost on
use-new-index-cost? (so when the flag is false, compute :filter-cost the same
way as when :score uses path-cost-with-joins-old) or emit both :filter-cost-old
and :filter-cost-new fields; update the calculation that currently uses
unique-hit? to produce the alternate value for the old code path (referencing
unique-hit?, index-lookup-cost, unbounded-filter-cost, path-cost-with-joins and
path-cost-with-joins-old) and ensure :score and the chosen filter-cost reflect
the same use-new-index-cost? branch.
---
Outside diff comments:
In `@server/src/instant/db/datalog.clj`:
- Around line 1257-1276: The fallback join-cost computation in
path-cost-with-joins-old currently multiplies values from :join-remaining
(reduce *), which makes toggling :new-index-cost not revert to the old additive
behavior and conflicts with the reported join-cost-old; change the join-cost
calculation in path-cost-with-joins-old to sum the values in :join-remaining
(use reduce + with initial 0) so it matches the additive fallback and the
join-cost-old reporting elsewhere.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 52646f36-e205-46c1-88ae-f0669037240d
📒 Files selected for processing (1)
server/src/instant/db/datalog.clj
| (defn- debug-cost-breakdown | ||
| [named-p idx-config costs path-cost-with-joins] | ||
| (let [unique-cols (or (:unique-cols idx-config) #{}) | ||
| unique-hit? (some #(contains? unique-cols (:col %)) (:path costs)) | ||
| index-lookup-cost (reduce (fn [acc {:keys [cost col]}] | ||
| (let [next-cost (* acc cost)] | ||
| (if (contains? unique-cols col) | ||
| (reduced next-cost) | ||
| next-cost))) | ||
| 1 | ||
| (:path costs)) | ||
| join-cost-old (reduce + 1 (vals (:join-remaining costs))) | ||
| join-cost-new (reduce * 1 (vals (:join-remaining costs))) | ||
| unbounded-filter-cost (reduce + 0 (vals (select-keys (:known-remaining costs) | ||
| (:filter-components costs)))) | ||
| filter-cost (if unique-hit? | ||
| (min index-lookup-cost unbounded-filter-cost) | ||
| unbounded-filter-cost) | ||
| realistic-prefix (debug-realistic-prefix named-p idx-config | ||
| (:known-components costs))] | ||
| {:lookup-cost index-lookup-cost | ||
| :filter-cost filter-cost | ||
| :unbounded-filter-cost unbounded-filter-cost | ||
| :unique-hit? unique-hit? | ||
| :join-cost-old join-cost-old | ||
| :join-cost-new join-cost-new | ||
| :score path-cost-with-joins | ||
| :value-access-mode (value-access-mode named-p) | ||
| :optimistic-path (mapv :col (:path costs)) | ||
| :realistic-prefix (:path realistic-prefix) | ||
| :blocked-on (:blocked-on realistic-prefix) | ||
| :known-remaining (:known-remaining costs) | ||
| :join-remaining (:join-remaining costs) | ||
| :filter-remaining (:filter-remaining costs)})) |
There was a problem hiding this comment.
Make the debug filter cost reflect the active model.
When :new-index-cost is false, :score comes from path-cost-with-joins-old, but filter-cost here is still capped with unique-hit?. The breakdown can therefore disagree with the score in the flag-off path you're trying to compare. Either branch this field on use-new-index-cost? or emit separate :filter-cost-old / :filter-cost-new values.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@server/src/instant/db/datalog.clj` around lines 1385 - 1418, In
debug-cost-breakdown, make the reported filter cost align with the active
indexing-mode: either branch the computed :filter-cost on use-new-index-cost?
(so when the flag is false, compute :filter-cost the same way as when :score
uses path-cost-with-joins-old) or emit both :filter-cost-old and
:filter-cost-new fields; update the calculation that currently uses unique-hit?
to produce the alternate value for the old code path (referencing unique-hit?,
index-lookup-cost, unbounded-filter-cost, path-cost-with-joins and
path-cost-with-joins-old) and ensure :score and the chosen filter-cost reflect
the same use-new-index-cost? branch.
Two things here:
First, we were incorrectly counting some columns that we couldn't join on in the join cost. For example, if we had a query like
{age: 10}, where age is typed. Our query will query ontriples_extract_number_value(value) = 10. If we were using theave_with_e_index, then we couldn't count on filtering thevin the join condition. Thevin theave_with_e_indexis a json value instead of a number.Second, we should be more aggressive on using unique indexes. When we model the cost, we calculate the index lookup cost and the filtering cost. If we use the unique columns in the index lookup, then we know that we'll be filtering at most
index-costrows in the filter.Just in case this causes a regression, the new code is all behind a
new-index-costtoggle.Also adds a bunch more debug info when
*debug*is set to true.