CNDB-16055: Use tmpFileFor in CompactionGraph to prevent CNDB from remote loading #2133

michaeljmarshall · 2025-11-19T20:00:18Z

What is the issue

Fixes: https://github.com/riptano/cndb/issues/16055

What does this PR fix and why was it fixed

#2030 introduced a bug for CNDB due to the way we inject the file reader. The solution is actually quite simple: I just needed to follow a convention that I wasn't familiar with. By calling `tmpFileFor`, I get the proper file extension (which ensures the file is cleaned up in case of a restart) and a local-only file.
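
The convention can be pictured with a minimal, self-contained sketch. It is not the Cassandra or CNDB code: the class `TmpFileSketch`, the `.tmp` suffix, the simplified `tmpFileFor(Path, String)` signature, and the `cleanUpOnRestart` helper are all assumptions made for illustration. The idea is only that a scratch file created through one shared helper gets a recognizable name on local disk, so a restart can find and delete leftovers.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: names, suffix, and signatures are
// assumptions, not the real SAI or CNDB API.
public class TmpFileSketch {
    // Assumed suffix that marks a file as disposable scratch space.
    static final String TMP_SUFFIX = ".tmp";

    // Creates an empty scratch file in a local directory, tagged with the shared suffix.
    static Path tmpFileFor(Path localDir, String componentName) throws IOException {
        return Files.createTempFile(localDir, componentName + "-", TMP_SUFFIX);
    }

    // Simulates a restart: deletes every leftover scratch file and returns the count.
    static int cleanUpOnRestart(Path localDir) throws IOException {
        List<Path> leftovers = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(localDir, "*" + TMP_SUFFIX)) {
            for (Path p : stream) {
                leftovers.add(p);
            }
        }
        for (Path p : leftovers) {
            Files.delete(p);
        }
        return leftovers.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("compaction-graph-sketch");
        Path scratch = tmpFileFor(dir, "postings");
        System.out.println("scratch file: " + scratch.getFileName());
        System.out.println("deleted on restart: " + cleanUpOnRestart(dir));
        Files.delete(dir);
    }
}
```

Routing every scratch file through a single helper keeps the naming rule in one place, which is the kind of convention the description refers to.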


github-actions · 2025-11-19T20:02:37Z

jkni

LGTM

sonarqubecloud · 2025-11-19T23:58:54Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
100.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

cassci-bot · 2025-11-20T00:22:04Z

❌ Build ds-cassandra-pr-gate/PR-2133 rejected by Butler

2 regressions found
See build details here

Found 2 new test failures

| Test | Explanation | Runs | Upstream |
|---|---|---|---|
| o.a.c.index.sai.cql.VectorCompaction100dTest.testOneToManyCompactionTooManyHoles[eb false] | NEW | 🔴 | 0 / 17 |
| o.a.c.index.sai.cql.VectorSiftSmallTest.testSiftSmall[db false] | NEW | 🔴 | 0 / 17 |

No known test failures found

…action (#2030) + CNDB-16055: Use tmpFileFor in CompactionGraph to prevent CNDB from remote loading (#2133)

Fixes: riptano/cndb#15469
CNDB PR: riptano/cndb#15813

This PR integrates the jvector NVQ feature into SAI vector indexes built via compaction. This feature is disabled by default (by `cassandra.sai.vector.enable_nvq`) to continue providing the best recall when storage savings is not an explicit concern.

The jvector library describes NVQ with:

> Support for Non-uniform Vector Quantization (NVQ, pronounced as "new vec"). This new technique quantizes the values in each vector with high accuracy by first applying a nonlinear transformation that is individually fit to each vector. These nonlinearities are designed to be lightweight and have a negligible impact on distance computation performance.

This feature is only available in SAI on disk version `EC` and later. It can be enabled by setting `cassandra.sai.vector.enable_nvq` to `true` and selecting `cassandra.sai.latest.version=ec` or greater.

When enabled, we can expect NVQ to reduce the storage footprint of the graph (stored in the `TERMS` file) because quantized vectors are stored inline instead of the full precision vectors. A possible result of storing these smaller vectors is fewer iops due to improved efficiency of a graph node fitting within a single 4 kb page.

We do not have any new metrics exposed to track this feature beyond disk utilization. When troubleshooting, this log line will help determine what features an on disk graph is using:

```java
logger.debug("Opened graph for {} for sstable row id offset {} with {} features", source, segmentMetadata.segmentRowIdOffset, features);
```

NVQ will be in the list if it is in use.

tl;dr: NVQ works for earlier versions of CC because the on disk format hasn't changed and jvector knows how to read it. If you enable NVQ on CC without this PR and with `ann_use_synthetic_score = true`, you might see out of order results.

One side effect of NVQ is that the NVQ vector similarity score is slightly different than the full precision score. This is primarily a problem when the synthetic score is in use (`cassandra.sai.ann_use_synthetic_score`) because the synthetic score was based on the score from the index. Now that this score does not necessarily equal the FP sim score, we must compute the FP sim score before sending the synthetic score to the coordinator. Otherwise, we will end up with out of order vectors. Because older versions of CC do not correct for this, it is possible to send the wrong score to the coordinator. However, because this feature is disabled by default, there is not really a risk of sending the wrong score.

CNDB-16055: Use tmpFileFor in CompactionGraph to prevent CNDB from remote loading (#2133)

Fixes: riptano/cndb#16055

#2030 introduced a bug for CNDB due to the way we inject the file reader. The solution is actually quite simple, I just needed to follow a convention that I wasn't familiar with. By calling `tmpFileFor`, I get the proper file extension (which cleans the file in case of restart) and I get the local only file.
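
The score-ordering concern described in the #2030 notes of this commit message can be shown with a small standalone sketch. Nothing here is jvector or SAI code: the `Candidate` class, the made-up scores, and the `orderBy` helper are assumptions for illustration. It shows only that sorting by an approximate (quantized) score can disagree with sorting by the full precision score, which is why the full precision score is computed before a score is sent onward.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;
import java.util.stream.Collectors;

// Hypothetical illustration only: the class, fields, and scores are invented.
public class ScoreOrderSketch {
    static final class Candidate {
        final String key;
        final double approxScore;        // score computed from quantized vectors
        final double fullPrecisionScore; // score computed from the original vectors

        Candidate(String key, double approxScore, double fullPrecisionScore) {
            this.key = key;
            this.approxScore = approxScore;
            this.fullPrecisionScore = fullPrecisionScore;
        }
    }

    // Two rows whose approximate scores rank them in the opposite order
    // from their full precision scores.
    static List<Candidate> sample() {
        return Arrays.asList(
            new Candidate("a", 0.90, 0.88),
            new Candidate("b", 0.89, 0.91));
    }

    // Returns row keys sorted from highest to lowest by the chosen score.
    static List<String> orderBy(List<Candidate> rows, ToDoubleFunction<Candidate> score) {
        return rows.stream()
                   .sorted(Comparator.comparingDouble(score).reversed())
                   .map(c -> c.key)
                   .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> rows = sample();
        // Ordering by the approximate score puts "a" first.
        System.out.println("approximate:    " + orderBy(rows, c -> c.approxScore));
        // Ordering by the full precision score puts "b" first.
        System.out.println("full precision: " + orderBy(rows, c -> c.fullPrecisionScore));
    }
}
```

If a receiver merges results using the approximate score, it would rank `a` above `b` even though the full precision scores say the opposite.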

CNDB-16055: Use tmpFileFor in CompactionGraph to prevent CNDB from re…

846e543

…mote loading

michaeljmarshall self-assigned this Nov 19, 2025

michaeljmarshall requested a review from JeremiahDJordan November 19, 2025 22:48

JeremiahDJordan approved these changes Nov 19, 2025


jkni self-requested a review November 19, 2025 23:52

jkni approved these changes Nov 19, 2025


michaeljmarshall merged commit b1010a2 into main Nov 20, 2025
492 of 499 checks passed

michaeljmarshall deleted the cndb-16055 branch November 20, 2025 04:37
