
21456: perf: Optimize count distinct using bitmaps instead of hashsets for smaller datatypes #307

Open
martin-augment wants to merge 19 commits into main from pr-21456-2026-04-13-07-38-00

Conversation

@martin-augment
Owner

21456: To review by AI

@coderabbitai

coderabbitai bot commented Apr 13, 2026

Walkthrough

This pull request adds code review guidelines for AI agents and introduces four new specialized accumulator implementations for COUNT DISTINCT operations on small integer types. New structs BoolArray256DistinctCountAccumulator, BoolArray256DistinctCountAccumulatorI8, Bitmap65536DistinctCountAccumulator, and Bitmap65536DistinctCountAccumulatorI16 are implemented in the datafusion/functions-aggregate-common module. The accumulator selection logic in datafusion/functions-aggregate is modified to route UInt8, Int8, UInt16, and Int16 types to these optimized accumulators rather than the generic variant. Public re-exports are added to expose the new accumulator types.


@claude

claude bot commented Apr 13, 2026

Code Review

The approach of using bitmaps/bool-arrays for small integer types in COUNT DISTINCT is sound and can yield real performance gains. Here are the issues I found:


Bug: size() double-counts inline array memory

For BoolArray256DistinctCountAccumulator and BoolArray256DistinctCountAccumulatorI8, the seen field ([bool; 256]) is inline in the struct — it is not heap-allocated. size_of_val(self) already includes those 256 bytes, so adding + 256 double-counts:

// BoolArray256DistinctCountAccumulator::size() — incorrect
fn size(&self) -> usize {
    size_of_val(self) + 256  // size_of_val already includes the 256-byte seen array
}

Should be:

fn size(&self) -> usize {
    size_of_val(self)
}

The Bitmap65536 variants are correct — bitmap: Box<[u64; 1024]> is heap-allocated, so size_of_val(self) only captures the pointer (8 bytes) and + 8192 covers the allocation.
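The inline-vs-heap distinction can be checked directly. A minimal sketch with hypothetical stand-in structs (not the PR's actual types) mirroring the two layouts:

```rust
use std::mem::size_of_val;

// Hypothetical struct with the [bool; 256] stored inline, like the BoolArray256 variants.
struct InlineSeen {
    seen: [bool; 256], // inline: counted by size_of_val
}

// Hypothetical struct with a heap-allocated bitmap, like the Bitmap65536 variants.
struct BoxedBitmap {
    bitmap: Box<[u64; 1024]>, // heap: size_of_val sees only the pointer
}

fn main() {
    let a = InlineSeen { seen: [false; 256] };
    let b = BoxedBitmap { bitmap: Box::new([0u64; 1024]) };

    // The inline array is already inside the struct's own footprint...
    assert!(size_of_val(&a) >= 256);
    // ...while the boxed array contributes only a pointer-sized field,
    // so its 8192-byte allocation must be added manually in size().
    assert!(size_of_val(&b) < 8192);
}
```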


Profiling artifact leaked into production: #[inline(never)]

The PR adds #[inline(never)] to the pre-existing PrimitiveDistinctCountAccumulator::update_batch and FloatDistinctCountAccumulator::update_batch. This looks like a profiling artifact. Forcing no-inlining on hot paths can hurt performance and should not be in a production commit. Please remove those two annotations from the existing accumulators.


Missing tests

No tests are added for any of the four new accumulator types. At minimum, each should have:

  • A basic update_batch + evaluate roundtrip
  • A state + merge_batch roundtrip (verify partial aggregation works correctly)
  • A test with all-null input (should return 0)

Significant code duplication

The four new structs share almost identical structure. The signed/unsigned pairs (u8/i8 and u16/i16) differ only in the type cast for array access. A generic approach with a trait or a macro would reduce ~350 lines of near-identical code to a fraction of that. Consider whether upstream DataFusion would accept a generified version before landing this.
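A macro-based deduplication could look like the following sketch. The macro and type names are hypothetical, and the tracker is reduced to the seen-array core rather than DataFusion's full Accumulator trait:

```rust
// Hypothetical macro: one body generates a tracker per native type; the
// signed/unsigned pairs differ only in the index-mapping closure.
macro_rules! small_int_tracker {
    ($name:ident, $native:ty, $to_index:expr) => {
        struct $name {
            seen: [bool; 256],
        }
        impl $name {
            fn new() -> Self {
                Self { seen: [false; 256] }
            }
            fn insert(&mut self, v: $native) {
                // $to_index maps the native value into 0..256
                self.seen[($to_index)(v)] = true;
            }
            fn count(&self) -> usize {
                self.seen.iter().filter(|&&b| b).count()
            }
        }
    };
}

small_int_tracker!(TrackerU8, u8, |v: u8| v as usize);
small_int_tracker!(TrackerI8, i8, |v: i8| v as u8 as usize);

fn main() {
    let mut t = TrackerI8::new();
    for v in [-128i8, -1, 0, 127, -1] {
        t.insert(v);
    }
    assert_eq!(t.count(), 4); // -1 counted once
}
```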


BoolArray256 should use a bitmap, not [bool; 256]

For u8/i8 (256 possible values), a [u64; 4] bitmap (32 bytes) would:

  • Use 8× less memory than [bool; 256]
  • Count faster via count_ones() (hardware POPCNT), consistent with the u16/i16 implementation

The current count() iterates 256 elements instead of 4 words. If the goal is performance parity with the bitmap-based u16 path, the u8 path should also use a bitmap.
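A minimal sketch of the suggested [u64; 4] bitmap, with hypothetical names:

```rust
// 256 bits in 32 bytes, vs 256 bytes for [bool; 256].
struct Bitmap256 {
    words: [u64; 4],
}

impl Bitmap256 {
    fn new() -> Self {
        Self { words: [0; 4] }
    }
    fn set(&mut self, v: u8) {
        // High 2 bits select the word, low 6 bits select the bit.
        self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
    }
    fn count(&self) -> usize {
        // count_ones() compiles to POPCNT where available: 4 words, not 256 bools.
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}

fn main() {
    let mut b = Bitmap256::new();
    for v in [0u8, 63, 64, 255, 255] {
        b.set(v);
    }
    assert_eq!(b.count(), 4);
    assert_eq!(std::mem::size_of::<Bitmap256>(), 32);
}
```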


Naming inconsistency

  • BoolArray256DistinctCountAccumulator vs Bitmap65536DistinctCountAccumulator — different naming schemes for the same concept
  • I8/I16 suffixes for signed variants is unusual; consider a consistent pattern or a single generic struct parameterized on the sign

Minor: .unwrap() in get_count_accumulator

DataType::UInt8 | DataType::Int8 | DataType::UInt16 | DataType::Int16 => {
    get_small_int_accumulator(data_type).unwrap()
}

The unwrap() would panic if get_small_int_accumulator returned an error. Since get_count_accumulator returns Box<dyn Accumulator> (not Result), there is no way to propagate the error. This is only safe because the match arm guarantees the data type is handled — but it is fragile. A #[allow(unreachable_patterns)] + unreachable!() or restructuring to avoid the fallible helper would be cleaner.
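One way to avoid the fallible helper is to make unsupported types unrepresentable. A simplified sketch with hypothetical names (DataFusion's real Accumulator trait and DataType enum are much larger):

```rust
// Hypothetical narrowed enum: only the types the match arm admits exist here,
// so the constructor cannot fail and needs no unwrap()/unreachable!().
#[derive(Clone, Copy)]
enum SmallIntType {
    UInt8,
    Int8,
    UInt16,
    Int16,
}

// Stand-in for DataFusion's Accumulator trait.
trait Accumulator {
    fn name(&self) -> &'static str;
}

struct BoolArrayAcc;
struct BitmapAcc;

impl Accumulator for BoolArrayAcc {
    fn name(&self) -> &'static str { "bool-array" }
}
impl Accumulator for BitmapAcc {
    fn name(&self) -> &'static str { "bitmap" }
}

// Infallible: the type system guarantees every case is handled.
fn small_int_accumulator(t: SmallIntType) -> Box<dyn Accumulator> {
    match t {
        SmallIntType::UInt8 | SmallIntType::Int8 => Box::new(BoolArrayAcc),
        SmallIntType::UInt16 | SmallIntType::Int16 => Box::new(BitmapAcc),
    }
}

fn main() {
    assert_eq!(small_int_accumulator(SmallIntType::Int16).name(), "bitmap");
    assert_eq!(small_int_accumulator(SmallIntType::UInt8).name(), "bool-array");
}
```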


Summary: The core idea is solid, but there is a memory-accounting bug in size() for the bool-array variants, a profiling annotation left in on existing code, no tests, and significant code duplication. Please address at least the size() bug and add tests before merging.

@augmentcode

augmentcode bot commented Apr 13, 2026

🤖 Augment PR Summary

Summary: Improves COUNT(DISTINCT) performance for small integer types by replacing HashSet-based tracking with compact bitset/array tracking.

Changes:

  • Added specialized COUNT DISTINCT accumulators for u8/i8 (256-entry bool array) and u16/i16 (65536-bit bitmap).
  • Exported the new accumulator types from functions-aggregate-common for reuse.
  • Updated COUNT DISTINCT accumulator selection to route small integer types through a cold helper that constructs these specialized accumulators.
  • Kept existing HashSet-based accumulators for larger integer types and other existing specialized implementations (bytes/float/etc.).

Technical Notes:
Uses fixed-size tracking structures (256 bytes or ~8KB) to reduce hashing overhead and improve locality for small domains; preserves existing state/merge representation via single-row List states.

🤖 Was this summary useful? React with 👍 or 👎


@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


}

fn size(&self) -> usize {
size_of_val(self) + 256


In size(), size_of_val(self) already includes the inline [bool; 256] storage, so the extra + 256 double-counts memory. This can skew DataFusion’s memory accounting/limits for COUNT DISTINCT on small integer types.

Severity: medium

Other Locations
  • datafusion/functions-aggregate-common/src/aggregate/count_distinct/native.rs:331



Owner Author


value:useful; category:bug; feedback: The Augment AI reviewer is correct! The seen field is an inline array, so its 256 bytes are already included in the value returned by size_of_val() and must not be added again manually. Fixing this prevents overestimating the size of the accumulator.

),
// Small int types - cold path
DataType::UInt8 | DataType::Int8 | DataType::UInt16 | DataType::Int16 => {
get_small_int_accumulator(data_type).unwrap()


This change routes UInt8/Int8/UInt16/Int16 through new bitmap/bool-array accumulators, but there don’t appear to be unit tests covering these new distinct paths (especially state()/merge_batch() behavior). Adding coverage here would help catch regressions in distinct correctness across partial/final aggregation.

Severity: medium



Owner Author


value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! No test cases for this new code path are added in the pull request, so the new accumulators may not behave as desired and could regress in the future.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces optimized COUNT DISTINCT accumulators for small integer types (u8, i8, u16, i16) using boolean arrays and bitmaps, and updates the count aggregate function to utilize these specialized paths. Additionally, it adds guidelines for AI agents to avoid reviewing specific configuration files or linking to external repositories. The review feedback suggests several performance optimizations, including bypassing Option iteration in update_batch when nulls are absent and using trailing_zeros() for faster bitmap processing, as well as correcting memory size calculations to avoid double-counting fixed-size arrays.

Comment on lines +203 to +205
for value in arr.iter().flatten() {
self.seen[value as usize] = true;
}


medium

Iterating over the array using iter().flatten() is inefficient for primitive types because it performs an option check for every element. Since this is a performance-oriented optimization, it's better to check null_count() and use values() directly when no nulls are present.

        if arr.null_count() == 0 {
            for &value in arr.values() {
                self.seen[value as usize] = true;
            }
        } else {
            for value in arr.iter().flatten() {
                self.seen[value as usize] = true;
            }
        }

Owner Author


value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The flatten() operation checks whether each item in the array is Some or None. For arrays without nulls this can be skipped entirely via the constant-time check arr.null_count() == 0, replacing N per-item None checks (where N is the size of the array) with a single one.

}

fn size(&self) -> usize {
size_of_val(self) + 256


medium

The size method is double-counting the memory used by the seen array. Since seen is a fixed-size array within the struct, size_of_val(self) already includes its size (256 bytes).

Suggested change
size_of_val(self) + 256
size_of_val(self)

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The seen field is an inline array, so its 256 bytes are already included in the value returned by size_of_val() and must not be added again manually. Fixing this prevents overestimating the size of the accumulator.

Comment on lines +283 to +285
for value in arr.iter().flatten() {
self.seen[value as u8 as usize] = true;
}


medium

Similar to the u8 accumulator, using iter().flatten() is less efficient than direct slice access when nulls are absent. Checking null_count() allows for a faster path using values().

        if arr.null_count() == 0 {
            for &value in arr.values() {
                self.seen[value as u8 as usize] = true;
            }
        } else {
            for value in arr.iter().flatten() {
                self.seen[value as u8 as usize] = true;
            }
        }

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The flatten() operation checks whether each item in the array is Some or None. For arrays without nulls this can be skipped entirely via the constant-time check arr.null_count() == 0, replacing N per-item None checks (where N is the size of the array) with a single one.

}

fn size(&self) -> usize {
size_of_val(self) + 256


medium

The size method is double-counting the memory used by the seen array. size_of_val(self) already includes the 256 bytes of the fixed-size array.

Suggested change
size_of_val(self) + 256
size_of_val(self)

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! The seen field is an inline array, so its 256 bytes are already included in the value returned by size_of_val() and must not be added again manually. Fixing this prevents overestimating the size of the accumulator.

Comment on lines +376 to +378
for value in arr.iter().flatten() {
self.set_bit(value);
}


medium

Performance can be improved by avoiding the overhead of Option iteration when the array has no nulls.

        if arr.null_count() == 0 {
            for &value in arr.values() {
                self.set_bit(value);
            }
        } else {
            for value in arr.iter().flatten() {
                self.set_bit(value);
            }
        }

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The flatten() operation checks whether each item in the array is Some or None. For arrays without nulls this can be skipped entirely via the constant-time check arr.null_count() == 0, replacing N per-item None checks (where N is the size of the array) with a single one.

Comment on lines +401 to +409
for (word_idx, &word) in self.bitmap.iter().enumerate() {
if word != 0 {
for bit in 0..64 {
if (word & (1u64 << bit)) != 0 {
values.push((word_idx as u16) * 64 + bit);
}
}
}
}


medium

Iterating through all 64 bits for every non-zero word is inefficient, especially for sparse bitmaps. Using trailing_zeros() to find set bits is significantly faster and more idiomatic for bitmap processing.

        for (word_idx, &word) in self.bitmap.iter().enumerate() {
            let mut w = word;
            while w != 0 {
                let bit = w.trailing_zeros();
                values.push((word_idx as u16) * 64 + bit as u16);
                w &= !(1u64 << bit);
            }
        }

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! With u64::trailing_zeros() the loop visits only the set bits of each word, lowest bit first, instead of testing all 64 bit positions. Iterating only the 1s can be a big gain for sparse bitmaps.
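The bit-extraction loop can be sketched in isolation; set_bits here is a hypothetical helper, not code from the PR:

```rust
// Visit only the set bits of a word, lowest bit first.
fn set_bits(word: u64) -> Vec<u32> {
    let mut w = word;
    let mut bits = Vec::new();
    while w != 0 {
        let bit = w.trailing_zeros(); // index of the lowest set bit
        bits.push(bit);
        w &= w - 1; // clear the lowest set bit
    }
    bits
}

fn main() {
    // 0b1000_0101 has bits 0, 2, and 7 set; the loop runs 3 times, not 64.
    assert_eq!(set_bits(0b1000_0101), vec![0, 2, 7]);
    assert_eq!(set_bits(0), Vec::<u32>::new());
}
```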

Comment on lines +470 to +472
for value in arr.iter().flatten() {
self.set_bit(value);
}


medium

Performance can be improved by avoiding the overhead of Option iteration when the array has no nulls.

        if arr.null_count() == 0 {
            for &value in arr.values() {
                self.set_bit(value);
            }
        } else {
            for value in arr.iter().flatten() {
                self.set_bit(value);
            }
        }

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The flatten() operation checks whether each item in the array is Some or None. For arrays without nulls this can be skipped entirely via the constant-time check arr.null_count() == 0, replacing N per-item None checks (where N is the size of the array) with a single one.

Comment on lines +495 to +503
for (word_idx, &word) in self.bitmap.iter().enumerate() {
if word != 0 {
for bit in 0..64 {
if (word & (1u64 << bit)) != 0 {
values.push(((word_idx as u16) * 64 + bit) as i16);
}
}
}
}


medium

Iterating through all 64 bits for every non-zero word is inefficient. Using trailing_zeros() to find set bits is significantly faster.

        for (word_idx, &word) in self.bitmap.iter().enumerate() {
            let mut w = word;
            while w != 0 {
                let bit = w.trailing_zeros();
                values.push(((word_idx as u16) * 64 + bit as u16) as i16);
                w &= !(1u64 << bit);
            }
        }

Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! With u64::trailing_zeros() the loop visits only the set bits of each word, lowest bit first, instead of testing all 64 bit positions. Iterating only the 1s can be a big gain for sparse bitmaps.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit b564773.


fn size(&self) -> usize {
size_of_val(self) + 256
}


Bool array accumulators double-count memory in size()

Low Severity

The size() method for BoolArray256DistinctCountAccumulator and BoolArray256DistinctCountAccumulatorI8 returns size_of_val(self) + 256, but the seen: [bool; 256] field is stored inline in the struct — not on the heap. size_of_val(self) already includes the 256-byte array, so adding 256 double-counts it, reporting ~512 bytes instead of 256. Contrast this with the Bitmap65536 variants, which correctly add 8192 for the heap-allocated Box<[u64; 1024]> that size_of_val only accounts for as a pointer. This overestimate in memory tracking could cause premature spilling or other unnecessary memory-pressure actions.

Additional Locations (1)

Owner Author


value:useful; category:bug; feedback: The Bugbot AI reviewer is correct! The seen field is an inline array, so its 256 bytes are already included in the value returned by size_of_val() and must not be added again manually. Fixing this prevents overestimating the size of the accumulator.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
datafusion/functions-aggregate-common/src/aggregate/count_distinct/native.rs (1)

275-324: Add signed round-trip coverage before this ships.

The Int8 and Int16 variants rely on remapping through u8/u16 plus custom state() serialization. A focused update_batch -> state -> merge_batch -> evaluate test around -128, -1, 0, 127 and i16::MIN, -1, 0, i16::MAX would lock down the edge values most likely to regress.

Also applies to: 462-511
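The requested round-trip can be illustrated on a stripped-down tracker; I8Tracker and its methods are hypothetical stand-ins for the accumulator API, not DataFusion code:

```rust
// Sketch of the i8 remap-through-u8 round-trip the reviewer wants tested.
struct I8Tracker {
    seen: [bool; 256],
}

impl I8Tracker {
    fn new() -> Self {
        Self { seen: [false; 256] }
    }
    // Analogue of update_batch: remap i8 -> u8 for indexing.
    fn update(&mut self, values: &[i8]) {
        for &v in values {
            self.seen[v as u8 as usize] = true;
        }
    }
    // Analogue of state(): export seen values back as i8 for merging.
    fn state(&self) -> Vec<i8> {
        (0..256usize)
            .filter(|&i| self.seen[i])
            .map(|i| i as u8 as i8) // remap back u8 -> i8
            .collect()
    }
    // Analogue of merge_batch: replay a partial state.
    fn merge(&mut self, state: &[i8]) {
        self.update(state);
    }
    fn evaluate(&self) -> usize {
        self.seen.iter().filter(|&&b| b).count()
    }
}

fn main() {
    // Edge values most likely to regress: -128, -1, 0, 127.
    let mut partial = I8Tracker::new();
    partial.update(&[-128, -1, 0, 127, -1]);

    let mut fin = I8Tracker::new();
    fin.merge(&partial.state()); // state -> merge round-trip
    assert_eq!(fin.evaluate(), 4);
}
```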

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/functions-aggregate-common/src/aggregate/count_distinct/native.rs`
around lines 275 - 324, Add unit tests that exercise signed integer round-trip
for the Int8 and Int16 distinct-count accumulators: specifically create tests
that call update_batch on BoolArray256DistinctCountAccumulatorI8 (and the Int16
counterpart), then call state(), then merge_batch() with that state, and finally
evaluate() to assert correct counts for edge values (-128, -1, 0, 127 for i8 and
i16::MIN, -1, 0, i16::MAX for i16). Ensure tests cover both direct update_batch
inputs and list-encoded states used by merge_batch so the remapping through
u8/u16 and the custom state() serialization is validated end-to-end (use the
functions/methods update_batch, state, merge_batch, evaluate and the accumulator
types named in the diff).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@datafusion/functions-aggregate/src/count.rs`:
- Around line 191-193: The DataType::UInt16 | DataType::Int16 match branch
currently forces creation of the full-bitmap accumulator via
get_small_int_accumulator(data_type).unwrap(), causing eager 8 KiB allocations
per group; change this branch to return the lazy/adaptive accumulator used for
other small-int types (i.e., the accumulator that grows with cardinality and
only materializes a bitmap when needed) instead of the fixed bitmap one so
DISTINCT keeps one growable Accumulator per live group; update the
DataType::UInt16 | DataType::Int16 arm to use the existing lazy/adaptive factory
(or the previous get_small_int_accumulator variant that is lazy) and ensure
types/unwrap usage match the surrounding code.

---

Nitpick comments:
In
`@datafusion/functions-aggregate-common/src/aggregate/count_distinct/native.rs`:
- Around line 275-324: Add unit tests that exercise signed integer round-trip
for the Int8 and Int16 distinct-count accumulators: specifically create tests
that call update_batch on BoolArray256DistinctCountAccumulatorI8 (and the Int16
counterpart), then call state(), then merge_batch() with that state, and finally
evaluate() to assert correct counts for edge values (-128, -1, 0, 127 for i8 and
i16::MIN, -1, 0, i16::MAX for i16). Ensure tests cover both direct update_batch
inputs and list-encoded states used by merge_batch so the remapping through
u8/u16 and the custom state() serialization is validated end-to-end (use the
functions/methods update_batch, state, merge_batch, evaluate and the accumulator
types named in the diff).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 886dab1a-6bd0-429f-99f2-f9821614f257

📥 Commits

Reviewing files that changed from the base of the PR and between c68fbab and b564773.

📒 Files selected for processing (5)
  • .cursor/rules.md
  • AGENTS.md
  • datafusion/functions-aggregate-common/src/aggregate/count_distinct.rs
  • datafusion/functions-aggregate-common/src/aggregate/count_distinct/native.rs
  • datafusion/functions-aggregate/src/count.rs

Comment on lines +191 to +193
DataType::UInt8 | DataType::Int8 | DataType::UInt16 | DataType::Int16 => {
get_small_int_accumulator(data_type).unwrap()
}


⚠️ Potential issue | 🟠 Major

Avoid routing all UInt16/Int16 groups to a fixed 8 KiB accumulator.

The UInt16/Int16 half of this branch now selects accumulators that eagerly allocate the full bitmap. DISTINCT still uses one Accumulator per live group here, so sparse groups go from “grow with cardinality” to “always 8 KiB”. On wide GROUP BY COUNT(DISTINCT ...) plans that is a sizable memory regression and can force early spill/OOM. Please keep the u16/i16 path lazy/adaptive instead of switching every case to the bitmap.
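The lazy/adaptive idea the reviewer describes might be sketched as follows; the enum, threshold, and names are assumptions, not DataFusion's existing implementation:

```rust
// Start each group with a tiny sorted Vec and materialize the 8 KiB bitmap
// only once cardinality justifies it.
enum AdaptiveU16Distinct {
    Small(Vec<u16>),         // a few bytes while the group stays sparse
    Dense(Box<[u64; 1024]>), // full 8 KiB bitmap for high-cardinality groups
}

const UPGRADE_THRESHOLD: usize = 64; // hypothetical cutover point

impl AdaptiveU16Distinct {
    fn new() -> Self {
        AdaptiveU16Distinct::Small(Vec::new())
    }
    fn insert(&mut self, v: u16) {
        match self {
            AdaptiveU16Distinct::Small(vals) => {
                if let Err(pos) = vals.binary_search(&v) {
                    vals.insert(pos, v);
                    if vals.len() > UPGRADE_THRESHOLD {
                        // Cut over to the bitmap only when it pays for itself.
                        let mut bitmap = Box::new([0u64; 1024]);
                        for &x in vals.iter() {
                            bitmap[(x >> 6) as usize] |= 1u64 << (x & 63);
                        }
                        *self = AdaptiveU16Distinct::Dense(bitmap);
                    }
                }
            }
            AdaptiveU16Distinct::Dense(bitmap) => {
                bitmap[(v >> 6) as usize] |= 1u64 << (v & 63);
            }
        }
    }
    fn count(&self) -> usize {
        match self {
            AdaptiveU16Distinct::Small(vals) => vals.len(),
            AdaptiveU16Distinct::Dense(bitmap) => {
                bitmap.iter().map(|w| w.count_ones() as usize).sum()
            }
        }
    }
}

fn main() {
    let mut acc = AdaptiveU16Distinct::new();
    for v in 0..1000u16 {
        acc.insert(v % 500); // 500 distinct values, forces the upgrade
    }
    assert_eq!(acc.count(), 500);
}
```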

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@datafusion/functions-aggregate/src/count.rs` around lines 191 - 193, The
DataType::UInt16 | DataType::Int16 match branch currently forces creation of the
full-bitmap accumulator via get_small_int_accumulator(data_type).unwrap(),
causing eager 8 KiB allocations per group; change this branch to return the
lazy/adaptive accumulator used for other small-int types (i.e., the accumulator
that grows with cardinality and only materializes a bitmap when needed) instead
of the fixed bitmap one so DISTINCT keeps one growable Accumulator per live
group; update the DataType::UInt16 | DataType::Int16 arm to use the existing
lazy/adaptive factory (or the previous get_small_int_accumulator variant that is
lazy) and ensure types/unwrap usage match the surrounding code.

@martin-augment
Owner Author

Bug: size() double-counts inline array memory

For BoolArray256DistinctCountAccumulator and BoolArray256DistinctCountAccumulatorI8, the seen field ([bool; 256]) is inline in the struct — it is not heap-allocated. size_of_val(self) already includes those 256 bytes, so adding + 256 double-counts:

// BoolArray256DistinctCountAccumulator::size() — incorrect
fn size(&self) -> usize {
    size_of_val(self) + 256  // size_of_val already includes the 256-byte seen array
}

Should be:

fn size(&self) -> usize {
    size_of_val(self)
}

The Bitmap65536 variants are correct — bitmap: Box<[u64; 1024]> is heap-allocated, so size_of_val(self) only captures the pointer (8 bytes) and + 8192 covers the allocation.

value:useful; category:bug; feedback: The Claude AI reviewer is correct! The seen field is an inline array, so its 256 bytes are already included in the value returned by size_of_val() and must not be added again manually. Fixing this prevents overestimating the size of the accumulator.

@martin-augment
Owner Author

Profiling artifact leaked into production: #[inline(never)]

The PR adds #[inline(never)] to the pre-existing PrimitiveDistinctCountAccumulator::update_batch and FloatDistinctCountAccumulator::update_batch. This looks like a profiling artifact. Forcing no-inlining on hot paths can hurt performance and should not be in a production commit. Please remove those two annotations from the existing accumulators.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! #[inline(never)] should be used only when inlining causes problems or when a function needs to be compared against an alternative (e.g. during profiling). The changes in the PR suggest neither, so the attribute should be removed.

@martin-augment
Owner Author

BoolArray256 should use a bitmap, not [bool; 256]

For u8/i8 (256 possible values), a [u64; 4] bitmap (32 bytes) would:

  • Use 8× less memory than [bool; 256]
  • Count faster via count_ones() (hardware POPCNT), consistent with the u16/i16 implementation

The current count() iterates 256 elements instead of 4 words. If the goal is performance parity with the bitmap-based u16 path, the u8 path should also use a bitmap.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! A [u64; 4] bitmap occupies 32 bytes, 8 times less than the 256-byte [bool; 256] array. Counting set bits with count_ones() also compiles down to the POPCNT instruction, making the count faster as well.
