[Arrow]Configure max deduplication length for StringView#8990

Merged
alamb merged 1 commit into apache:main from lichuang:fix-issue-7187 on Dec 17, 2025

Conversation

@lichuang
Contributor

@lichuang lichuang commented Dec 14, 2025

Configure max deduplication length when deduplicating strings while building the array

Which issue does this PR close?

Configure max deduplication length when deduplicating strings while building the array

Rationale for this change

Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.

What changes are included in this PR?

There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.

Are these changes tested?

We typically require tests for all PRs in order to:

  1. Prevent the code from being accidentally broken by subsequent changes
  2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?

Are there any user-facing changes?

If there are user-facing changes then we may require documentation to be updated before approving the PR.

If there are any breaking changes to public APIs, please call them out.

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 14, 2025
@lichuang lichuang force-pushed the fix-issue-7187 branch 3 times, most recently from 59c1e83 to d029734 Compare December 14, 2025 03:49
// the idx is current length of views_builder, as we are inserting a new view
vacant.insert(self.views_buffer.len());
// (2) len > `max_deduplication_len`
if length > self.max_deduplication_len {
Contributor

In practice I suspect most strings that make sense to deduplicate are relatively short, e.g. <64 bytes.

Pardon, I think there is some confusion here: as written, the check `length > self.max_deduplication_len` makes this behave like a min_deduplication_len.

Contributor Author

Thanks, fixed.
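To make the direction of the comparison concrete, here is a minimal sketch contrasting the check as first written with the intended one. Both function names are hypothetical and exist only for this illustration; neither is part of arrow-rs.

```rust
/// The check as first written in this PR: values LONGER than the limit were
/// deduplicated, so the setting effectively acted as a minimum length.
fn dedup_as_first_written(length: u32, limit: u32) -> bool {
    length > limit
}

/// The intended check: only values at or below the limit are deduplicated,
/// so the setting acts as a maximum length.
fn dedup_as_intended(length: u32, limit: u32) -> bool {
    length <= limit
}
```

With a limit of 24 bytes, a 13-byte value is deduplicated only by the second function, and a 30-byte value only by the first.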

@lichuang lichuang force-pushed the fix-issue-7187 branch 4 times, most recently from 62ce488 to d0bb6b2 Compare December 14, 2025 08:31
Contributor

@alamb alamb left a comment

Thanks @lichuang -- looks nice. I left some suggestions

/// Default is [`MAX_INLINE_VIEW_LEN`] bytes.
/// See <https://github.com/apache/arrow-rs/issues/7187> for more details on the implications.
pub fn with_max_deduplication_len(self, max_deduplication_len: u32) -> Self {
debug_assert!(
Contributor

Why not use assert rather than debug_assert?

If we are going to assert, we should also document in the comments that the parameter must be greater than zero.

Contributor Author

I looked at the with_fixed_block_size function, and it uses debug_assert:

    pub fn with_fixed_block_size(self, block_size: u32) -> Self {
        debug_assert!(block_size > 0, "Block size must be greater than 0");
        Self {
            block_size: BlockSizeGrowthStrategy::Fixed { size: block_size },
            ..self
        }
    }
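For readers weighing the two options: `assert!` is evaluated in every build profile, while `debug_assert!` is only evaluated when debug assertions are enabled, which is not the default for release builds. The sketch below uses a hypothetical `DedupSettings` type (not an arrow-rs type) to show the two setter styles side by side.

```rust
/// Hypothetical settings type used only for this illustration.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
struct DedupSettings {
    max_deduplication_len: u32,
}

impl DedupSettings {
    /// Checked in every build profile: panics on zero even in release builds.
    fn with_max_len_checked(self, max_deduplication_len: u32) -> Self {
        assert!(
            max_deduplication_len > 0,
            "max_deduplication_len must be greater than 0"
        );
        Self { max_deduplication_len }
    }

    /// Checked only when debug assertions are enabled (dev and test builds by
    /// default); in a default release build a zero passes through silently.
    fn with_max_len_debug_checked(self, max_deduplication_len: u32) -> Self {
        debug_assert!(
            max_deduplication_len > 0,
            "max_deduplication_len must be greater than 0"
        );
        Self { max_deduplication_len }
    }
}
```

Either way, documenting the "greater than zero" requirement on the method keeps the contract visible to callers.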

Comment thread arrow-array/src/builder/generic_bytes_view_builder.rs Outdated
&& value_3.len() < max_deduplication_len.as_usize()
);

let value_checker = |v: &[u8], builder: &StringViewBuilder| {
Contributor

Rather than asserting the internal state of the builder, how about setting the max deduplication len to something small (like 1) and then pushing duplicated strings in?

You can then assert that all the values point at distinct offsets in the resulting views.

Contributor Author

Since 1 < MAX_INLINE_VIEW_LEN, such a value is stored inline and the function returns directly, before any deduplication logic runs:

    pub fn try_append_value(&mut self, value: impl AsRef<T::Native>) -> Result<(), ArrowError> {
        let v: &[u8] = value.as_ref().as_ref();
        let length: u32 = v.len().try_into().map_err(|_| {
            ArrowError::InvalidArgumentError(format!("String length {} exceeds u32::MAX", v.len()))
        })?;

        if length <= MAX_INLINE_VIEW_LEN {
            let mut view_buffer = [0; 16];
            view_buffer[0..4].copy_from_slice(&length.to_le_bytes());
            view_buffer[4..4 + v.len()].copy_from_slice(v);
            self.views_buffer.push(u128::from_le_bytes(view_buffer));
            self.null_buffer_builder.append_non_null();
            return Ok(());
        }
        // ...
    }
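The inline path above can be reproduced without the arrow crate. The following is a minimal std-only sketch of the same packing (4-byte little-endian length followed by up to 12 data bytes in a 16-byte view); `make_inline_view` and `read_inline_view` are hypothetical helper names, and the constant simply mirrors `MAX_INLINE_VIEW_LEN`.

```rust
/// Mirrors arrow's `MAX_INLINE_VIEW_LEN`: a 16-byte view minus the 4-byte length.
const MAX_INLINE_VIEW_LEN: usize = 12;

/// Packs a short value into a 16-byte inline view, or returns `None` when the
/// value is too long to be inlined.
fn make_inline_view(v: &[u8]) -> Option<u128> {
    if v.len() > MAX_INLINE_VIEW_LEN {
        return None;
    }
    let mut view = [0u8; 16];
    // Bytes 0..4 hold the length, little-endian.
    view[0..4].copy_from_slice(&(v.len() as u32).to_le_bytes());
    // Bytes 4..4+len hold the data itself; the rest stays zero-padded.
    view[4..4 + v.len()].copy_from_slice(v);
    Some(u128::from_le_bytes(view))
}

/// Recovers the bytes from a view produced by `make_inline_view`.
fn read_inline_view(view: u128) -> Vec<u8> {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    bytes[4..4 + len].to_vec()
}
```

Because the whole value lives inside the view, two equal short strings produce identical views without touching any data buffer, which is why a limit of 1 cannot exercise the deduplication path.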

Contributor

I guess I was thinking something like this which verifies the output (that the strings are not deduplicated) rather than the internal state of the builder:

    #[test]
    fn test_string_max_deduplication_len() {
        let value_1 = "short";
        let value_2 = "not so similar string but long";
        let value_3 = "1234567890123";

        let max_deduplication_len = MAX_INLINE_VIEW_LEN * 2;

        let mut builder = StringViewBuilder::new()
            .with_deduplicate_strings()
            .with_max_deduplication_len(max_deduplication_len);

        assert!(value_1.len() < MAX_INLINE_VIEW_LEN.as_usize());
        assert!(value_2.len() > max_deduplication_len.as_usize());
        assert!(
            value_3.len() > MAX_INLINE_VIEW_LEN.as_usize()
                && value_3.len() < max_deduplication_len.as_usize()
        );

        // append value1 (short), expect it is inlined; inline values bypass deduplication
        builder.append_value(value_1); // view 0
        builder.append_value(value_1); // view 1
        // append value2, expect second copy is not deduplicated as it exceeds max_deduplication_len
        builder.append_value(value_2); // view 2
        builder.append_value(value_2); // view 3
        // append value3, expect second copy is deduplicated
        builder.append_value(value_3); // view 4
        builder.append_value(value_3); // view 5

        let array = builder.finish();

        // verify
        let v2 = ByteView::from(array.views()[2]);
        let v3 = ByteView::from(array.views()[3]);
        assert_eq!(v2.buffer_index, v3.buffer_index); // stored in same buffer
        assert_ne!(v2.offset, v3.offset); // different offsets --> not deduplicated

        let v4 = ByteView::from(array.views()[4]);
        let v5 = ByteView::from(array.views()[5]);
        assert_eq!(v4.buffer_index, v5.buffer_index); // stored in same buffer
        assert_eq!(v4.offset, v5.offset); // same offsets --> deduplicated
    }
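The behavior this test checks can also be modeled without the arrow crate. The sketch below is a toy stand-in, not the real builder: `ToyViewBuilder` and its fields are hypothetical, it keeps a single data buffer, and it keys the lookup table on the value bytes rather than on hashes of view indices.

```rust
use std::collections::HashMap;

/// Mirrors arrow's `MAX_INLINE_VIEW_LEN`.
const MAX_INLINE_VIEW_LEN: usize = 12;

/// Toy stand-in for a view builder: long values go into one data buffer and
/// each appended value records the offset its view would point at.
struct ToyViewBuilder {
    data: Vec<u8>,
    /// `None` marks an inline value, `Some(offset)` a value stored in `data`.
    offsets: Vec<Option<usize>>,
    /// Maps a value to the offset of its first stored copy.
    seen: HashMap<Vec<u8>, usize>,
    max_deduplication_len: usize,
}

impl ToyViewBuilder {
    fn new(max_deduplication_len: usize) -> Self {
        Self {
            data: Vec::new(),
            offsets: Vec::new(),
            seen: HashMap::new(),
            max_deduplication_len,
        }
    }

    fn append_value(&mut self, value: &str) {
        let v = value.as_bytes();
        // Short values are inlined and never reach the lookup table.
        if v.len() <= MAX_INLINE_VIEW_LEN {
            self.offsets.push(None);
            return;
        }
        // Only values at or below the limit are looked up and remembered.
        if v.len() <= self.max_deduplication_len {
            if let Some(&offset) = self.seen.get(v) {
                self.offsets.push(Some(offset));
                return;
            }
            self.seen.insert(v.to_vec(), self.data.len());
        }
        // New (or too long) value: append the bytes and point at them.
        self.offsets.push(Some(self.data.len()));
        self.data.extend_from_slice(v);
    }
}
```

Appending the same three pairs of values as in the test, with a limit of 24, leaves the two 30-byte copies at different offsets and the two 13-byte copies at the same offset.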

/// Some if deduplicating strings
/// map `<string hash> -> <index to the views>`
string_tracker: Option<(HashTable<usize>, ahash::RandomState)>,
max_deduplication_len: Option<u32>,
Member

Would it be better to change self.max_deduplication_len to u32 and set its default value to MAX_INLINE_VIEW_LEN? After that, we could unify the logic in L354 to length < self.max_deduplication_len.

Contributor

@alamb alamb left a comment

Thanks @lichuang -- this is looking close

Comment on lines +353 to +358
// (2) len > `MAX_INLINE_VIEW_LEN` and len < `max_deduplication_len`
let can_deduplicate = match self.max_deduplication_len {
Some(max_deduplication_len) => length <= max_deduplication_len,
None => true,
};
if can_deduplicate {
Contributor

It is minor, but as written this code will check the deduplicate length on each row, even if we are not deduplicating

How about only checking after the call to string_tracker.take()?

something like

        // Deduplication if:
        // (1) deduplication is enabled.
        // (2) len > `MAX_INLINE_VIEW_LEN` and len <= `max_deduplication_len`
        if let Some((mut ht, hasher)) = self.string_tracker.take() {
            if self
                .max_deduplication_len
                .map(|max_len| length <= max_len)
                .unwrap_or(true)
            {
                let hash_val = hasher.hash_one(v);
                let hasher_fn = |v: &_| hasher.hash_one(v);

                let entry = ht.entry(
                    hash_val,
                    |idx| {
                        let stored_value = self.get_value(*idx);
                        v == stored_value
                    },
                    hasher_fn,
                );
                match entry {
                    Entry::Occupied(occupied) => {
                        // If the string already exists, we will directly use the view
                        let idx = occupied.get();
                        self.views_buffer.push(self.views_buffer[*idx]);
                        self.null_buffer_builder.append_non_null();
                        self.string_tracker = Some((ht, hasher));
                        return Ok(());
                    }
                    Entry::Vacant(vacant) => {
                        // o.w. we insert the (string hash -> view index)
                        // the idx is current length of views_builder, as we are inserting a new view
                        vacant.insert(self.views_buffer.len());
                    }
                }
            }
            self.string_tracker = Some((ht, hasher));
        }

Contributor Author

The code you provided takes string_tracker out and stores it back on every call, even when v.len() > max_deduplication_len. I think taking and restoring string_tracker costs more than the v.len() > max_deduplication_len comparison.

Contributor

What I am worried about is slowing down the case when string deduplication is not enabled. I will run some benchmarks to make sure this change doesn't affect performance.

Contributor Author

In the updated PR, I changed the code to:

        let can_deduplicate = if self.string_tracker.is_some() {
            match self.max_deduplication_len {
                Some(max_deduplication_len) => length <= max_deduplication_len,
                None => true,
            }
        } else {
            false
        };
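The if/else form above can be checked in isolation. In this sketch `tracker_enabled` stands in for `self.string_tracker.is_some()`, and both function names are hypothetical. It also shows a more compact combinator form; the detail to watch there is that `None` (no limit configured) has to map to `true`, otherwise deduplication would silently turn off whenever no maximum is set.

```rust
/// If/else form, mirroring the snippet above.
fn can_deduplicate(tracker_enabled: bool, max_deduplication_len: Option<u32>, length: u32) -> bool {
    if tracker_enabled {
        match max_deduplication_len {
            Some(max) => length <= max,
            None => true,
        }
    } else {
        false
    }
}

/// Combinator form. `None` maps to `true` to keep the "no limit" semantics.
fn can_deduplicate_compact(
    tracker_enabled: bool,
    max_deduplication_len: Option<u32>,
    length: u32,
) -> bool {
    tracker_enabled && max_deduplication_len.map_or(true, |max| length <= max)
}
```

Both forms skip the length comparison entirely when deduplication is disabled, which is the case the review was concerned about.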

@alamb
Contributor

alamb commented Dec 16, 2025

run benchmark view_types concatenate_kernels

Contributor

@alamb alamb left a comment

Thank you @lichuang -- this looks good to me. I left some suggestions to maybe make the code clearer but I don't think it is needed.

Assuming the benchmarks don't show any regressions, I think this one is good to go

Comment on lines +357 to +364
let can_deduplicate = if self.string_tracker.is_some() {
match self.max_deduplication_len {
Some(max_deduplication_len) => length <= max_deduplication_len,
None => true,
}
} else {
false
};
Contributor

Here is another possibly clearer way to express this same logic:

        let can_deduplicate = self.string_tracker.is_some()
            && self
                .max_deduplication_len
                .map(|max_length| length <= max_length)
                .unwrap_or(true);

} else {
false
};
if can_deduplicate {
Contributor

You can avoid the extra indent layer by combining the statements together I think:

Instead of

        if can_deduplicate {
            if let Some((mut ht, hasher)) = self.string_tracker.take() {

This:

        if can_deduplicate && let Some((mut ht, hasher)) = self.string_tracker.take() {

@alamb alamb changed the title [Arrow]Configure max deduplication length [Arrow]Configure max deduplication length for StringView Dec 16, 2025
@lichuang lichuang force-pushed the fix-issue-7187 branch 3 times, most recently from 251ed58 to 9610686 Compare December 16, 2025 19:50
@alamb
Contributor

alamb commented Dec 16, 2025

show benchmark queue

@alamb-ghbot

🤖 Hi @alamb, you asked to view the benchmark queue (#8990 (comment)).

Job                         User        Benchmarks                        Comment
19344_3660885262.sh         alamb       default                           https://github.com/apache/datafusion/pull/19344#issuecomment-3660885262
19346_3659197246.sh         Dandandan   aggregate_query_sql               https://github.com/apache/datafusion/pull/19346#issuecomment-3659197246
19346_3657820674.sh         rluvaton    aggregate_query_sql               https://github.com/apache/datafusion/pull/19346#issuecomment-3657820674
19304_3661958730.sh         alamb       clickbench_partitioned            https://github.com/apache/datafusion/pull/19304#issuecomment-3661958730
arrow-8990-3662017225.sh    alamb       view_types concatenate_kernels    https://github.com/apache/arrow-rs/pull/8990#issuecomment-3662017225

@alamb
Contributor

alamb commented Dec 16, 2025

Oh no, I think #8990 (comment) broke the MSRV check -- we'll have to back that out. Sorry @lichuang
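The thread does not say which suggestion is meant. Assuming it is the `if can_deduplicate && let Some(...) = ...` combination, a likely explanation is that `let` chains were only stabilized in Rust 1.88 (2024 edition), so a crate with an older minimum supported Rust version cannot use them. The nested form below is a minimal sketch of the equivalent control flow that compiles on older toolchains; `Tracker`, `Builder`, and `try_track` are hypothetical stand-ins, not arrow-rs types.

```rust
/// Hypothetical stand-in for the contents of the builder's `string_tracker`.
struct Tracker {
    hits: u32,
}

/// Hypothetical stand-in for the builder.
struct Builder {
    string_tracker: Option<Tracker>,
}

impl Builder {
    /// Nested `if` plus `if let`, which does not rely on let chains.
    /// Returns true if the tracker was consulted.
    fn try_track(&mut self, can_deduplicate: bool) -> bool {
        if can_deduplicate {
            if let Some(mut tracker) = self.string_tracker.take() {
                tracker.hits += 1;
                // `take()` left `None` behind, so put the tracker back.
                self.string_tracker = Some(tracker);
                return true;
            }
        }
        false
    }
}
```

The cost of the nested form is one extra indentation level; the behavior is the same.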

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix-issue-7187 (d058cb4) to f8796fd diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=fix-issue-7187
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                             fix-issue-7187                         main
-----                                             --------------                         ----
gc view types all without nulls[100000]           1.03  1577.8±51.81µs        ? ?/sec    1.00  1525.3±46.89µs        ? ?/sec
gc view types all without nulls[8000]             1.00     63.8±1.79µs        ? ?/sec    1.04     66.6±3.20µs        ? ?/sec
gc view types all[100000]                         1.00    291.8±5.35µs        ? ?/sec    1.02   296.7±12.94µs        ? ?/sec
gc view types all[8000]                           1.01     22.9±0.60µs        ? ?/sec    1.00     22.7±0.12µs        ? ?/sec
gc view types slice half without nulls[100000]    1.03   516.2±19.28µs        ? ?/sec    1.00   503.4±16.55µs        ? ?/sec
gc view types slice half without nulls[8000]      1.02     27.5±0.32µs        ? ?/sec    1.00     27.0±0.25µs        ? ?/sec
gc view types slice half[100000]                  1.00    145.0±3.15µs        ? ?/sec    1.00    144.5±2.16µs        ? ?/sec
gc view types slice half[8000]                    1.02     11.7±1.30µs        ? ?/sec    1.00     11.5±0.16µs        ? ?/sec
view types slice                                  1.01    685.0±3.15ns        ? ?/sec    1.00    681.3±9.27ns        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix-issue-7187 (d058cb4) to f8796fd diff
BENCH_NAME=concatenate_kernels
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernels
BENCH_FILTER=
BENCH_BRANCH_NAME=fix-issue-7187
Results will be posted here when complete

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix-issue-7187 (d058cb4) to f8796fd diff
BENCH_NAME=view_types
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench view_types
BENCH_FILTER=
BENCH_BRANCH_NAME=fix-issue-7187
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                             fix-issue-7187                         main
-----                                             --------------                         ----
gc view types all without nulls[100000]           1.00  1618.2±72.17µs        ? ?/sec    1.00  1624.2±56.81µs        ? ?/sec
gc view types all without nulls[8000]             1.05     68.4±4.53µs        ? ?/sec    1.00     65.4±4.07µs        ? ?/sec
gc view types all[100000]                         1.00    293.3±8.75µs        ? ?/sec    1.00    293.6±6.40µs        ? ?/sec
gc view types all[8000]                           1.00     22.9±0.13µs        ? ?/sec    1.00     22.9±0.80µs        ? ?/sec
gc view types slice half without nulls[100000]    1.04   544.6±14.87µs        ? ?/sec    1.00   522.0±18.94µs        ? ?/sec
gc view types slice half without nulls[8000]      1.00     28.0±0.36µs        ? ?/sec    1.01     28.1±0.75µs        ? ?/sec
gc view types slice half[100000]                  1.00    144.3±2.42µs        ? ?/sec    1.01    145.8±3.99µs        ? ?/sec
gc view types slice half[8000]                    1.00     11.6±0.06µs        ? ?/sec    1.00     11.6±0.35µs        ? ?/sec
view types slice                                  1.01    687.9±5.48ns        ? ?/sec    1.00    680.6±7.58ns        ? ?/sec

@alamb-ghbot

🤖 ./gh_compare_arrow.sh gh_compare_arrow.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing fix-issue-7187 (d058cb4) to f8796fd diff
BENCH_NAME=concatenate_kernel
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench concatenate_kernel
BENCH_FILTER=
BENCH_BRANCH_NAME=fix-issue-7187
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

group                                                          fix-issue-7187                         main
-----                                                          --------------                         ----
concat 1024 arrays boolean 4                                   1.00     20.6±0.05µs        ? ?/sec    1.01     20.7±0.40µs        ? ?/sec
concat 1024 arrays i32 4                                       1.00     13.4±0.32µs        ? ?/sec    1.03     13.8±0.05µs        ? ?/sec
concat 1024 arrays str 4                                       1.04     37.2±4.12µs        ? ?/sec    1.00     35.7±0.71µs        ? ?/sec
concat boolean 1024                                            1.00    308.1±6.68ns        ? ?/sec    1.10   339.2±10.42ns        ? ?/sec
concat boolean 8192 over 100 arrays                            1.00      5.0±0.04µs        ? ?/sec    1.01      5.1±0.12µs        ? ?/sec
concat boolean nulls 1024                                      1.00   568.9±14.15ns        ? ?/sec    1.01   576.7±11.88ns        ? ?/sec
concat boolean nulls 8192 over 100 arrays                      1.00     18.1±0.12µs        ? ?/sec    1.01     18.2±0.13µs        ? ?/sec
concat fixed size lists                                        1.00   785.0±40.88µs        ? ?/sec    1.05   823.7±45.59µs        ? ?/sec
concat i32 1024                                                1.00    394.4±4.84ns        ? ?/sec    1.02   403.7±19.61ns        ? ?/sec
concat i32 8192 over 100 arrays                                1.00    202.5±6.99µs        ? ?/sec    1.09   221.3±11.69µs        ? ?/sec
concat i32 nulls 1024                                          1.00    606.4±6.04ns        ? ?/sec    1.07   648.7±31.94ns        ? ?/sec
concat i32 nulls 8192 over 100 arrays                          1.00    228.4±6.67µs        ? ?/sec    1.08    246.2±8.27µs        ? ?/sec
concat str 1024                                                1.00     13.1±0.92µs        ? ?/sec    1.08     14.2±1.15µs        ? ?/sec
concat str 8192 over 100 arrays                                1.00    109.4±0.80ms        ? ?/sec    1.00    109.3±1.27ms        ? ?/sec
concat str nulls 1024                                          1.00      5.9±0.48µs        ? ?/sec    1.05      6.1±0.67µs        ? ?/sec
concat str nulls 8192 over 100 arrays                          1.00     54.2±0.56ms        ? ?/sec    1.00     54.2±0.73ms        ? ?/sec
concat str_dict 1024                                           1.04      2.9±0.03µs        ? ?/sec    1.00      2.8±0.02µs        ? ?/sec
concat str_dict_sparse 1024                                    1.00      7.0±0.05µs        ? ?/sec    1.01      7.0±0.14µs        ? ?/sec
concat struct with int32 and dicts size=1024 count=2           1.05      7.2±0.05µs        ? ?/sec    1.00      6.9±0.15µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0               1.00     77.4±1.09µs        ? ?/sec    1.03     79.4±9.59µs        ? ?/sec
concat utf8_view  max_str_len=128 null_density=0.2             1.00     79.7±1.48µs        ? ?/sec    1.00     79.8±1.29µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0                1.00     77.4±1.15µs        ? ?/sec    1.17     90.5±0.69µs        ? ?/sec
concat utf8_view  max_str_len=20 null_density=0.2              1.00     79.1±1.37µs        ? ?/sec    1.17     92.6±0.81µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0      1.00     45.6±3.36µs        ? ?/sec    1.04     47.2±2.50µs        ? ?/sec
concat utf8_view all_inline max_str_len=12 null_density=0.2    1.00     44.7±4.10µs        ? ?/sec    1.13     50.4±3.74µs        ? ?/sec
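A note on reading these tables: the leading number in each column appears to be that side's mean time divided by the faster side's mean, so 1.00 marks the faster side. That reading is an assumption based on the numbers themselves, and the helper below is a hypothetical illustration of the arithmetic rather than part of the benchmark tooling.

```rust
/// Returns `(ratio_a, ratio_b)`: each mean time divided by the smaller of the
/// two, so the faster side is 1.0 and the slower side is greater than 1.0.
fn relative_ratios(mean_a: f64, mean_b: f64) -> (f64, f64) {
    let fastest = mean_a.min(mean_b);
    (mean_a / fastest, mean_b / fastest)
}
```

For the `concat utf8_view  max_str_len=20 null_density=0` row, 77.4µs on the branch against 90.5µs on main gives 90.5 / 77.4 ≈ 1.17, matching the value shown for main.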

@alamb
Contributor

alamb commented Dec 17, 2025

Benchmark results look good to me. Let's go!

@alamb alamb merged commit 116ae12 into apache:main Dec 17, 2025
26 checks passed
@alamb
Contributor

alamb commented Dec 17, 2025

Thanks again @lichuang @ClSlaid and @klion26

Labels

arrow Changes to the arrow crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GenericByteViewBuilder::with_deduplicate_strings Max Length

5 participants