Improve RunArray documentation #9019

Merged
Jefffrey merged 3 commits into apache:main from Jefffrey:improve-run-array-docs
Dec 21, 2025
Conversation

@Jefffrey
Contributor

Which issue does this PR close?

N/A

Rationale for this change

Whilst reviewing apache/datafusion#18981, I found it very confusing to follow the logic that accounts for slicing of RunArrays. I decided to see if I could improve the documentation around it, especially to make it clear that slicing acts logically, not physically.
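To illustrate what "slicing acts logically, not physically" can mean, here is a minimal sketch using a toy type. `ToyRunArray`, its fields, and its methods are hypothetical names invented for this example, not the arrow-rs `RunArray` API. The assumption is that a slice only moves a logical window (offset and length) over run-end and value buffers that stay untouched.

```rust
use std::rc::Rc;

/// Toy run-end encoded array (hypothetical; NOT the arrow-rs `RunArray` API).
#[derive(Clone, Debug)]
struct ToyRunArray {
    /// Exclusive logical end position of each run, strictly increasing.
    run_ends: Rc<[usize]>,
    /// One value per run.
    values: Rc<[char]>,
    /// Where this view starts, in logical positions of the unsliced array.
    logical_offset: usize,
    /// How many logical positions this view covers.
    logical_length: usize,
}

impl ToyRunArray {
    fn new(run_ends: Vec<usize>, values: Vec<char>) -> Self {
        assert_eq!(run_ends.len(), values.len());
        let logical_length = run_ends.last().copied().unwrap_or(0);
        Self {
            run_ends: run_ends.into(),
            values: values.into(),
            logical_offset: 0,
            logical_length,
        }
    }

    /// Slicing only adjusts the logical window; both buffers are shared as-is.
    fn slice(&self, offset: usize, length: usize) -> Self {
        assert!(offset + length <= self.logical_length);
        Self {
            run_ends: Rc::clone(&self.run_ends),
            values: Rc::clone(&self.values),
            logical_offset: self.logical_offset + offset,
            logical_length: length,
        }
    }

    /// Value at logical index `i` of this (possibly sliced) view.
    fn value(&self, i: usize) -> char {
        assert!(i < self.logical_length);
        let absolute = self.logical_offset + i;
        // Index of the first run whose end is strictly greater than `absolute`.
        let physical = self.run_ends.partition_point(|&end| end <= absolute);
        self.values[physical]
    }
}
```

With run ends `[2, 5, 6]` and values `a, b, c` (logically `aabbbc`), `slice(1, 4)` reads back as `abbb`, while the sliced view still holds all three runs.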

What changes are included in this PR?

Improve docstrings for RunArrays and RunEndBuffers.

Also add some tests for kernel operations on sliced RunArrays which seemed to be missing.

Are these changes tested?

Doc changes only. Added some (expected) failing tests to showcase #9018 too.

Are there any user-facing changes?

Doc changes only.

@github-actions github-actions bot added the arrow (Changes to the arrow crate) label Dec 19, 2025
/// scaled well for larger inputs.
///
/// See <https://github.com/apache/arrow-rs/pull/3622#issuecomment-1407753727> for more details.
// TODO: this technically should be a method on RunEndBuffer
Contributor Author

Not sure it's worth making an API change solely for this (or we could keep this and make it a thin wrapper)

Contributor

yeah, I think a thin wrapper would be good -- as a follow on PR perhaps
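As a rough sketch of the "thin wrapper" idea in this thread: the index-mapping logic would live on the buffer type, and the array type would keep its existing entry point and simply delegate. All names below (`ToyRunEnds`, `ToyArray`, `physical_indices`, `values_at`) are hypothetical and do not reflect the arrow-rs signatures. The per-index binary search is chosen for brevity and is an assumption, not necessarily how the library does it.

```rust
/// Hypothetical stand-in for a run-ends buffer (not the arrow-rs `RunEndBuffer`).
struct ToyRunEnds {
    run_ends: Vec<usize>,
    logical_offset: usize,
    logical_length: usize,
}

impl ToyRunEnds {
    /// Maps logical indices of the (possibly sliced) view to physical run indices.
    /// Returns `None` if any logical index is out of bounds.
    fn physical_indices(&self, logical: &[usize]) -> Option<Vec<usize>> {
        logical
            .iter()
            .map(|&i| {
                if i >= self.logical_length {
                    return None;
                }
                let absolute = self.logical_offset + i;
                // First run whose exclusive end lies past `absolute`.
                Some(self.run_ends.partition_point(|&end| end <= absolute))
            })
            .collect()
    }
}

/// Hypothetical array type: the buffer above plus one value per run.
struct ToyArray {
    run_ends: ToyRunEnds,
    values: Vec<&'static str>,
}

impl ToyArray {
    /// Thin wrapper: callers keep this entry point, the logic lives on the buffer.
    fn physical_indices(&self, logical: &[usize]) -> Option<Vec<usize>> {
        self.run_ends.physical_indices(logical)
    }

    /// Looks up the values for the given logical indices via the wrapper.
    fn values_at(&self, logical: &[usize]) -> Option<Vec<&'static str>> {
        let physical = self.physical_indices(logical)?;
        Some(physical.into_iter().map(|p| self.values[p]).collect())
    }
}
```

Keeping the array-level method as a delegate means no existing call site has to change, which matches the concern about making an API change solely for this.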

Contributor

@alamb alamb left a comment

Very impressive @Jefffrey -- thank you. I learned a lot here

///
/// Note: any slicing of this [`RunArray`] array is not applied to the returned array
/// and must be handled separately
/// Any slicing of this [`RunArray`] array is **not** applied to the returned
Contributor

this is the same as how ListArray works, which is definitely tricky to use correctly, as @rluvaton has noted
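The pitfall described here can be sketched with a toy model. `ToySlicedRuns` and its methods are hypothetical names for illustration only, not the arrow-rs API. The assumption is that the values accessor returns every physical value regardless of the slice, so the caller has to work out which runs the slice actually touches.

```rust
/// Toy model of a sliced run-end encoded array (hypothetical, not the arrow-rs API).
struct ToySlicedRuns {
    /// Exclusive logical end position of each run, strictly increasing.
    run_ends: Vec<usize>,
    /// One value per run.
    values: Vec<i64>,
    logical_offset: usize,
    logical_length: usize,
}

impl ToySlicedRuns {
    /// Returns ALL physical values; the slice is deliberately not applied.
    fn values(&self) -> &[i64] {
        &self.values
    }

    /// Physical index of the run containing the first logical element of the slice.
    fn start_physical_index(&self) -> usize {
        self.run_ends.partition_point(|&end| end <= self.logical_offset)
    }

    /// Physical index of the run containing the last logical element of the slice.
    /// Only meaningful for a non-empty slice.
    fn end_physical_index(&self) -> usize {
        let last = self.logical_offset + self.logical_length - 1;
        self.run_ends.partition_point(|&end| end <= last)
    }

    /// The step the caller must do separately: trim the values to the touched runs.
    fn sliced_values(&self) -> &[i64] {
        if self.logical_length == 0 {
            return &[];
        }
        &self.values()[self.start_physical_index()..=self.end_physical_index()]
    }
}
```

For run ends `[3, 4, 8, 10]` with a slice at offset 4 and length 3, `values()` still returns all four values, while `sliced_values()` returns only the single value of the third run.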

Comment thread arrow-buffer/src/buffer/run.rs Outdated
Comment thread arrow-buffer/src/buffer/run.rs
/// └─────────┘
/// logical indices
/// physical logical
/// ┌─────────┬─────────┐ ┌─────────┬─────────┐
Contributor

this is a nice improvement (to show values and indexes)

Comment thread arrow-buffer/src/buffer/run.rs Outdated
run_ends: ScalarBuffer<E>,
len: usize,
offset: usize,
logical_length: usize,
Contributor

😍 -- I like these names much better

Jefffrey and others added 2 commits December 21, 2025 02:50
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@Jefffrey Jefffrey merged commit addf74d into apache:main Dec 21, 2025
25 of 26 checks passed
@Jefffrey Jefffrey deleted the improve-run-array-docs branch December 21, 2025 14:08
@Jefffrey
Contributor Author

Thanks @alamb

@alamb
Contributor

alamb commented Dec 21, 2025

Thank YOU -- this is so much better now
