
[Epic] Replace ArrayData with direct Array construction, when possible #9298

@alamb

Description


Is your feature request related to a problem or challenge? Please describe what you are trying to do.

While working on micro-optimizations, we have seen a common pattern where older parts of the arrow-rs codebase use ArrayData to create new arrays.

An ArrayData requires at least one extra allocation (for the Vec that holds the Buffers) as well as several dynamic function calls. While this overhead is small for any single array, it is paid for every array, so in aggregate it can be substantial.

It also typically requires an unsafe call (such as ArrayData::new_unchecked or build_unchecked), which is unnecessary because the typed constructors are safe APIs that the compiler can check.

Quoting @tustvold

My 2 cents is it would be better to move the codepaths relying on ArrayData over to using the typed arrays directly, this should not only cut down on allocations but unnecessary validation and dispatch overheads.

Describe the solution you'd like
Change the code paths that rely on ArrayData to create the typed arrays directly. This should cut down not only on allocations but also on unnecessary validation and dispatch overheads.

Describe alternatives you've considered
Here are some example PRs

The old, less efficient pattern looks like this (note the vec![buffer], which allocates a Vec just to hold the buffer):

        let data = unsafe {
            ArrayData::new_unchecked(T::DATA_TYPE, len, None, Some(null), 0, vec![buffer], vec![])
        };
        PrimitiveArray::from(data)

or

        let array_data = ArrayDataBuilder::new(arrow_data_type)
            .len(self.record_reader.num_values())
            .add_buffer(record_data)
            .null_bit_buffer(self.record_reader.consume_bitmap_buffer());

        let array_data = unsafe { array_data.build_unchecked() };

The new pattern looks like this (note: no unsafe and no extra Vec allocation):

        // Create nulls directly (the `filter` drops the NullBuffer if it contains no nulls)
        let nulls =
            Some(NullBuffer::new(BooleanBuffer::new(null, 0, len))).filter(|n| n.null_count() > 0);
        // Create Primitive Array directly
        PrimitiveArray::new(ScalarBuffer::from(buffer), nulls)

Note: the only tricky thing I have seen is that ArrayDataBuilder automatically checks for and drops NullBuffers that contain no nulls. When updating the code, we need to follow a similar pattern (as the filter call in the example above does).

Additional context


Labels: enhancement, performance
