Conversion of numpy boolean array to pyarrow boolean array isn't zero-copy #118

@sjperkins

In both NumPy and CASA, a boolean occupies one byte (8 bits). Arrow, however, packs booleans into single bits.

This means that, for np.bool_ arrays, the following conversion is not zero-copy:

# Convert to pyarrow array
pa_array = pa.array(np_array)

As a result, it's not possible to pass NumPy bool arrays as getcol result arguments, or as putcol data arguments.

It's easy enough to work around this by passing data.astype(np.uint8) in these instances, as arcae represents CASA booleans as arrow::uint8()

template <>
struct CasaDataTypeTraits<casacore::DataType::TpBool> {
  using ArrowType = arrow::UInt8Type;
  using CasaType = casacore::Bool;
  static std::shared_ptr<arrow::DataType> ArrowDataType() { return arrow::uint8(); }
  static constexpr bool is_complex = false;
};

but it'd probably be convenient for the user to do this within arcae itself, during the numpy <--> arrow conversion process.

Labels: bug, enhancement