Open
Labels: enhancement, medium priority, python
Description
Currently, conversion of Tensors and Complex data in Python is inefficient:
import casa_arrow as ca
casa_table = ca.table("~/data/WSRT_polar.MS_p0/")
arrow_table = casa_table.to_arrow()
print(arrow_table.column("DATA").to_numpy())
This produces the following output:
array([array([array([array([0., 1.], dtype=float32), array([0., 1.], dtype=float32),
array([0., 1.], dtype=float32), array([0., 1.], dtype=float32)],
dtype=object) ,
array([array([0., 1.], dtype=float32), array([0., 1.], dtype=float32),
array([0., 1.], dtype=float32), array([0., 1.], dtype=float32)],
dtype=object) ,
...
array([array([0., 1.], dtype=float32), array([0., 1.], dtype=float32),
array([0., 1.], dtype=float32), array([0., 1.], dtype=float32)],
dtype=object) ],
dtype=object) ],
dtype=object)
This is because the extension types are defined in C++, and the to_numpy() method on the default Python extension type wrapper isn't overridden, so every nested list is converted into its own object array. See daskms.experimental.arrow.extension_types.to_numpy for a possible implementation.
Two possible solutions exist:
Provide wrappers with richer features within Apache Arrow
The Arrow maintainers are aware of this issue, and the following exploratory PRs suggest initial solutions:
- GH-33801: [Python] Further expose C++ Extension Types in Python apache/arrow#34469
- GH-33801: [Python] Generate on the fly Python Extension Types wrapping C++ Extension Types apache/arrow#34483
Provide wrappers at the casa-arrow level
Provide a table wrapper that creates NumPy arrays directly from the Arrow column buffers, e.g.:
>>> import numpy as np
>>> AT.column("DATA").chunks[0].buffers()
[None, None, None, None, <pyarrow.Buffer address=0x7fca84011000 size=2048 is_cpu=True is_mutable=True>]
>>> bufs = AT.column("DATA").chunks[0].buffers()
>>> data = np.frombuffer(bufs[-1], np.complex64)