Starting with 0.5, we use the following versioning scheme:
- We don't bump MAJOR yet.
- We bump MINOR on breaking changes.
- We increase PATCH otherwise.
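
The scheme above can be sketched as a small helper. This is only an illustration: `next_version` and its `breaking` flag are hypothetical names that are not part of `fletcher`, the sketch assumes plain `MAJOR.MINOR.PATCH` version strings, and resetting PATCH to 0 on a MINOR bump is an assumption borrowed from common practice rather than something stated above.

```python
def next_version(current: str, breaking: bool) -> str:
    """Return the version that follows `current` under the scheme above.

    Hypothetical helper for illustration; not part of fletcher.
    """
    major, minor, patch = (int(part) for part in current.split("."))
    if breaking:
        # Breaking changes bump MINOR; MAJOR stays untouched for now.
        # Resetting PATCH to 0 is an assumption (usual convention).
        return f"{major}.{minor + 1}.0"
    # Every other change only increases PATCH.
    return f"{major}.{minor}.{patch + 1}"


print(next_version("0.5.0", breaking=False))  # 0.5.1
print(next_version("0.5.1", breaking=True))   # 0.6.0
```
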
- Add tests for all `str` functions.
- Fix tests for `pyarrow=0.17.1` and add CI jobs for `0.17.1` and `1.0.1`.
- Implement a faster take for list arrays.
- Use `utf8_is_*` functions from Apache Arrow if available.
- Return correct index in functions like `fr_str.extractall`.
- Create a shallow copy on `.astype(equal dtype, copy=True)`.
- Import `pad_1d` only in older `pandas` versions, otherwise use `get_fill_func`.
- Handle `fr_str.extractall` and similar functions correctly, returning a `pd.DataFrame` containing the corresponding `fletcher` array types.
- Use `binary_contains_exact` if available from `pyarrow` instead of our own numba-based implementation.
- Provide two more consistent accessors:
  - `.fr_strx`: Call efficient string functions on `fletcher` arrays, error if not available.
  - `.fr_str`: Call string functions on `fletcher` and `object`-typed arrays, convert to `object` if no `fletcher` function is available.
- Add a numba-based implementation for `strip`, `slice`, and `replace`.
- Support `LargeListArray` as a backing structure for lists.
- Implement `isnan` ufunc.
- Release the GIL where possible.
- Register with dask's `make_array_nonempty` to be able to handle the extension types in `dask`.
- Implement `FletcherBaseArray.__or__` and `FletcherBaseArray.__any__` to support `pandas.Series.replace`.
- Forward the `__array__` protocol directly to Arrow.
- Add a naive implementation for `zfill`.
- Add efficient (Numba-based) implementations for `endswith`, `startswith` and `contains`.
- Support roundtrips of `pandas.DataFrame` instances with `fletcher` columns through `pyarrow` data structures.
- Move CI to GitHub Actions.
Major changes:
- We now provide two different extension array implementations. There is now the simpler `FletcherContinuousArray`, which is backed by a `pyarrow.Array` instance and thus is always a continuous memory segment. The initial `FletcherArray`, which is backed by a `pyarrow.ChunkedArray`, is now renamed to `FletcherChunkedArray`. While `pyarrow.ChunkedArray` allows for more flexibility in how the data is stored, the implementation of algorithms is more complex for it. As this hinders contributions and also adoption in downstream libraries, we now provide both implementations with an equal level of support. We no longer provide the more generally named class `FletcherArray`, as there is no clear opinion on whether it should point to `FletcherContinuousArray` or `FletcherChunkedArray`. As usage increases, we might provide such an alias class again in the future.
- Support for `ArithmeticOps` and `ComparisonOps` on numerical data as well as numeric reductions such as `sum`. This should allow the use of nullable int and float types for many use cases. Performance of nullable integer columns is on the same level as in `pandas.IntegerArray`, as we have similar implementations of the masked arithmetic. In future versions, we plan to delegate the workload to the C++ code of `pyarrow` and expect significant performance improvements through the usage of bitmasks over bytemasks.
- `any` and `all` are now efficiently implemented on boolean arrays. We blogged about this and how it is about twice as fast while only using 1/16 - 1/32 of the RAM of the reference boolean array with missing values in `pandas`. This is due to the fact that prior to `pandas=1.0` you had to use a float array to have a boolean array that can deal with missing values. In `pandas=1.0` a new `BooleanArray` class was added that improves this situation but also changes a bit of the logic. We will adapt to this class in the next release and also publish new benchmarks.
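
To make the memory argument for bitmasks concrete, here is a minimal pure-Python sketch of a nullable boolean column stored as two bitmaps (one for validity, one for the values), with `any` and `all` that skip missing entries. The layout loosely mirrors how Arrow stores boolean arrays, but the function names are hypothetical and this is not the implementation used in `fletcher`. For 1000 values the two bitmaps need 250 bytes, whereas a float64 array of the same length needs 8000 bytes.

```python
from typing import List, Optional, Tuple


def pack_bits(flags: List[bool]) -> bytes:
    """Pack booleans into a bitmap, least-significant bit first."""
    out = bytearray((len(flags) + 7) // 8)
    for i, flag in enumerate(flags):
        if flag:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


def pack_nullable(values: List[Optional[bool]]) -> Tuple[bytes, bytes]:
    """Return (validity bitmap, data bitmap) for a nullable boolean column."""
    valid = pack_bits([v is not None for v in values])
    data = pack_bits([bool(v) for v in values])  # missing entries become 0 bits
    return valid, data


def any_skipna(valid: bytes, data: bytes) -> bool:
    """True if at least one non-missing value is True."""
    # A bit only counts when it is set in both bitmaps, so eight values
    # are checked with a single AND per byte.
    return any(v & d for v, d in zip(valid, data))


def all_skipna(valid: bytes, data: bytes) -> bool:
    """True if every non-missing value is True."""
    # A violation is a bit that is valid but not set in the data bitmap.
    return not any(v & ~d & 0xFF for v, d in zip(valid, data))


valid, data = pack_nullable([True, None, False, True])
print(any_skipna(valid, data), all_skipna(valid, data))  # True False
```

Padding bits at the end of the last byte are marked invalid, so they never influence either reduction.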
New features / performance improvements:
- For `FletcherContinuousArray` in general and all `FletcherChunkedArray` instances with a single chunk, we now provide an efficient implementation of `take`.
- Support for Python 3.8 and Pandas 1.0.
- We now check typing in CI using `mypy` and have annotated the code with type hints. We only plan to mark the packages as `py.typed` when `pandas` is also marked as `py.typed`.
- You can query `fletcher` for its version via `fletcher.__version__`.
- Implemented `.str.cat` as `.fr_strx.cat` for arrays with `pa.string()` dtype.
- `unique` is now supported on all array types where `pyarrow` provides a `unique` implementation.
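
The following sketch illustrates why `take` is simple for continuous data and for a single chunk, but needs extra bookkeeping once several chunks are involved. Plain Python sequences stand in for Arrow arrays, the function names are hypothetical, and negative indices are not handled; it is meant to show the idea, not the code used in `fletcher`.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def take_continuous(values: Sequence, indices: Sequence[int]) -> List:
    """Take from one continuous block: every index is a direct lookup."""
    return [values[i] for i in indices]


def take_chunked(chunks: Sequence[Sequence], indices: Sequence[int]) -> List:
    """Take from several chunks by first locating the chunk of each index."""
    if len(chunks) == 1:
        # Single chunk: same fast path as the continuous case.
        return take_continuous(chunks[0], indices)
    # ends[k] is the global index one past the last element of chunk k.
    ends = list(accumulate(len(chunk) for chunk in chunks))
    result = []
    for i in indices:
        k = bisect_right(ends, i)            # chunk that contains index i
        start = ends[k - 1] if k > 0 else 0  # global index where chunk k begins
        result.append(chunks[k][i - start])
    return result


print(take_chunked([["a", "b"], ["c"], ["d", "e"]], [4, 0, 2]))  # ['e', 'a', 'c']
```
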
- Drop Python 2 support
- Support for Python 3.7
- Fixed handling of `date` columns due to new default behaviours in `pyarrow`.
Rerelease with the sole purpose of rendering Markdown on PyPI.
Load the README in setup.py to have a description on PyPI.
Initial release of fletcher, based on Pandas 0.23.3 and Apache Arrow 0.9. This release already supports any Apache Arrow type, but the unit tests are still limited to string and date.