Skip to content

performance when serializing pandas DataFrames #107

@sdementen

Description

@sdementen

the function __get_cell_data (https://github.com/kz26/PyExcelerate/blob/dev/pyexcelerate/Worksheet.py#L227) operates on each cell individually.
when serializing a pandas.DataFrame, most of the time, the columns are of a unique type (dtype) and could benefit from some "columnar" approach (instead of row by row, cell by cell approach) to speed up things:

  • the ´if´ statements could be evaluated only once per column
  • the conversion to string/xml could leverage some "apply / applymap" from pandas
  • ...
    have you already thought about ways to improve this by keeping the "columnar" info further down the pipe (vs transforming everything to cells) for DataFrames ? it is quite specific yet it is a case lot of pandas users are hitting (slowness in exporting to excel).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions