Bug: High resource usage for XLSX with many blank rows #478

@R2ZER0

flatten-tool uses a lot of memory and CPU when it is run on an XLSX file that contains many blank rows.

To replicate:

  • Follow instructions to download the 360 Giving schema: https://docs.opendataservices.coop/projects/flatten-tool/en/latest/usage-360/
  • Use this example file: example-grants-with-blank-rows.xlsx
    • This file contains a grant at row 1, and another grant at row 1,000,000 with blank rows in between.
  • nice flatten-tool unflatten --schema 360-giving-schema.json --convert-titles --root-list-path='grants' -f xlsx -o out.json example-grants-with-blank-rows.xlsx
    • It's probably a good idea to run it under nice in case it uses all your RAM (NOTE: it may use 16 GiB or more of RAM)
  • Open your favourite resource monitoring tool and watch the memory usage grow
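
When reproducing this, note that nice only lowers CPU scheduling priority and does not limit memory. Below is a minimal sketch of one way to run the command under a hard memory cap, so that a runaway process fails with an allocation error instead of exhausting RAM. The helper name run_with_memory_cap and the 4 GiB figure in the usage note are illustrative assumptions, not part of flatten-tool. It relies on the standard-library resource module, which is Unix-only, and assumes a platform such as Linux where RLIMIT_AS is enforced.

```python
import resource
import subprocess
import sys


def run_with_memory_cap(cmd, max_bytes):
    """Run cmd in a child process with its address space capped at max_bytes.

    Returns the child's exit code. Hypothetical helper for illustration only.
    """

    def apply_cap():
        # Runs in the child just before exec, so the parent's limits are unchanged.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        # Never ask for more than the existing hard limit allows.
        if hard == resource.RLIM_INFINITY:
            cap = max_bytes
        else:
            cap = min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (cap, cap))

    return subprocess.run(cmd, preexec_fn=apply_cap).returncode


# The command from the steps above, as an argument list (not executed here).
REPRO_CMD = [
    "flatten-tool", "unflatten",
    "--schema", "360-giving-schema.json",
    "--convert-titles",
    "--root-list-path=grants",
    "-f", "xlsx",
    "-o", "out.json",
    "example-grants-with-blank-rows.xlsx",
]
```

For example, run_with_memory_cap(REPRO_CMD, 4 * 1024**3) would run the reproduction with a 4 GiB cap; a non-zero return code then suggests the process hit the limit or failed for another reason.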

We encountered this issue when a file containing many blank rows was uploaded to the 360 Giving DQT. flatten-tool consumed all the available memory and brought the server offline.
