Size and compression #15

@sidnarayanan

I have been checking the size of panda files. The code is at [1] and the output is at [2]. A few observations:

  • The gen collection is huge. What can we lose here? Maybe we can apply a harder pT cutoff for certain particles, such as gluons.

  • Gen branches are written in data and trigger branches are written in MC. These branches should be dropped altogether when not needed. I will do this once PandaTree is integrated (it is hard to do right now, since these are member variables). I had assumed they would be compressed away.

  • puppiAK8 in data is almost as large as puppiCA15. What is being saved? ECFs are turned off for puppiAK8, but the arrays are still saved, so I guess compression is not working that well. We should drop the ECF branches when they are not used. We should also save only the leading 2 or 3 fatjets; there is no need for more.

  • Empty vectors are compressed poorly, if at all. Not shown in the pie chart is the 'all' tree, which saves empty vectors for triggers, MET filters, and other junk. I was using PEvent and assuming the unused content would be compressed away, but that is not the case. Again, this is easier to fix with PandaTree, because individual branches can be ignored.
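
To make the two pruning ideas above concrete (a species-dependent pT cutoff for gen particles, and keeping only the leading few fatjets), here is a minimal sketch in plain Python. It is not panda code: the `Particle` class, the function names, and the threshold values are all hypothetical placeholders, and the only physics input is that PDG ID 21 is the gluon.

```python
from dataclasses import dataclass

# Hypothetical thresholds in GeV, keyed by absolute PDG ID.
# The numbers are placeholders, not tuned values.
DEFAULT_MIN_PT = 1.0
MIN_PT_BY_PDGID = {21: 10.0}  # 21 = gluon: apply a harder cutoff


@dataclass
class Particle:
    """Stand-in for a gen particle or jet; only pdgid and pt are used."""
    pdgid: int
    pt: float


def prune_gen(particles):
    """Keep gen particles whose pT passes the threshold for their species."""
    return [
        p for p in particles
        if p.pt >= MIN_PT_BY_PDGID.get(abs(p.pdgid), DEFAULT_MIN_PT)
    ]


def leading(jets, n=3):
    """Return the n highest-pT jets, ordered by decreasing pT."""
    return sorted(jets, key=lambda j: j.pt, reverse=True)[:n]
```

For example, `prune_gen([Particle(21, 5.0), Particle(21, 20.0)])` would keep only the 20 GeV gluon under these placeholder thresholds, and `leading(fatjets, 2)` would keep the two hardest fatjets.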

[1] https://github.com/sidnarayanan/PandaCore/blob/master/Tools/python/Size.py
[2] http://snarayan.web.cern.ch/snarayan/view.php?dir=pandasize
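
For anyone who wants to reproduce this kind of breakdown, the per-collection tally behind a size pie chart could be sketched roughly as follows. This is not the script at [1]; the function names are hypothetical, and it assumes branches are named `collection.member`, which may not match the actual file layout.

```python
from collections import defaultdict


def size_by_collection(branch_sizes):
    """Sum compressed bytes per collection.

    branch_sizes: iterable of (branch_name, compressed_bytes) pairs.
    The collection is assumed to be the part of the branch name before
    the first '.'; names without a '.' count as their own collection.
    """
    totals = defaultdict(int)
    for name, nbytes in branch_sizes:
        collection = name.split(".", 1)[0]
        totals[collection] += nbytes
    return dict(totals)


def fractions(totals):
    """Return each collection's share of the total, largest first."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return {name: nbytes / grand_total for name, nbytes in ordered}


# With PyROOT available, the input pairs could be collected along these
# lines (not executed here):
#   pairs = [(b.GetName(), b.GetZipBytes()) for b in tree.GetListOfBranches()]
```

Using compressed rather than uncompressed bytes is a deliberate choice in this sketch, since the on-disk size is what matters for the observations above.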
