Fix performance and metadata representation issues in roledb #1005

@lukpueh

Update on 2020-04-08:
The particular performance issue described below as Problem 1 has been resolved in #1012 by applying Quick fix (a), proposed below. However, the underlying Problem 2 remains.

I suggest keeping this issue around for context and proceeding with the Long-term fixes (c) and (d) outlined below.


As reported by @woodruffw (thanks!) and reproduced in the profiling experiment below, the repository tool's delegate_hashed_bins method has massive performance issues when delegating to many bins (e.g. 16K), for the following reasons:

Problem 1:
delegate_hashed_bins calls delegate for each new bin. delegate then adds the new delegated role to roledb, but in each iteration it also adds the delegation information to the delegating role in roledb, which grows steadily.
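To make the growth concrete, here is a minimal toy model of that loop. It is only a sketch: `toy_roledb` and `simulate_per_bin_delegation` are hypothetical names for illustration and do not mirror the real roledb API. It counts how many delegation entries get deep-copied when the delegating role is copied out and back in once per bin.

```python
import copy


def simulate_per_bin_delegation(number_of_bins):
    """Return how many delegation entries are deep-copied in total."""
    # Toy stand-in for roledb: role name -> roleinfo (not the real tuf.roledb).
    toy_roledb = {"targets": {"delegations": []}}
    copied_entries = 0

    for i in range(number_of_bins):
        bin_name = "bin-%d" % i
        # Copy the delegating role out; it already holds i entries.
        roleinfo = copy.deepcopy(toy_roledb["targets"])
        copied_entries += len(roleinfo["delegations"])
        roleinfo["delegations"].append({"name": bin_name})
        # Copy it back in; it now holds i + 1 entries.
        toy_roledb["targets"] = copy.deepcopy(roleinfo)
        copied_entries += len(roleinfo["delegations"])
        # Register the new delegated role itself.
        toy_roledb[bin_name] = {"delegations": []}

    return copied_entries
```

In this toy model the total is the square of the number of bins, so the copying work grows quadratically, which is consistent with the slowdown seen for large bin counts.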

Problem 2:
Operations on roledb are extremely expensive because they perform two deep copies of a roleinfo dictionary, like so:

1. roleinfo = roledb.get_roleinfo() # deepcopying roleinfo out of roledb
2. modify roleinfo
3. roledb.update_roleinfo(roleinfo) # deepcopying roleinfo back into roledb

Furthermore, update_roleinfo calls tuf.formats.ROLEDB_SCHEMA.check_match for the passed roleinfo, which recursively iterates over all of its elements to validate them against the schema.
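The copy-out, modify, validate, copy-in pattern can be sketched as follows. This is a simplified stand-in, not the actual roledb code: `toy_roledb`, `toy_get_roleinfo`, `toy_update_roleinfo` and `toy_check` are hypothetical, and `toy_check` only imitates the idea of a recursive schema walk.

```python
import copy

# Toy stand-in for roledb (hypothetical, not the real tuf.roledb).
toy_roledb = {"targets": {"keyids": ["abc"], "delegations": []}}


def toy_get_roleinfo(rolename):
    # First deep copy: the caller gets an independent object.
    return copy.deepcopy(toy_roledb[rolename])


def toy_check(obj):
    # Imitates a recursive schema check: every element is visited.
    if isinstance(obj, dict):
        for key, value in obj.items():
            if not isinstance(key, str):
                raise TypeError("keys must be strings")
            toy_check(value)
    elif isinstance(obj, list):
        for item in obj:
            toy_check(item)
    elif not isinstance(obj, str):
        raise TypeError("leaves must be strings")


def toy_update_roleinfo(rolename, roleinfo):
    toy_check(roleinfo)  # full recursive walk on every update
    # Second deep copy: the stored object is independent of the caller's.
    toy_roledb[rolename] = copy.deepcopy(roleinfo)


roleinfo = toy_get_roleinfo("targets")
roleinfo["keyids"].append("def")
# The stored role is untouched until the explicit update.
unchanged_before_update = toy_roledb["targets"]["keyids"] == ["abc"]
toy_update_roleinfo("targets", roleinfo)
```

Each round trip therefore touches the whole roleinfo three times (two copies plus one validation walk), regardless of how small the actual modification is.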


I propose the following fixes:

  • Quick fix (a):

Change delegate_hashed_bins to update the delegating role only once in the roledb, instead of once for every bin. This means that we can't call delegate as it is right now, but would need to replicate or, ideally, factor out common functionality.
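As a rough illustration of the idea, assuming a toy roledb rather than the real one (`toy_roledb` and `simulate_batched_delegation` are hypothetical names), the delegating role is copied out once, all bins are appended locally, and the result is written back once:

```python
import copy


def simulate_batched_delegation(number_of_bins):
    """Return (deep-copied delegation entries, resulting toy roledb)."""
    # Toy stand-in for roledb: role name -> roleinfo (not the real tuf.roledb).
    toy_roledb = {"targets": {"delegations": []}}
    copied_entries = 0

    # Copy the delegating role out a single time.
    roleinfo = copy.deepcopy(toy_roledb["targets"])
    copied_entries += len(roleinfo["delegations"])

    for i in range(number_of_bins):
        bin_name = "bin-%d" % i
        # Accumulate delegation entries on the local copy only.
        roleinfo["delegations"].append({"name": bin_name})
        # The new delegated roles still have to be registered one by one.
        toy_roledb[bin_name] = {"delegations": []}

    # Write the delegating role back a single time.
    toy_roledb["targets"] = copy.deepcopy(roleinfo)
    copied_entries += len(roleinfo["delegations"])
    return copied_entries, toy_roledb
```

In this sketch the number of copied delegation entries equals the number of bins, so it grows linearly, whereas a copy-out and copy-in per bin would copy on the order of the square of that number.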

  • Intermediate fix (b):

Stop deepcopying and instead update roledb by reference. The deep copies are unnecessary overhead throughout the codebase (see profiling experiment). This seems quite feasible in most cases, but it is also risky. It might be worth skipping the intermediate fix and moving from the quick fix directly to the long-term fix.

Also see secondary objective "Has only one internal structure ..." of #846.
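One way to picture both the gain and the risk is the following toy example, which assumes a hypothetical by-reference getter (`toy_get_roleinfo_by_reference` is not an existing roledb function). No copy is made, but every caller now shares the stored object, so any mutation takes effect immediately and without validation.

```python
# Toy stand-in for roledb (hypothetical, not the real tuf.roledb).
toy_roledb = {"targets": {"keyids": ["abc"]}}


def toy_get_roleinfo_by_reference(rolename):
    # No deep copy: the caller receives the stored object itself.
    return toy_roledb[rolename]


roleinfo = toy_get_roleinfo_by_reference("targets")
# This mutation changes the stored role right away; there is no
# explicit update step at which the new value could be checked.
roleinfo["keyids"].append("def")
```

Code that currently relies on getting a private copy, for example to build a tentative change and discard it on error, would silently alter roledb under this scheme, which is presumably part of why the change is risky.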
