[FEAT] Reduce analysis overhead by performing only required datacollections #77

@Joseph94m

Description

Is your feature request related to a problem? Please describe.

Data collections perform the API/GraphQL calls to GitLab and prepare the data in the format that controls expect.
When a user passes --controls or --skip-controls, or sets enabled: false, the CLI still runs every data collection, including those whose outputs are needed only by disabled or skipped controls.

Describe the solution you'd like

As plumber analyze prepares the list of controls to run, it must also compile the list of data collections those controls need. Only these data collections must run.

Obviously, if a data collection is needed by multiple controls (as is the case today), it must only be run once.

The declaration must go from controls to data collections, not the other way around. That is, either each control keeps track of its required data in its own control file, or we put these declarations in a centralized place from which plumber deduces the required data collections. The data collections themselves should remain unaware of the controls. It is also better if controls remain unaware of the data collections and focus on the data structures they require instead.
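
To make the proposal concrete, here is a minimal sketch of one possible shape for it. Everything in it is hypothetical: the control names, the data collection names, the data keys, and the `required_collections` helper are invented for illustration and do not come from the plumber codebase. Controls declare the data structures they require, data collections declare the data structures they provide, and neither side refers to the other.

```python
from dataclasses import dataclass


# All names below are hypothetical; they only illustrate the declaration model.
@dataclass(frozen=True)
class Control:
    name: str
    requires: frozenset  # keys of required data structures, not collection names
    enabled: bool = True


@dataclass(frozen=True)
class DataCollection:
    name: str
    provides: frozenset  # keys of the data structures this collection produces


def required_collections(controls, collections, only=None, skip=()):
    """Return the names of the data collections needed by the controls that will run."""
    # Mirror the three ways a control can be excluded:
    # enabled: false, --controls (only), --skip-controls (skip).
    selected = [
        c for c in controls
        if c.enabled and (only is None or c.name in only) and c.name not in skip
    ]

    # Union of every data key the selected controls ask for.
    needed_keys = set()
    for control in selected:
        needed_keys |= control.requires

    # Each collection appears at most once, even if several controls need it.
    return [dc.name for dc in collections if dc.provides & needed_keys]


CONTROLS = [
    Control("image-pinning", frozenset({"pipeline_images"})),
    Control("image-sources", frozenset({"pipeline_images"})),
    Control("branch-protection", frozenset({"protected_branches"})),
]
COLLECTIONS = [
    DataCollection("pipeline-image-collection", frozenset({"pipeline_images"})),
    DataCollection("branch-collection", frozenset({"protected_branches"})),
]

# Skipping the only control that needs branch data drops that collection.
print(required_collections(CONTROLS, COLLECTIONS, skip={"branch-protection"}))
```

In this sketch two controls share `pipeline_images`, yet the collection that provides it is listed once, which covers the "run only once" requirement without any extra bookkeeping.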

Describe alternatives you've considered

Keep as is.

Additional context

The model must be extensible: when we add new controls or new data collections, it must be easy and obvious how to declare that a control needs a data collection.
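
One way to keep such declarations easy to get right is a startup check that reports any control whose required data has no provider. The sketch below is only an illustration under assumed names: `missing_providers`, the control names, and the data keys are hypothetical and not part of plumber.

```python
def missing_providers(control_requires, collection_provides):
    """Map each control to the required data keys that no data collection provides."""
    # Gather every data key that at least one collection can produce.
    provided = set()
    for keys in collection_provides.values():
        provided |= set(keys)

    # Keep only the controls that have at least one unsatisfied requirement.
    report = {}
    for control, keys in control_requires.items():
        unmet = sorted(set(keys) - provided)
        if unmet:
            report[control] = unmet
    return report


# Hypothetical declarations: the second control asks for data nobody provides yet.
control_requires = {
    "image-pinning": {"pipeline_images"},
    "new-control": {"merge_request_settings"},
}
collection_provides = {
    "pipeline-image-collection": {"pipeline_images"},
}

print(missing_providers(control_requires, collection_provides))
```

A non-empty report would tell the author of a new control exactly which data key still needs a data collection.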
