Description
Here is a summary of a developer meeting held at the EBRAINS 2.0 event in Heidelberg in March 2025.
Currently, Cobrawap is a snakemake-based workflow. It is driven by a Cobrawap yaml config file that includes input parameters as well as (in most cases) a list of blocks to run for each stage.
There is currently a development branch that allows an alternative, CWL-based invocation of the `cobrawap run` command. This command creates a tool-CWL yaml file for each block from the block's Python command-line arguments. Next, it creates a workflow-CWL yaml file that chains the individual blocks as specified in the Cobrawap yaml file and maps the parameters of that yaml file to the input parameters of each individual tool (via the tool-CWL yaml descriptions). The workflow-CWL file is then passed to cwltool for execution. This provides an alternative path for executing the workflow.
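To illustrate the first step, deriving a tool-CWL description from a block's Python command-line arguments, here is a minimal sketch. It assumes, hypothetically, that each block defines its options with `argparse`, and it reads the parser's private `_actions` list for brevity; the type mapping, the `python <script>` base command and the example options are illustrative assumptions, not the code on the development branch.

```python
import argparse
import json

# Hypothetical mapping from argparse `type` values to CWL types;
# arguments without an explicit type (None) are treated as strings.
_CWL_TYPES = {None: "string", str: "string", int: "int", float: "float"}


def build_tool_cwl(script, parser):
    """Sketch: derive a CWL CommandLineTool description from an argparse parser."""
    inputs = {}
    # `_actions` is a private argparse attribute; used here only for brevity.
    for action in parser._actions:
        if not action.option_strings or action.dest == "help":
            continue  # this sketch skips positional arguments and --help
        if action.nargs == 0:
            cwl_type = "boolean"  # flag-style options such as store_true
        else:
            cwl_type = _CWL_TYPES.get(action.type, "string")
        if not action.required:
            cwl_type += "?"  # CWL shorthand for an optional input
        inputs[action.dest] = {
            "type": cwl_type,
            "inputBinding": {"prefix": action.option_strings[0]},
        }
    return {
        "cwlVersion": "v1.2",
        "class": "CommandLineTool",
        "baseCommand": ["python", script],
        "inputs": inputs,
        "outputs": {},  # output declarations are left out of this sketch
    }


if __name__ == "__main__":
    # Hypothetical block interface, not an actual Cobrawap block.
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", required=True)
    parser.add_argument("--sampling_rate", type=float)
    parser.add_argument("--plot", action="store_true")
    # JSON is valid YAML, so this output could be saved as a .cwl file.
    print(json.dumps(build_tool_cwl("block.py", parser), indent=2))
```

Because CWL accepts JSON-style documents, the dictionary can be dumped with the standard library alone; a real generator would also need to declare the block's output files.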
We identified the following issues:
Inconsistent parameter names
The mapping of Cobrawap yaml config parameters to the command-line options of the individual blocks is not standardized. For snakemake, the mapping is done in the individual rules, whereas constructing the workflow-CWL yaml file within the `cobrawap run` command requires complex parsing and mapping of parameters. At the same time, the workflow-CWL file cannot be used as a config file, since it contains parameters that control the execution flow (e.g., inputs and outputs) and should be hidden from the user.
Solution: refactor the Cobrawap config file to use a hierarchical stage/block/parameter naming scheme that mirrors the parameters of each individual block.
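A rough sketch of why a hierarchical config helps: once parameters are nested as stage, block, parameter, turning them into command-line options becomes a mechanical lookup that snakemake rules and the CWL builder could share. The nested layout below is a proposal, not the current config format; the stage and block names are modelled on Cobrawap's naming but illustrative, the parameter names are hypothetical, and rendering booleans as bare flags is an assumption about how blocks parse them.

```python
# Hypothetical hierarchical config: stage -> block -> parameter -> value.
CONFIG = {
    "stage02_processing": {
        "detrending": {"order": 2},
        "frequency_filter": {
            "highpass_frequency": 0.5,
            "lowpass_frequency": 20.0,
            "psd_plot": True,
        },
    },
}


def block_args(config, stage, block):
    """Turn the parameters of one block into a list of command-line arguments."""
    args = []
    for name, value in config.get(stage, {}).get(block, {}).items():
        if isinstance(value, bool):
            if value:
                args.append(f"--{name}")  # assumption: booleans are bare flags
        else:
            args.extend([f"--{name}", str(value)])
    return args


if __name__ == "__main__":
    print(block_args(CONFIG, "stage02_processing", "frequency_filter"))
    # prints ['--highpass_frequency', '0.5', '--lowpass_frequency', '20.0', '--psd_plot']
```

With this layout the same function could feed a snakemake rule's shell command and the `inputs` of a workflow-CWL step, so the mapping lives in one place.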
Manual construction of workflow order
Currently, two separate pieces of custom code determine the block order: one in the snakemake rule set and one in the `cobrawap run` code. This can lead to inconsistencies, and it is also prone to configuration errors that are not reported to the user.
Solution: identify a common description of block interdependencies that is used by both the snakemake rule set and the CWL creator in `cobrawap run`. This requires further discussion of how "rigid" or "flexible" the composition of allowed workflows should be.
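As one possible shape for such a shared description, here is a sketch that stores, for each block, the set of blocks it must run after, validates a user's selection, and derives the execution order with the standard-library `graphlib` module (Python 3.9+). The block names and dependencies are hypothetical, and declaring fixed prerequisites corresponds to the "rigid" end of the discussion above; a more flexible scheme would need a richer description.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical shared description: block -> set of blocks it must run after.
BLOCK_DEPENDENCIES = {
    "load_data": set(),
    "detrending": {"load_data"},
    "frequency_filter": {"detrending"},
    "trigger_detection": {"frequency_filter"},
}


def resolve_order(selected, dependencies):
    """Validate a block selection and return the blocks in execution order.

    Raises ValueError with a readable message instead of failing silently.
    """
    selected_set = set(selected)
    problems = []
    for block in selected:
        if block not in dependencies:
            problems.append(f"unknown block '{block}'")
            continue
        missing = dependencies[block] - selected_set
        if missing:
            problems.append(f"block '{block}' requires {sorted(missing)}")
    if problems:
        raise ValueError("; ".join(problems))
    # Only keep edges between selected blocks.
    graph = {block: dependencies[block] & selected_set for block in selected}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError(f"cyclic block dependencies: {err.args[1]}") from err


if __name__ == "__main__":
    print(resolve_order(["frequency_filter", "load_data", "detrending"], BLOCK_DEPENDENCIES))
    # prints ['load_data', 'detrending', 'frequency_filter']
```

Both the snakemake ruleset and the CWL creator could call the same function, so an invalid configuration is reported once, with the same message, regardless of the workflow engine.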
Enable execution of individual cobrawap blocks, and of complete cobrawap-designed workflows, on the EBRAINS workflow system
Solution: Here is a suggested way to enable this:
- The generation of tool-CWL files is moved either to a separate script or to a separate `cobrawap generate-cwl` command.
- A GitHub action is created that generates the CWL files for each block after each successful pull request to master/main. If possible, these CWL files are added to the repository. From then on, `cobrawap run --workflow-engine=cwl` uses these precompiled tool-CWL files by default instead of building them on the fly.
- When a new version is released, a separate GitHub action pushes the released CWL files and the Python scripts belonging to the blocks to a separate repository for the EBRAINS workflow components (i.e., cobrawap blocks). Based on this, T4.3 develops a way to curate these CWL files further and make them available to the EBRAINS workflow system. In this way, cobrawap blocks become available (using the ESD as base image).
- `cobrawap run` gets a new parameter `--workflow-engine=ebrains`, which builds the workflow-CWL file and submits it to the EBRAINS workflow service for execution. This will need to check that the cobrawap versions match and then execute the workflow-CWL description using the tool-CWL files registered with the service. Input filenames in the config file must be adapted accordingly.
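Whichever engine executes it, the workflow-CWL file has to chain the tool-CWL files of the selected blocks. Below is a minimal sketch of that chaining step for a linear block order. It assumes, hypothetically, that every tool-CWL file exposes an input named `data` and an output named `output`, and it omits the mapping of config parameters to step inputs; `tool_cwl_paths` stands in for wherever the precompiled or service-registered tool-CWL files live.

```python
import json


def build_workflow_cwl(blocks, tool_cwl_paths):
    """Sketch: chain blocks linearly into a CWL Workflow.

    Assumes (hypothetically) each tool-CWL has an input `data` and an output `output`.
    """
    steps = {}
    previous = "input_data"  # workflow-level input feeding the first block
    for block in blocks:
        steps[block] = {
            "run": tool_cwl_paths[block],
            "in": {"data": previous},  # source: previous step's output
            "out": ["output"],
        }
        previous = f"{block}/output"
    return {
        "cwlVersion": "v1.2",
        "class": "Workflow",
        "inputs": {"input_data": "File"},
        "outputs": {"result": {"type": "File", "outputSource": previous}},
        "steps": steps,
    }


if __name__ == "__main__":
    order = ["detrending", "frequency_filter"]  # e.g. from a validated block order
    paths = {block: f"cwl/{block}.cwl" for block in order}  # hypothetical layout
    print(json.dumps(build_workflow_cwl(order, paths), indent=2))
```

Keeping the `run` paths as a parameter is what would let the same builder target local precompiled files for cwltool or the tool-CWL files registered with a remote service.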
Foreseeable problem: including custom workflow blocks (e.g., for data entry) is difficult. Therefore, the data entry stage may need to be performed on the local machine.
