A new Fonda scripting/launching approach #162

@kamyshova

Motivation

There are two crucial drawbacks in the current Fonda implementation:

  • deadlocks
  • idle resources

Both problems stem from how scripts are launched. Scripts are executed in one of two ways:

  1. The first group of scripts (e.g. alignment/post-alignment scripts) is launched by Fonda, all at the same time.
  2. The second group of scripts (e.g. featureCount, cufflinks) is launched from the alignment/post-alignment scripts.

Scripts coordinate their work by checking the log files that other scripts produce. However, a log file is never created if its script was never invoked (e.g. a script that is supposed to be launched from an alignment script). In this case, the post-process scripts wait forever.

image

Picture 1. The current Fonda launching approach

For example, the post-process scripts (qcsummary.sh, cufflinks_cohort.sh; see Picture 1) are launched
simultaneously with the alignment scripts. The post-process cufflinks_cohort.sh script waits for the result of
cufflinks.sh by checking the cufflinks.log file. However, the alignment script can fail before it invokes cufflinks.sh.
In that case cufflinks_cohort.sh never learns about the failure and runs indefinitely.
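
The failure mode above can be illustrated with a minimal sketch of log-file polling. This is not Fonda's actual code: the function name wait_for_log, the success marker, and the poll limit are all hypothetical. With max_polls=0 the loop behaves like the unbounded wait described above; the bounded variant exists only to show the contrast.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of log-file polling; not taken from the Fonda sources.

# Wait until a log file exists and contains a marker string.
# Arguments: log path, marker, max polls (0 = poll forever), poll interval in seconds.
# Returns 0 when the marker is found, 1 when the poll limit is reached.
wait_for_log() {
    local log_file=$1 marker=$2 max_polls=${3:-0} interval=${4:-1}
    local polls=0
    while true; do
        if [ -f "$log_file" ] && grep -q -- "$marker" "$log_file"; then
            return 0
        fi
        polls=$((polls + 1))
        if [ "$max_polls" -gt 0 ] && [ "$polls" -ge "$max_polls" ]; then
            return 1   # give up: the producing script may never have started
        fi
        sleep "$interval"
    done
}
```

If cufflinks.sh is never invoked, cufflinks.log never appears, so a call such as `wait_for_log cufflinks.log "done"` (no poll limit; the marker "done" is an assumption) never returns.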

Deadlocks are specific to launches on an SGE cluster, where each script is submitted as an SGE job.
Each job has a resource requirement: the number of slots, which the user defines in the Fonda global config file
(the NUMTHREADS parameter in the Queue_Parameters section). The total number of slots in the cluster is equal to its number of processors.
The user can set a number of slots so large that the cluster does not have enough free slots to run the job. In this case, the job hangs in the pending state (qw).

For example, suppose the cluster has 8 CPUs and the user sets NUMTHREADS=4. Fonda first launches 3 scripts: alignment.sh, qcsummary.sh, and cufflinks_cohort.sh. Two of them (alignment.sh, qcsummary.sh) start running and together occupy all 8 slots, while the cufflinks_cohort.sh job stays in the qw state (queued and waiting). In turn, alignment.sh invokes cufflinks.sh and featureCount.sh and waits for their results. But the cluster has no free slots, so cufflinks.sh and featureCount.sh hang in the pending state, and the alignment.sh job waits for their results endlessly.
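
The slot arithmetic in this example can be checked with a small sketch. The helper names free_slots and can_start are hypothetical and only model the counting; they do not query a real SGE cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that model slot accounting; they do not talk to SGE.

# Print the slots still free when `running` jobs each hold `per_job` slots.
free_slots() {
    local total=$1 per_job=$2 running=$3
    echo $(( total - per_job * running ))
}

# Succeed if one more job that needs `per_job` slots can start.
can_start() {
    local total=$1 per_job=$2 running=$3
    [ "$(free_slots "$total" "$per_job" "$running")" -ge "$per_job" ]
}
```

With the numbers above, `free_slots 8 4 2` prints 0, so `can_start 8 4 2` fails: once alignment.sh and qcsummary.sh are running, neither cufflinks.sh nor featureCount.sh can leave the qw state.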

Thus, from the very beginning post-process jobs take up resources without performing useful work. Conversely, on an autoscaling cluster the result can be idle resources.

Approach

We propose a new approach to scripts launching.

image

Picture 2. The new proposed approach

As the picture above shows, we create an additional orchestrator script, master.sh, to manage all scripts.
Fonda runs only master.sh directly. Initially, the master script starts all alignment.sh scripts and waits for their results. After the alignment step completes successfully, the cufflinks.sh, featureCount.sh, etc. scripts are launched if they are needed.
Please note that we intend to remove the launching of scripts from the alignment/post-alignment scripts.
After the per-sample scripts have finished successfully, master.sh launches the post-process scripts.

To sum up, the proposed changes are:

  • create a new master.sh script to manage all scripts
  • remove the launching of scripts from the alignment/post-alignment scripts
  • launch pipeline stages sequentially

This approach is intended to eliminate the problems described above and to make the launching of scripts more transparent.
At the same time, it preserves parallelization wherever possible.
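
The orchestration described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the final master.sh: the function names run_stage and run_pipeline and the LAUNCHER variable are hypothetical. Scripts within a stage run in parallel, stages run one after another, and a failed stage stops the pipeline. The sketch launches scripts locally with bash; on SGE a blocking submission (for example `qsub -sync y`) could presumably take the launcher's place.

```shell
#!/usr/bin/env bash
# Hypothetical master.sh sketch: stages run sequentially,
# scripts inside one stage run in parallel.

# Assumption: "bash" for a local run; a blocking SGE submission could be used instead.
LAUNCHER=${LAUNCHER:-bash}

# Run all given scripts in parallel; return non-zero if any of them fails.
run_stage() {
    local pids=() pid status=0 script
    for script in "$@"; do
        $LAUNCHER "$script" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Run stages in order and stop at the first failed stage.
# Each argument is one stage: a space-separated list of script paths.
run_pipeline() {
    local stage
    for stage in "$@"; do
        # Word splitting is intentional: one stage holds several script paths.
        # shellcheck disable=SC2086
        run_stage $stage || { echo "Stage failed: $stage" >&2; return 1; }
    done
}
```

A call such as `run_pipeline "alignment_s1.sh alignment_s2.sh" "cufflinks_s1.sh featureCount_s1.sh" "qcsummary.sh cufflinks_cohort.sh"` (per-sample file names are made up) would mirror the three stages in Picture 2. Because a later stage starts only after the previous one has finished, no job holds slots while merely waiting for another job's log file.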
