vivaxgen/vvg-box


vivaxGEN Box

vivaxGEN Box (vvg-box) provides a lightweight environment (a "box") that consists of a thin layer of environment variables on top of the Conda ecosystem, managed by micromamba/mamba. Each box includes its own instance of micromamba/mamba, independent of any system-wide Conda installation, as well as its own C/C++ compiler.

vvg-box is used as the base installation for most vivaxGEN software packages and pipelines.

Cluster Profiles

For information about cluster profiles (batch manager/job scheduler), go to the Wiki.

Installation

To install, execute the following command:

"${SHELL}" <(curl -L https://raw.githubusercontent.com/vivaxgen/vvg-box/main/install.sh)

Optional environment variables that can be set for the above command are:

  • MAMBA_ROOT_PREFIX
  • PYVER
  • uMAMBA_ENVNAME
  • BASEDIR

Usage

Assuming that the base installation directory is VVGBOX, use the following command to open a new shell with the environment activated (exit by pressing CTRL-D or typing exit):

VVGBOX/bin/activate

or:

VVGBOX/bin/shell

To activate the environment in the current shell (or inside a job/bash script):

source VVGBOX/bin/activate

or (pay attention to the dot at the beginning of the line):

. VVGBOX/bin/activate

To install Conda-based software into the active environment, use the micromamba command, e.g.:

micromamba install software_name -c conda-forge -c defaults

To run any installed software in the box without opening a new shell or sourcing the activation script (for example, in a job script), use the following pattern:

VVGBOX/bin/exec myprog --argument ...

To set additional environment variables (such as extra $PATH entries) when the environment is activated, see the section about etc/bashrc.d below.
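The exec pattern above can be pictured with the following minimal sketch. It is not the actual bin/exec script. It assumes that sourcing VVG_BASEDIR/etc/bashrc is all that is needed to set up the box, and the function name vvg_exec is hypothetical. The environment is loaded in a subshell, so the calling shell is left untouched, and exec then replaces that subshell with the requested program.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an "exec"-style wrapper (not the real bin/exec).
# Assumption: the box is fully set up by sourcing BASEDIR/etc/bashrc.
vvg_exec() {
    local basedir="$1"
    shift
    (
        # Everything below happens in a subshell, so the caller's
        # environment is not modified.
        export VVG_BASEDIR="$basedir"
        # shellcheck source=/dev/null
        . "$basedir/etc/bashrc" || exit 1
        # Replace the subshell with the requested program; its exit
        # status becomes the status of vvg_exec.
        exec "$@"
    )
}
```

For example, vvg_exec /data/vvg-box myprog -h (the path is illustrative) would behave like VVGBOX/bin/exec myprog -h under the assumptions above.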

Quick Overview

The vivaxGEN Box (vvg-box) is designed to provide minimal infrastructure for the following objectives:

  • Enable easy installation of a set of software packages, including their binary dependencies, within a single directory.
  • Allow installation in any directory.
  • Require only curl and the bash shell as prerequisites (no additional software installation is needed).
  • Support installation and usage by non-privileged users (no root access required).
  • Allow other users with access to the installation directory to use the software packages.
  • Avoid cluttering system directories, users' home directories, or the home directory of the user who initiates the installation.
  • Provide mechanisms to activate the environment and switch between environments seamlessly.
  • Allow running the installed software directly, without requiring users to activate the environment.

A standalone micromamba is used to provide all necessary binary dependencies. The standalone micromamba binary is downloaded and installed in the installation directory, and all of its configuration and settings are stored in the installation directory as well. No files are stored in, and no modifications are made to, users' home directories, apart from the pip cache in ~/.cache/pip/ and some lines added to ~/.conda/environments.txt (if the file already exists) of the user who runs the installation script. The added lines can be removed manually without affecting the installed system. The pip cache can be removed with pip cache remove or pip cache purge if necessary (please consult the pip documentation).
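If the lines added to ~/.conda/environments.txt need to be removed, a small helper such as the one below could do it. This is a hypothetical sketch, not part of vvg-box. It assumes that every added line is a path at or below the installation directory, and it keeps all other lines unchanged. Matching on the directory plus a trailing slash avoids also deleting entries for a sibling directory such as /data/vvg-box2.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of vvg-box): remove entries that point at
# or below a given installation directory from a Conda environments.txt.
remove_box_env_entries() {
    local envfile="$1" basedir="${2%/}" tmp
    [ -f "$envfile" ] || return 0              # nothing to clean up
    tmp="$(mktemp)" || return 1
    # Keep a line unless it equals basedir or starts with "basedir/".
    # The path is passed through the environment so that awk does not
    # reinterpret backslashes in it.
    if ! BOX_DIR="$basedir" awk '
            $0 != ENVIRON["BOX_DIR"] && index($0, ENVIRON["BOX_DIR"] "/") != 1
        ' "$envfile" > "$tmp"; then
        rm -f -- "$tmp"
        return 1
    fi
    # Overwrite in place so that the original file permissions are kept
    cat -- "$tmp" > "$envfile"
    rm -f -- "$tmp"
}
```

For example: remove_box_env_entries ~/.conda/environments.txt /data/vvg-box (the installation path is illustrative).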

The following is the base layout of the directories created by the installation script, assuming that VVG_BASEDIR is the root/base directory of the installation:

VVG_BASEDIR/
            bin/
                activate
                exec
                micromamba
                shell
            opt/
                umamba/
                apptainer/
            envs/
                vvg-box/
            etc/
                bashrc -> ../envs/vvg-box/etc/bashrc
                bashrc.d/

Information about each file/directory in the base layout is as follows:

bin/activate
This is the main activation script. It can be executed to spawn a new shell, or sourced in the current shell or inside a shell script. The script sets up some environment variables and then calls envs/vvg-box/bin/activate. The script itself is generated by the generator script envs/vvg-box/bin/generate-activation-script.py.
bin/exec

This script allows installed software to be executed without activating the environment. For example, to run an installed program named myprog with the argument -h, use the following command:

VVG_BASEDIR/bin/exec myprog -h

bin/micromamba
This is the micromamba executable binary, specific to each system/architecture.
bin/shell
This is a symbolic link to bin/activate.
opt/umamba/
This directory contains all files related to micromamba, such as environment settings and all binary dependency files.
opt/apptainer/
This directory contains filesystem images for apptainer/singularity.
envs/
This directory holds repositories cloned from git hosting services such as GitHub, including vvg-box itself. Other repositories (such as the various pipelines) need to be cloned here.
envs/vvg-box/
This is the vvg-box repository cloned from GitHub.
etc/bashrc
This is the main source file, which needs to be sourced before using the installed software (bin/activate sources this file automatically). This file is normally a symbolic link to envs/vvg-box/etc/bashrc.
etc/bashrc.d

This directory contains bash resource files that are sourced in alphabetical order when etc/bashrc is sourced. Package-specific activation source files should be placed (or linked) inside this directory.

Activation source file names should be prefixed with two digits and a dash; for example, the source file for vivaxGEN NGS-Pipeline is 10-ngs-pipeline. Other pipelines and software packages relying on vivaxGEN NGS-Pipeline should use numbers starting from 50-. Global settings that can be modified by users should use numbers starting from 90-, e.g. the Snakemake job scheduler profile setting is 99-snakemake-profiles.
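The alphabetical sourcing of etc/bashrc.d can be sketched as follows. This is an illustration under assumptions, not the contents of the real etc/bashrc, and the function name source_bashrc_d is hypothetical. Pathname expansion in bash returns matching names sorted according to the current locale, so the two-digit prefixes determine the order in which the files are sourced.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the real etc/bashrc): source every regular,
# readable file in a bashrc.d-style directory in alphabetical order.
source_bashrc_d() {
    local dir="$1" f
    # The glob expands to names sorted by the current locale, so
    # 10-* is sourced before 50-*, which is sourced before 99-*.
    for f in "$dir"/*; do
        # Skip the unexpanded pattern (empty directory), subdirectories
        # and unreadable files
        [ -f "$f" ] && [ -r "$f" ] || continue
        # shellcheck source=/dev/null
        . "$f"
    done
}
```

Because the files are sourced rather than executed, any variables they export become part of the activated environment.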

The layout has been designed to keep the number of unmanaged files to a minimum (only bin/activate, bin/micromamba, and filesystem images under opt/apptainer). Files under opt/umamba are managed by micromamba, while the remaining files can be symbolic links to any repository in the envs/ directory, which can be updated by pulling the respective repository.
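As an illustration of the etc/bashrc.d naming convention, a hypothetical pipeline named mytool that relies on vivaxGEN NGS-Pipeline and is cloned into envs/mytool could provide the file below as etc/bashrc.d/50-mytool, for instance as a symbolic link into its own repository. The package name and file are invented for this example. The file assumes that VVG_BASEDIR is already set when etc/bashrc.d files are sourced. It prepends the package's bin/ directory to PATH only if that directory is not already there, so that sourcing etc/bashrc more than once does not keep growing PATH.

```bash
# etc/bashrc.d/50-mytool (hypothetical file for a hypothetical package
# "mytool" cloned into envs/mytool; both names are illustrative only).
# Assumes VVG_BASEDIR has been set before this file is sourced.
_mytool_bin="${VVG_BASEDIR}/envs/mytool/bin"
case ":${PATH}:" in
    *":${_mytool_bin}:"*) ;;                  # already on PATH: do nothing
    *) export PATH="${_mytool_bin}:${PATH}" ;;
esac
# Do not leave the helper variable behind in the user's shell
unset _mytool_bin
```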

The vivaxGEN Box utility also provides some command line tools as follows:

export-environment.sh
This script can be used to export the micromamba environment files.
generate-activation-script.py
This script is used to generate VVG_BASEDIR/bin/activate script.
set-cluster-config.sh
This script autodetects whether a batch/job scheduler, such as SLURM or PBS, is installed on the system, and sets the SNAKEMAKE_PROFILE environment variable to the corresponding profile.
update-pipeline.sh
This script can be executed to update all cloned repositories in the envs/ directory.
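The scheduler autodetection described for set-cluster-config.sh can be approximated by checking which job submission command is available on PATH. The sketch below is hypothetical. The function name and the profile directory layout (PROFILES_DIR/slurm and PROFILES_DIR/pbs) are assumptions and do not reflect the profiles shipped with vvg-box (see the Wiki for those). Note that qsub is also provided by other schedulers such as SGE, so a real implementation may need additional checks.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the real set-cluster-config.sh): pick a
# Snakemake profile based on which job submission command is installed.
# The "<profiles_dir>/slurm" and "<profiles_dir>/pbs" paths are assumptions.
detect_snakemake_profile() {
    local profiles_dir="$1"
    if command -v sbatch >/dev/null 2>&1; then
        # SLURM submits batch jobs with sbatch
        export SNAKEMAKE_PROFILE="${profiles_dir}/slurm"
    elif command -v qsub >/dev/null 2>&1; then
        # PBS-style schedulers submit jobs with qsub (SGE also has qsub)
        export SNAKEMAKE_PROFILE="${profiles_dir}/pbs"
    else
        # No known scheduler found: leave SNAKEMAKE_PROFILE untouched
        return 1
    fi
}
```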

After the Box utility environment has been activated, the above commands can be accessed using the $VVGBIN environment variable, e.g.:

$VVGBIN/update-pipeline.sh
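A minimal sketch of what updating all cloned repositories might involve is shown below. It is not the actual update-pipeline.sh. The function name update_all_repos is hypothetical, and the sketch assumes that each subdirectory of envs/ containing a .git directory is a clone with an upstream branch configured. Using --ff-only makes git fail instead of creating a merge commit, so a repository with diverging local commits is reported as failed rather than silently merged.

```bash
#!/usr/bin/env bash
# Hypothetical sketch (not the real update-pipeline.sh): pull every git
# repository found directly under BASEDIR/envs/.
update_all_repos() {
    local basedir="$1" repo status=0
    # The trailing slash makes the glob match directories only
    for repo in "$basedir"/envs/*/; do
        # Only git working trees are updated
        [ -d "${repo}.git" ] || continue
        echo "Updating ${repo%/}"
        # --ff-only: fast-forward only, fail instead of merging
        git -C "$repo" pull --ff-only || status=1
    done
    # Non-zero if at least one repository could not be updated
    return "$status"
}
```

For example, update_all_repos "$VVG_BASEDIR" returns a non-zero status if any repository could not be fast-forwarded.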

The installation script for the vivaxGEN Box utility also installs the following software using micromamba with the conda-forge channel (software marked as optional is installed only if it is not already installed on the system):

  • git [optional]
  • coreutils (for readlink and realpath) [optional]
  • parallel [optional]
  • c compiler suite (c-compiler, usually gcc) [optional]
  • c++ compiler suite (cxx-compiler, usually g++) [optional]
  • Python (3.12)
  • Snakemake (8.x)
