vivaxGEN Box (vvg-box) provides a lightweight environment ("box") which is a thin layer of environment variables on top of the Conda ecosystem, managed by micromamba/mamba. Each "box" includes its own instance of micromamba/mamba, independent of the system-wide Conda environment, as well as a separate C/C++ compiler.
vvg-box is used as the base installation for most vivaxGEN software packages and pipelines.
For information about cluster profiles (batch manager/job scheduler), go to the Wiki.
To install, execute the following command:
"${SHELL}" <(curl -L https://raw.githubusercontent.com/vivaxgen/vvg-box/main/install.sh)
Optional environment variables that can be supplied to the above command are:
- MAMBA_ROOT_PREFIX
- PYVER
- uMAMBA_ENVNAME
- BASEDIR
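As an illustration, the optional variables can be placed in front of the install command. The sketch below only composes and prints such a command so it can be reviewed before running. The values are placeholders, and reading BASEDIR as the installation directory and PYVER as the Python version is an assumption based on the variable names, not something stated above.

```shell
#!/usr/bin/env bash
# Sketch only: the values below are placeholders, not documented defaults.
# Assumption: BASEDIR is the target directory and PYVER the Python version.
DEMO_BASEDIR="/path/to/vvg-box"
DEMO_PYVER="3.12"
INSTALL_URL="https://raw.githubusercontent.com/vivaxgen/vvg-box/main/install.sh"

# Variables placed in front of a command are visible only to that command.
install_cmd="BASEDIR=${DEMO_BASEDIR} PYVER=${DEMO_PYVER} \"\${SHELL}\" <(curl -L ${INSTALL_URL})"

# Print the command instead of running it.
printf '%s\n' "${install_cmd}"
```

The printed line can be pasted into a bash session once the values have been adjusted.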
Assuming that the base installation directory is VVGBOX, open a new shell with the environment active (exit by pressing CTRL-D or typing exit) using the following command:
VVGBOX/bin/activate
or:
VVGBOX/bin/shell
To activate the environment in the current shell, or inside a job/bash script:
source VVGBOX/bin/activate
or (pay attention to the dot at the beginning of the line):
. VVGBOX/bin/activate
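Inside a job script it can help to fail early when the activation script is not where it is expected. The following is a minimal sketch of such a guard; activate_box is a hypothetical helper name and the path in the usage comment is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch of a guard for job scripts; activate_box is a hypothetical helper.
activate_box() {
    box_dir="$1"
    if [ ! -f "${box_dir}/bin/activate" ]; then
        echo "activation script not found: ${box_dir}/bin/activate" >&2
        return 1
    fi
    # Source (do not execute) so the environment changes stay in this shell.
    . "${box_dir}/bin/activate"
}

# Typical use near the top of a job script (path is a placeholder):
#   activate_box /path/to/vvg-box || exit 1
#   myprog --argument ...
```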
To install Conda-based software into the active environment, use the micromamba command, e.g.:
micromamba install software_name -c conda-forge -c defaults
To run any installed software in the box without opening a new shell or sourcing the activation script (also useful in a job script), use the following pattern:
VVGBOX/bin/exec myprog --argument ...
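When several commands in one script go through bin/exec, a small wrapper keeps the calls short. This is only a sketch: box_run is a hypothetical function name, and the default value of VVGBOX is a placeholder to be replaced with the real installation directory.

```shell
#!/usr/bin/env bash
# Sketch: box_run is a hypothetical wrapper; VVGBOX is a placeholder path.
VVGBOX="${VVGBOX:-/path/to/vvg-box}"

box_run() {
    # Pass the program name and all its arguments through unchanged.
    "${VVGBOX}/bin/exec" "$@"
}

# Example calls inside a job script:
#   box_run myprog --argument ...
#   box_run python -c 'import sys; print(sys.version)'
```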
To set additional environment variables (such as extra $PATH entries) when the environment is activated, see the section about etc/bashrc.d below.
The vivaxGEN Box (vvg-box) is designed to provide minimal infrastructure for the following objectives:
- Enable easy installation of a set of software packages, including their binary dependencies, within a single directory.
- Allow installation in any directory.
- Require only curl and the bash shell as prerequisites (no additional software installation is needed).
- Support installation and usage by non-privileged users (no root access required).
- Allow other users with access to the installation directory to use the software packages.
- Avoid cluttering users' home directories, system directories, or the home directory of the user who runs the installation.
- Provide mechanisms to activate the environment and switch between environments seamlessly.
- Allow running the installed software directly, without requiring users to activate the environment.
A standalone micromamba is used to provide all necessary binary dependencies. The standalone micromamba binary is downloaded and installed in the installation directory, and all of its configuration and settings are stored there as well. No files are stored or modified in users' home directories, apart from the pip cache in ~/.cache/pip/ and some lines added to ~/.conda/environments.txt (if the file already exists) of the user who executes the installation script. The added lines can be removed manually without affecting the installed system. The pip cache can be removed with pip cache remove or pip cache purge if necessary (please consult the pip documentation).
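The manual clean-up of ~/.conda/environments.txt can be sketched as a small filter. This is an illustration under one assumption: that the lines added during installation contain the path of the installation directory. remove_env_entries is a hypothetical helper, and making a backup copy of the file first is advisable.

```shell
#!/usr/bin/env bash
# Sketch: remove_env_entries is a hypothetical helper, not part of vvg-box.
# Assumption: the lines added by the installer contain the installation path.
remove_env_entries() {
    env_file="$1"
    base_dir="${2%/}"        # drop a trailing slash, if any
    [ -f "${env_file}" ] || return 0
    tmp_file="$(mktemp)"
    # Keep only the lines that do not mention the installation directory.
    # grep exits non-zero when nothing is kept, hence the "|| true".
    grep -v -F "${base_dir}/" "${env_file}" > "${tmp_file}" || true
    cat "${tmp_file}" > "${env_file}"
    rm -f "${tmp_file}"
}

# Example (placeholder path):
#   remove_env_entries ~/.conda/environments.txt /path/to/vvg-box
```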
The following is the base layout of the directories generated by the installation script, assuming that VVG_BASEDIR is the root/base directory of the installation:
VVG_BASEDIR/
    bin/
        activate
        exec
        micromamba
        shell
    opt/
        umamba/
        apptainer/
    envs/
        vvg-box/
    etc/
        bashrc -> ../envs/vvg-box/etc/bashrc
        bashrc.d
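A quick way to compare an existing installation against this layout is to test for each entry. The sketch below does only that; check_layout is a hypothetical helper, and it assumes that every entry of the layout is present after a fresh installation.

```shell
#!/usr/bin/env bash
# Sketch: check_layout is a hypothetical helper that reports entries of the
# base layout that are absent under a given directory.
check_layout() {
    base="${1%/}"
    missing=0
    for entry in bin/activate bin/exec bin/micromamba bin/shell \
                 opt/umamba opt/apptainer envs/vvg-box \
                 etc/bashrc etc/bashrc.d; do
        if [ ! -e "${base}/${entry}" ]; then
            echo "missing: ${entry}"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when every entry was found.
    [ "${missing}" -eq 0 ]
}

# Example (placeholder path):
#   check_layout /path/to/vvg-box && echo "layout looks complete"
```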
Information about each file/directory in the base layout is as follows:
- bin/activate - This is the main activation/source script, which can be executed to spawn a new shell, or sourced in the current shell or inside a shell script. The script sets up some environment variables and then calls envs/vvg-box/bin/activate. The script itself is generated by the generator script envs/vvg-box/bin/generate-activation-script.py.
- bin/exec - This script allows execution of installed software without having to activate the environment. For example, to run an installed program named myprog with the argument -h, use the following command:
    VVG_BASEDIR/bin/exec myprog -h
- bin/micromamba - This is the micromamba executable binary, specific to each system/architecture.
- bin/shell - This is just a symlink to bin/activate.
- opt/umamba/ - This directory contains all files related to micromamba, such as environment settings and all binary dependency files.
- opt/apptainer/ - This directory contains filesystem images for apptainer/singularity.
- envs/ - This directory holds repositories cloned from git hosting services such as GitHub, including vvg-box itself. Other repositories (such as various pipelines) need to be cloned here.
- envs/vvg-box/ - This is the vvg-box repository cloned from GitHub.
- etc/bashrc - This is the main source file, which needs to be sourced before using the installed software (bin/activate sources this file automatically). This file is normally a symbolic link to envs/vvg-box/etc/bashrc.
- etc/bashrc.d - This directory contains bash resource files that are sourced in alphabetical order when etc/bashrc is sourced. A software package's specific activation source file should be put (or linked) inside this directory. The activation source file name should be prefixed with two digits and a dash; for example, the source file for vivaxGEN NGS-Pipeline is 10-ngs-pipeline. Other pipelines and software packages relying on vivaxGEN NGS-Pipeline should use numbers starting from 50-. Other global settings that can be modified by users should use numbers starting from 90-, e.g. the snakemake job scheduler profile setting is 99-snakemake-profiles.
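The numbering convention for etc/bashrc.d can be illustrated with two small functions. This is a sketch, not the actual content of etc/bashrc: write_path_snippet and source_rc_dir are hypothetical helpers, and 90-local-path is a made-up file name that follows the 90- prefix for user-level settings.

```shell
#!/usr/bin/env bash
# Sketch only: this is not the actual content of etc/bashrc.
# 90-local-path is a hypothetical file name following the numbering convention.

# Write a snippet that prepends a directory to PATH on activation.
write_path_snippet() {
    rc_dir="$1"
    extra_dir="$2"
    printf 'export PATH="%s:${PATH}"\n' "${extra_dir}" > "${rc_dir}/90-local-path"
}

# Source every file in the directory; the shell expands the glob in sorted
# order, so 10-* is read before 50-*, which is read before 90-*.
source_rc_dir() {
    rc_dir="$1"
    for rc_file in "${rc_dir}"/*; do
        [ -f "${rc_file}" ] || continue
        . "${rc_file}"
    done
}

# Example (placeholder paths):
#   write_path_snippet /path/to/vvg-box/etc/bashrc.d /path/to/extra/bin
```

Because files are read in sorted order, a setting in a 90- file can rely on variables defined by the 10- and 50- files.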
The layout has been designed so that the number of unmanaged files is minimal (only bin/activate, bin/micromamba and the filesystem images under opt/apptainer). Files under opt/umamba are managed by micromamba, while the rest of the files can be symbolic links to any repository in the envs/ directory, which can be updated by pulling the respective repository.
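Pulling the repositories by hand can be sketched as follows. list_box_repos is a hypothetical helper that only lists the directories directly under envs/ that look like git clones; the pull loop is shown as a comment, and the path in it is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: list_box_repos is a hypothetical helper, not part of vvg-box.
# It prints every directory directly under envs/ that looks like a git clone.
list_box_repos() {
    envs_dir="${1%/}"
    for repo in "${envs_dir}"/*/; do
        if [ -e "${repo}.git" ]; then
            printf '%s\n' "${repo%/}"
        fi
    done
}

# Pull each repository in turn (placeholder path):
#   list_box_repos /path/to/vvg-box/envs | while read -r repo; do
#       git -C "${repo}" pull --ff-only
#   done
```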
The vivaxGEN Box utility also provides some command line tools as follows:
- export-environment.sh - This script can be used to export the micromamba environment files.
- generate-activation-script.py - This script is used to generate the VVG_BASEDIR/bin/activate script.
- set-cluster-config.sh - This script autodetects whether a batch/job scheduler such as SLURM or PBS is installed in the system, and sets the SNAKEMAKE_PROFILE environment variable to the matching profile.
- update-pipeline.sh - This script can be executed to update all cloned repositories in the envs directory.
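The idea behind scheduler autodetection can be sketched in a few lines. This is not the code of set-cluster-config.sh: it assumes that SLURM can be recognised by its sbatch command and PBS by qsub, and the profile names assigned to SNAKEMAKE_PROFILE are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: this is not the code of set-cluster-config.sh.
# Assumption: SLURM is recognised by its sbatch command and PBS by qsub.
detect_scheduler() {
    if command -v sbatch >/dev/null 2>&1; then
        echo "slurm"
    elif command -v qsub >/dev/null 2>&1; then
        echo "pbs"
    else
        echo "none"
    fi
}

scheduler="$(detect_scheduler)"
if [ "${scheduler}" != "none" ]; then
    # The profile name is a placeholder; real profile names may differ.
    export SNAKEMAKE_PROFILE="${scheduler}"
fi
```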
After the Box utility environment has been activated, the above commands can be accessed using the $VVGBIN environment variable, e.g.:
$VVGBIN/update-pipeline.sh
The installation script for the vivaxGEN Box utility will also install the following software using micromamba with the conda-forge channel (optional software is installed only if it is not already installed in the system):
- git [optional]
- coreutils (for readlink and realpath) [optional]
- parallel [optional]
- C compiler suite (c-compiler, usually gcc) [optional]
- C++ compiler suite (cxx-compiler, usually g++) [optional]
- Python (3.12)
- Snakemake (8.x)
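To see in advance which of the optional tools are already present on a system, a check along the following lines can be used. It is a sketch and not the logic of the installation script: is_installed is a hypothetical helper, and it assumes that a tool counts as installed when it can be found through PATH.

```shell
#!/usr/bin/env bash
# Sketch only: not the logic of install.sh. It reports which of the optional
# command line tools listed above are already available on the system.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

for tool in git readlink realpath parallel gcc g++; do
    if is_installed "${tool}"; then
        echo "${tool}: found on system"
    else
        echo "${tool}: not found"
    fi
done
```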