Amaranth is a reference-based transcript assembler specifically optimized for single-cell RNA-seq data. The development of Amaranth has been based on the Scallop2/Scallop assembler series.
Amaranth is available from the bioconda channel. We recommend installing amaranth with Pixi or Mamba. It can also be installed with Conda, but Conda may be much slower at solving the environment, especially when the user's conda version is out of date or when installing into a populated environment. Amaranth supports Linux and macOS. Windows users should consider using WSL or another Unix-like system.
Pixi is a fast, modern, and reproducible package management tool that works across programming languages, including Python, C++, and Rust. It can install software from conda channels and is our most recommended tool for installing Amaranth.
If Pixi is not yet installed, install it first:
# install pixi
curl -fsSL https://pixi.sh/install.sh | sh
You may need to restart your terminal or run `source ~/.bashrc` (Linux) / `source ~/.zshrc` (macOS) for the change to take effect.
If the installation is successful, typing `pixi` in the terminal should print pixi's help message.
Pixi organizes software and dependencies in projects (similar to virtual environments). To start, you need to initiate a pixi project in a directory using pixi init <project_dir>. This will create a pixi.toml config file and a hidden .pixi directory in the <project_dir>. You will also need to specify conda-forge and bioconda channels. Pixi will search for software from those channels.
# Initiate pixi project in a directory (anywhere, e.g., your current dir)
pixi init . --channel conda-forge --channel bioconda
Once inside your <project_dir>, use the following command to install amaranth:
# to install
pixi add bioconda::amaranth-assembler
A successful installation prints a message like “Added bioconda::amaranth-assembler”.
After installation, `amaranth` is installed in the `.pixi` subdirectory, managed by Pixi. You won't see a new file in your current directory. To use it, run `pixi run amaranth <arguments>`.
After installation, users can use pixi run <tool_name> <tool_arguments> to use the software. For example, the following command will print help messages.
# To test the installation, try printing the help message
pixi run amaranth --help
You can also download example data from our GitHub repo (example data) and test amaranth. After downloading, place the example file in your project directory:
# download example data
wget https://github.com/Shao-Group/amaranth/releases/download/v0.1.0/example-input.bam
# to assemble example-input.bam and produce test_output.gtf file
pixi run amaranth -i example-input.bam -o test_output.gtf
Note: `pixi run` commands must be executed from within the project directory.
To directly use tools without pixi run, users can invoke pixi shell, similar to conda activate env_name. Then users can use amaranth anywhere, even outside the <project_dir>.
# activate pixi environment
pixi shell
# users can directly run tools without `pixi run`
amaranth -i example-input.bam -o test_output.gtf
`pixi run <tool>` only works inside its corresponding pixi project_dir. After invoking `pixi shell` (from project_dir), users can use the software anywhere, including outside the project_dir.
To leave the pixi shell, run `exit`.
Mamba is a reimplementation of the conda package manager in C++. It is fully compatible with conda, and much faster. Micromamba is a tiny version of the mamba package manager. In this installation guide, we use micromamba as an example. If you already have conda or mamba installed, you can skip the installation step and replace micromamba with conda or mamba in the commands below. However, conda could be noticeably slower than mamba and micromamba, especially on older versions. If using conda, we recommend at least Version 24 or newer.
If Micromamba is not yet installed, install it first:
# install Micromamba
"${SHELL}" <(curl -L micro.mamba.pm/install.sh)
You may need to restart your terminal or run `source ~/.bashrc` (Linux) / `source ~/.zshrc` (macOS) for the change to take effect.
If the installation is successful, typing `micromamba` in the terminal should print micromamba’s help message.
(Optional) We recommend creating a new environment to minimize conflicts and speed up installation. You can also activate any existing environment that you want to install into.
# Optionally, create and activate a new environment
micromamba create -n amaranth_env -c conda-forge -c bioconda
micromamba activate amaranth_env
The following micromamba command will install amaranth:
# to install amaranth
micromamba install -c conda-forge bioconda::amaranth-assembler
A successful installation prints a message like "Executing transaction: done" or "Transaction finished". If Micromamba takes a long time to resolve the environment (e.g., 15+ minutes), consider using a fresh environment.
After installation, `amaranth` is available as a command in your terminal (i.e., from `$PATH`), but you won't see a new file in your current directory. The binary is installed inside the micromamba environment folder. Use `which amaranth` to see the exact location.
After installation, users can call amaranth directly. For example, to print the help message or test on a small example dataset (download the data from our GitHub repo):
# To test the installation, try printing the help message
amaranth --help
# or download and test on an example dataset
wget https://github.com/Shao-Group/amaranth/releases/download/v0.1.0/example-input.bam
amaranth -i example-input.bam -o test_output.gtf
If you created a new environment, remember to run `micromamba activate amaranth_env` each time you open a new terminal before using amaranth.
To deactivate the environment, run `micromamba deactivate`.
Alternatively, amaranth is also available as a Docker image. Please ensure the Docker daemon is running in the background before using Docker. Root privileges may be required to run the Docker daemon.
To pull (download) the image, replace <tag> with an appropriate value from BioContainers (biocontainers / amaranth-assembler).
# replace <tag> with a valid value
docker pull quay.io/biocontainers/amaranth-assembler:<tag>
For example, the docker container tag for v0.1.0:
# actual tag for v0.1.0
docker pull quay.io/biocontainers/amaranth-assembler:0.1.0--h5ca1c30_0
To explore the container interactively and see what's inside:
docker run -it quay.io/biocontainers/amaranth-assembler:0.1.0--h5ca1c30_0 /bin/bash
# you can explore amaranth in the container, for example, by printing the help message
amaranth --help
To run amaranth with your local files mounted (non-interactive):
docker run -v $(pwd):/data quay.io/biocontainers/amaranth-assembler:0.1.0--h5ca1c30_0 \
amaranth -i /data/<your_input.bam> -o /data/<output_prefix>
The `-v $(pwd):/data` option mounts your current directory at /data inside the container so you can access your files. That is why /data/ is prefixed to both the input and output file paths; otherwise the input and output files won’t be accessible to Docker.
For other installation methods, such as installing from source code, please read INSTALL.md. If you have questions about installation, please check the Installation FAQ or open an issue on GitHub.
Assuming amaranth is available from command line (which may vary depending on the installation method), the general usage is:
amaranth -i <input.bam> -o <output>
Users may need to replace `amaranth` with the appropriate command for their installation method:
- Pixi: go to the corresponding `<project_dir>` and (1) run `pixi run amaranth -i <input.bam> -o <output>` or (2) run `pixi shell` and then run `amaranth -i <input.bam> -o <output>`.
- Mamba/conda: please make sure the correct environment is activated before calling `amaranth -i <input.bam> -o <output>`.
- Docker: `docker run -v $(pwd):/data quay.io/biocontainers/amaranth-assembler:<tag> amaranth -i /data/<input.bam> -o /data/<output>`. Remember to replace `<tag>` with the actual tag value.
- Compiled from source: please use the path to the amaranth executable binary. It is usually `./src/amaranth`.
The input.bam is a read alignment file generated by an RNA-seq aligner.
To correctly assemble single-cell RNA-seq reads (e.g. Smart-seq3), the bam file should have SAM optional field tag BC which stores cell barcodes and tag UB which stores UMI barcodes. See bam data in example directory as an example.
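As a quick sanity check before assembly, the following sketch counts how many alignments carry both a BC and a UB optional tag. It is not part of amaranth; the helper name `count_barcoded_reads` is hypothetical, and it only assumes that optional fields start at column 12 of SAM text, as in the SAM format.

```shell
# Count reads in SAM text (stdin) that carry both a BC and a UB optional tag.
# Prints "<tagged> <total>". Hypothetical helper, not an amaranth command.
count_barcoded_reads() {
  awk -F '\t' '
    /^@/ { next }                      # skip header lines
    {
      total++
      has_bc = 0; has_ub = 0
      for (i = 12; i <= NF; i++) {     # optional fields start at column 12
        if ($i ~ /^BC:/) has_bc = 1
        if ($i ~ /^UB:/) has_ub = 1
      }
      if (has_bc && has_ub) tagged++
    }
    END { printf "%d %d\n", tagged, total }
  '
}

# Example (requires samtools): inspect the first 1000 alignments
# samtools view input.bam | head -n 1000 | count_barcoded_reads
```

If the first number is much smaller than the second, many reads lack barcode or UMI tags and the file may need to be re-tagged upstream.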
If you want to assemble each single cell independently, run one command for each cell. If you want to perform meta-assembly (which leverages information across all cells to improve individual cell assemblies), see Meta-assembly Usage.
Make sure that the bam file is sorted; otherwise run samtools to sort it:
samtools sort input.bam > input.sort.bam
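To decide whether sorting is needed, one option is to look at the `@HD` header line, which records the sort order as `SO:coordinate` when the file declares itself coordinate-sorted. The sketch below is an assumption-laden convenience (the function name `is_coordinate_sorted` is hypothetical), and it only checks what the header claims, not the actual record order.

```shell
# Succeeds if the SAM header text on stdin declares coordinate sorting
# (an @HD line containing SO:coordinate). Hypothetical helper.
is_coordinate_sorted() {
  grep -Eq '^@HD.*[[:space:]]SO:coordinate([[:space:]]|$)'
}

# Example (requires samtools):
# if samtools view -H input.bam | is_coordinate_sorted; then
#   echo "already sorted"
# else
#   samtools sort input.bam > input.sort.bam
# fi
```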
The reconstructed transcripts are written in GTF format to output.gtf.
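Assembling each cell independently, as described above, amounts to one amaranth call per cell. A minimal sketch follows, assuming one sorted BAM file per cell in a single directory. The function name `assemble_each_cell`, the directory layout, and the `AMARANTH` override variable are illustrative choices, not part of amaranth; only the `-i` and `-o` options come from this README.

```shell
# Run one assembly per cell; expects one sorted BAM file per cell in $1.
# Output for <cell>.bam is requested with -o <out_dir>/<cell>.
# AMARANTH may be overridden, e.g. AMARANTH="pixi run amaranth".
assemble_each_cell() {
  local in_dir=$1 out_dir=$2 bam cell
  mkdir -p "$out_dir"
  for bam in "$in_dir"/*.bam; do
    [ -e "$bam" ] || continue          # no BAM files matched the pattern
    cell=$(basename "$bam" .bam)
    # Intentionally unquoted so a multi-word launcher is split into words.
    ${AMARANTH:-amaranth} -i "$bam" -o "$out_dir/$cell"
  done
}

# Example:
# assemble_each_cell cells/ assemblies/
```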
The usage of amaranth for meta-assembly is:
amaranth --meta -i <merged.bam> -o <output>
The argument --meta must be supplied to perform meta-assembly.
The merged.bam is the sorted and merged read alignment file of all cells. The bam file should have SAM optional field tag BC which stores cell barcodes and tag UB which stores UMI barcodes.
If you have separate sorted read alignment files for each cell, use samtools to merge them.
# merge n bam files with 32 threads
samtools merge -@32 -o merged.bam 1.bam 2.bam ... n.bam
The reconstructed transcripts for each cell will be written in GTF format to output.<BC>.gtf, where <BC> is the cell barcode (SAM tag BC) of each cell. The meta-assembly (union of transcripts from all cells) will be written to output.meta.gtf.
To achieve the best performance, we recommend taking the union of a cell's transcripts from both amaranth's meta-assembly run and its individual (single-cell) assembly run. The union of transcripts can be computed using tools such as TACO or gtfmerge.
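When preparing that union, it helps to enumerate the per-cell files of a meta-assembly run together with their barcodes. The sketch below relies only on the output naming described above (`<prefix>.<BC>.gtf` and `<prefix>.meta.gtf`); the helper name `list_cell_gtfs` is hypothetical, and it assumes no cell barcode is literally `meta`.

```shell
# Print "<BC><TAB><path>" for every per-cell GTF written with the given -o prefix,
# skipping the combined <prefix>.meta.gtf file. Hypothetical helper.
list_cell_gtfs() {
  local prefix=$1 gtf bc
  for gtf in "$prefix".*.gtf; do
    [ -e "$gtf" ] || continue          # pattern matched nothing
    bc=${gtf#"$prefix".}               # strip "<prefix>."
    bc=${bc%.gtf}                      # strip ".gtf"
    if [ "$bc" = "meta" ]; then
      continue
    fi
    printf '%s\t%s\n' "$bc" "$gtf"
  done
}

# Example:
# list_cell_gtfs output
```

Each listed file can then be paired with the GTF from the same cell's individual run and passed to the merging tool of your choice.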
Here is a list of supported parameters. Please refer to additional explanations below the table.
| Parameters | Default Value | Description |
|---|---|---|
| --help | | print usage of amaranth and exit |
| --version | | print version of amaranth and exit |
| --meta | not used | to perform meta-assembly. |
| --preview | | show the inferred library_type and exit |
| --verbose | 1 | chosen from {0, 1, 2} |
| -f/--transcript_fragments | | file to which the assembled non-full-length transcripts will be written |
| --library_type | empty | chosen from {empty, unstranded, first, second}; If empty, Amaranth will try to infer automatically. |
| --assemble_duplicates | 10 | the number of consensus runs of the decomposition |
| --min_transcript_coverage | 1.5 | the minimum coverage required to output a multi-exon transcript |
| --min_single_exon_coverage | 20 | the minimum coverage required to output a single-exon transcript |
| --min_transcript_length_base | 150 | the minimum base length of a transcript |
| --min_transcript_length_increase | 50 | the minimum increased length of a transcript with each additional exon |
| --min_mapping_quality | 1 | ignore reads with mapping quality less than this value |
| --max_num_cigar | 1000 | ignore reads with CIGAR size larger than this value |
| --min_bundle_gap | 100 | the minimum distances required to start a new bundle |
| --min_num_hits_in_bundle | 5 | the minimum number of reads required in a bundle |
| --min_flank_length | 3 | the minimum match length required in each side for a spliced read |
| --min_splice_boundary_hits | 1 | the minimum number of spliced reads required to support a junction |
| --min-umi-reads-bundle | 1 | (int) bundles with fewer UMI reads than this threshold will be ignored |
| --min-umi-ratio-bundle | 0.0 | (float) bundles with a lower UMI reads ratio than this threshold will be ignored |
| --both-umi-support | not used | if set, a bundle needs to satisfy both min-umi-reads-bundle and min-umi-ratio-bundle; otherwise, satisfying either of them is sufficient |
| --min-umi-reads-start-exon | 1 | (int) minimum number of UMI reads support of the first exon in a valid transcript |
| --remove-retained-intron | used | |
| --no-remove-retained-intron | not used | |
| --max-ir-part-ratio-v | 0.5 | the ratio threshold of retained node to skip edge for partial introns (if greater than threshold, consider true transcript) |
| --max-ir-part-ratio-e | 0.5 | the ratio threshold of retained node's edge to skip edge for partial introns (if greater than threshold, consider true transcript) |
| --max-ir-full-ratio-v | 1.0 | the ratio threshold of retained node to skip edge for full introns (if greater than threshold, consider true transcript) |
| --max-ir-full-ratio-e | 0.5 | the ratio threshold of retained node's edge to skip edge for full introns (if greater than threshold, consider true transcript) |
| --max-ir-full-ratio-i | 10.0 | the ratio threshold of retained node to its own edge for full introns (if greater than threshold, consider true RETENTION) |
| --max-ir-umi-support-full | 3 | (int) maximum number of UMI reads to support a partial exon rather than full intron retention |
| --max-ir-umi-support-part | 5 | (int) maximum number of UMI reads to support a partial exon rather than partial intron retention |
| --remove-pcr-duplicates | 1 | 0 (not remove) or 1 (remove w.r.t alignment coordinates and CIGAR string) |
| --min-cb-ratio | 0.3 | (float) for meta-assembly, minimum ratio of exons in a transcript supported by a cell's barcode |
- For `--verbose`, 0: quiet; 1: one line for each splice graph; 2: details of graph decomposition.
- `--min_transcript_coverage` is used to filter lowly expressed transcripts: amaranth will filter out transcripts whose (predicted) raw counts (number of molecules) are less than this number.
- `--min_transcript_length_base` and `--min_transcript_length_increase` are combined to filter short transcripts: the minimum length of a transcript is given by `--min_transcript_length_base` + `--min_transcript_length_increase` * num-of-exons-in-this-transcript. Transcripts shorter than this number will be filtered out.
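The length filter can be worked through with a little arithmetic. The sketch below simply evaluates the formula stated above with the default values from the table (150 and 50). The function names are hypothetical, and this is an illustration of the formula, not amaranth's internal code.

```shell
# Minimum allowed length for a transcript with the given number of exons:
# base + increase * num_exons.
# Usage: min_transcript_length <num_exons> [base] [increase]
min_transcript_length() {
  local exons=$1 base=${2:-150} increase=${3:-50}
  echo $(( base + increase * exons ))
}

# Succeeds if a transcript of <length> bases with <num_exons> exons would be kept,
# i.e. if it is not shorter than the minimum.
# Usage: passes_length_filter <length> <num_exons> [base] [increase]
passes_length_filter() {
  local length=$1
  shift
  [ "$length" -ge "$(min_transcript_length "$@")" ]
}

min_transcript_length 3    # 150 + 50 * 3, prints 300
```

With the defaults, a 3-exon transcript must therefore be at least 300 bases long, and a single-exon transcript at least 200 bases.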
Example test data is provided in example/. The example data is also available in Release.
Users can also use the following link to download from GitHub Release:
wget https://github.com/Shao-Group/amaranth/releases/download/v0.1.0/example-input.bam
‼️ Note that `wget` works only with GitHub Release files; it does NOT work with GitHub repo files. A file will be truncated if you use wget to download it directly from a repo.
You can use a SHA-256 checksum to test the integrity of the downloaded data.
sha256sum example-input.bam # should be `59e036720e6539336600409bb7a466dd82a51e8ad98a30016951aea42f21fba6` (on macOS, use `shasum -a 256`)
Users can use the following command to do a basic example run (assuming you are in the same directory as example-input.bam):
amaranth -i example-input.bam -o test_output
The assembled transcripts will be in test_output.gtf.
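The integrity check on the downloaded file can also be scripted so that a mismatch stops a pipeline early. This is a small sketch, assuming either `sha256sum` (common on Linux) or `shasum` (common on macOS) is installed; the function names `sha256_of` and `verify_sha256` are hypothetical.

```shell
# Print the SHA-256 digest of a file using whichever common tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{print $1}'
  else
    shasum -a 256 "$1" | awk '{print $1}'
  fi
}

# Succeeds only if the file's digest equals the expected value.
# Usage: verify_sha256 <file> <expected_hex_digest>
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# Example:
# verify_sha256 example-input.bam 59e036720e6539336600409bb7a466dd82a51e8ad98a30016951aea42f21fba6 \
#   && echo "checksum OK" || echo "checksum mismatch, download again"
```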
Please raise an issue on GitHub.
