insilicoSV is a versatile framework for structural variant (SV) simulation,
which models SVs using a simple and flexible grammar, allowing users to define standard and custom genome
rearrangements, as well as encode genome placement constraints.
Key features:
- Built-in support for 26 types of structural variants (simple and complex), small indels, and SNPs
- Custom SV simulation using grammatical SV notation (e.g., ABC -> aBBBc)
- Fine-grained genome placement control, allowing SVs (or specific SV breakpoints) to be constrained to specific regions of interest (with multiple placement modes available to specify how the SV should overlap with each region) or to avoid specific regions (i.e., category-specific blacklists)
- Integration of user-provided SVs
- Fine-grained size simulation allowing independent configuration of inter-breakpoint distances in complex SVs
- Modular SV definitions, allowing any number of different SV categories to be defined and simulated in the same genome by combining a variety of attributes (e.g., type, size, placement constraints)
- Customizable WDL pipeline with support for genome simulation, read simulation, alignment, and visualization
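The grammar notation in the feature list (e.g., ABC -> aBBBc) can be illustrated with a small standalone sketch. This is not insilicoSV's implementation. It assumes a simplified reading in which each uppercase letter on the left names a contiguous reference segment, an uppercase letter on the right copies that segment unchanged, a lowercase letter inserts its reverse complement (an inversion), a repeated letter duplicates the segment, and an omitted letter deletes it. The helpers `apply_sv_grammar` and `sample_segments` are hypothetical; `sample_segments` mimics independently sized inter-breakpoint distances by drawing each segment length from its own range.

```python
import random

# Complement table for DNA bases; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def sample_segments(source: str, length_ranges, rng: random.Random) -> dict:
    """Draw a random sequence for each source symbol (hypothetical helper).

    Each segment length is sampled independently from its own inclusive
    (min, max) range, loosely mirroring per-interval size configuration.
    """
    if len(length_ranges) != len(source):
        raise ValueError("need exactly one (min, max) range per source symbol")
    segments = {}
    for symbol, (lo, hi) in zip(source, length_ranges):
        length = rng.randint(lo, hi)  # inclusive on both ends
        segments[symbol] = "".join(rng.choice("ACGT") for _ in range(length))
    return segments


def apply_sv_grammar(source: str, target: str, segments: dict) -> str:
    """Rewrite the region described by `source` according to `target`.

    Assumed, simplified convention: uppercase target symbols copy the
    segment unchanged, lowercase symbols insert its reverse complement
    (inversion), repeats duplicate it, and omitted symbols are deleted.
    """
    if not source.isalpha() or not source.isupper() or len(set(source)) != len(source):
        raise ValueError("source must be distinct uppercase letters, e.g. 'ABC'")
    missing = set(source) - set(segments)
    if missing:
        raise ValueError(f"no sequence given for segments {sorted(missing)}")
    pieces = []
    for symbol in target:
        key = symbol.upper()
        if key not in source:
            raise ValueError(f"target symbol {symbol!r} is not defined in source {source!r}")
        seq = segments[key]
        pieces.append(seq if symbol.isupper() else revcomp(seq))
    return "".join(pieces)


segments = {"A": "AACC", "B": "GT", "C": "TTG"}
print(apply_sv_grammar("ABC", "aBBBc", segments))  # -> GGTTGTGTGTCAA
```

Passing the output of `sample_segments("ABC", [(500, 1000), (100, 200), (500, 1000)], random.Random(42))` as `segments` applies the same rewrite to randomly generated intervals whose lengths are chosen independently.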
Figure: Illustration of SV classes predefined in insilicoSV and their grammatical notation (a), supported SV placement constraints (b), Samplot visualization of short-read alignments at the site of a simulated complex delINVdel event (c), and Samplot visualization of short-read alignments at the site of a simulated, grammatically specified custom SV event (d).
Prerequisite: Python 3.9+
Install (from the root of the cloned repository):
$> pip install .
To run insilicoSV:
$> insilicosv -c <path/to/config.yaml>
- Create a new directory
- Create a new YAML config file in this directory
- Populate the YAML config file with the parameters specific to this experiment (see Input guidelines and Use Cases)
- Run insilicoSV, providing the path to the config file as input. insilicoSV will automatically create output files in the YAML file directory.
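The quick-start steps can also be scripted. The sketch below uses hypothetical helpers that are not part of insilicoSV: `prepare_experiment` creates the experiment directory, writes the config file into it, and builds the documented `insilicosv -c <path/to/config.yaml>` command, and `run_experiment` executes that command. The YAML keys in `EXAMPLE_CONFIG` are placeholders assumed for illustration only; take the actual parameter names from the Input guidelines.

```python
import shutil
import subprocess
from pathlib import Path

# Placeholder config: these key names are ASSUMPTIONS for illustration only.
# Replace them with the parameters described in the Input guidelines.
EXAMPLE_CONFIG = """\
reference: /path/to/ref.fa
variant_sets:
  - type: "DEL"
    number: 3
"""


def prepare_experiment(exp_dir: str, config_text: str) -> list:
    """Create the experiment directory and write config.yaml into it.

    Returns the documented command `insilicosv -c <path/to/config.yaml>`.
    """
    directory = Path(exp_dir)
    directory.mkdir(parents=True, exist_ok=True)
    config_path = directory / "config.yaml"
    config_path.write_text(config_text)
    return ["insilicosv", "-c", str(config_path)]


def run_experiment(cmd: list) -> None:
    """Run insilicoSV; its outputs are written next to the config file."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]!r} not found on PATH; install it first")
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

For example, `run_experiment(prepare_experiment("experiments/del_demo", EXAMPLE_CONFIG))` would create `experiments/del_demo/config.yaml` and run insilicoSV on it, provided the placeholder keys and reference path have been replaced with valid values.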
Two customizable WDL pipelines are also provided to automatically simulate synthetic genomes and reads and produce alignments for downstream analysis. Each pipeline can be configured to (1) simulate one or more genomes, (2) simulate one or more read datasets (currently supported platforms: Illumina, PacBio, and ONT) from these genomes, (3) align the reads, and (4) visualize the alignments at the simulated SV sites. See WDL for more information.
For detailed information about insilicoSV features, along with usage examples,
please refer to the following documentation sections:
Nick Jiang - nickj@berkeley.edu
Chris Rohlicek - crohlice@broadinstitute.org
Ilya Shlyakhter - ilya@broadinstitute.org
Enzo Battistella - ebattist@broadinstitute.org
Victoria Popic - vpopic@broadinstitute.org
