insilicoSV: grammar-based structural variant simulation and placement

Overview

insilicoSV is a versatile framework for structural variant (SV) simulation. It models SVs with a simple, flexible grammar that lets users define both standard and custom genome rearrangements and encode constraints on where in the genome each SV is placed.

Key features:

Built-in support for 26 types of structural variants (simple and complex), small indels, and SNPs
Custom SV simulation using grammatical SV notation (e.g. ABC -> aBBBc)
Fine-grained genome placement control: SVs (or individual SV breakpoints) can be constrained to regions of interest, with multiple placement modes specifying how the SV should overlap each region, or kept out of specific regions (i.e., category-specific blacklists)
Integration of user-provided SVs
Fine-grained size simulation allowing independent configuration of inter-breakpoint distances in complex SVs
Modular SV definitions, allowing any number of SV categories to be defined and simulated in the same genome by combining a variety of attributes (e.g., type, size, and placement constraints)
Customizable WDL pipeline with support for genome simulation, read simulation, alignment, and visualization
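To make the grammatical notation concrete, the Python sketch below applies a rule such as ABC -> aBBBc to short named sequences. It is not insilicoSV's implementation and assumes only three conventions: an uppercase letter on the right-hand side copies that reference segment unchanged, a lowercase letter inserts its reverse complement (an inverted copy), and a segment omitted from the right-hand side is deleted. The names apply_rewrite and reverse_complement are hypothetical.

```python
# Minimal sketch of applying a grammar-style SV rewrite such as "ABC -> aBBBc".
# Assumptions (illustrative, not taken from insilicoSV's code): each letter on
# the left names a consecutive reference segment; on the right, an uppercase
# letter copies the segment as-is, a lowercase letter inserts its reverse
# complement, and a segment that does not appear is deleted.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_rewrite(rule: str, segments: dict[str, str]) -> str:
    """Apply a rule like 'ABC -> aBBBc' to named reference segments."""
    source, target = (side.strip() for side in rule.split("->"))
    missing = set(source) - set(segments)
    if missing:
        raise ValueError(f"no sequence given for segment(s): {sorted(missing)}")
    pieces = []
    for symbol in target:
        name = symbol.upper()
        if name not in source:
            raise ValueError(f"symbol {symbol!r} is not defined on the left-hand side")
        seq = segments[name]
        # Lowercase symbol: inverted copy (reverse complement) of the segment.
        pieces.append(reverse_complement(seq) if symbol.islower() else seq)
    return "".join(pieces)


if __name__ == "__main__":
    ref = {"A": "AAC", "B": "GT", "C": "TTG"}
    print(apply_rewrite("ABC -> aBBBc", ref))  # GTTGTGTGTCAA
```

With these toy segments, ABC -> aBBBc yields the inverted A, three tandem copies of B, and the inverted C. Any additional symbols that insilicoSV's grammar supports are not modeled in this sketch.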

Figure: (a) SV classes predefined in insilicoSV and their grammatical notation; (b) supported SV placement constraints; (c) Samplot visualization of short-read alignments at the site of a simulated complex delINVdel event; (d) Samplot visualization of short-read alignments at the site of a simulated, grammatically specified custom SV event.
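Placement constraints like those in panel (b) reduce to interval comparisons between an SV and a region of interest. The sketch below illustrates this with half-open [start, end) intervals. The mode names contained, containing, overlapping, and exact, as well as the Interval, satisfies, and avoids helpers, are hypothetical labels chosen for illustration; they are not insilicoSV configuration values.

```python
# Hypothetical sketch of interval checks behind SV placement constraints.
# Mode names are illustrative labels, not insilicoSV configuration values.
from typing import Iterable, NamedTuple


class Interval(NamedTuple):
    """Half-open genomic interval [start, end) on a single chromosome."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def satisfies(sv: Interval, region: Interval, mode: str) -> bool:
    """Check whether an SV interval relates to a region as `mode` requires."""
    same_chrom = sv.chrom == region.chrom
    if mode == "contained":    # SV lies entirely inside the region
        return same_chrom and region.start <= sv.start and sv.end <= region.end
    if mode == "containing":   # SV spans the entire region
        return same_chrom and sv.start <= region.start and region.end <= sv.end
    if mode == "overlapping":  # SV and region share at least one base
        return overlaps(sv, region)
    if mode == "exact":        # SV coincides with the region
        return sv == region
    raise ValueError(f"unknown placement mode: {mode!r}")


def avoids(sv: Interval, blacklist: Iterable[Interval]) -> bool:
    """True if the SV overlaps none of the blacklisted regions."""
    return not any(overlaps(sv, region) for region in blacklist)
```

A blacklist check is simply the negation of any overlap. The same predicates can constrain a single breakpoint by treating it as the one-base interval [pos, pos + 1).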

Installation

Prerequisite: Python 3.9+

$> pip install .

User guide

Quick start

To run insilicoSV:

$> insilicosv -c <path/to/config.yaml>

Recommended workflows

Create a new directory
Create a new YAML config file in this directory
Populate the YAML config file with the parameters specific to this experiment (see Input guidelines and Use Cases)
Run insilicoSV, passing the path to the config file as input. insilicoSV automatically writes its output files to the directory containing the YAML file.
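The workflow steps above can also be scripted. The sketch below assumes that the insilicosv executable is on PATH after installation and leaves the config contents to the caller, since the schema is described in Input guidelines; prepare_experiment and run_experiment are hypothetical helper names.

```python
# Sketch of the recommended workflow, automated from Python.
# Assumptions: `insilicosv` is on PATH after `pip install .`; the YAML config
# body is supplied by the caller (no schema is assumed here).
import subprocess
from pathlib import Path


def prepare_experiment(workdir: str, config_text: str,
                       config_name: str = "config.yaml") -> list[str]:
    """Create the experiment directory, write the config, and return the CLI command."""
    directory = Path(workdir)
    directory.mkdir(parents=True, exist_ok=True)   # step 1: new directory
    config_path = directory / config_name
    config_path.write_text(config_text)            # steps 2-3: populated YAML config
    return ["insilicosv", "-c", str(config_path)]  # step 4: command to run


def run_experiment(command: list[str]) -> None:
    """Run insilicoSV; it writes outputs next to the config file."""
    subprocess.run(command, check=True)
```

For example, run_experiment(prepare_experiment("experiments/run1", yaml_text)) writes experiments/run1/config.yaml and invokes insilicosv -c on it, so the outputs land in experiments/run1.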

Two customizable WDL pipelines are also provided to automatically simulate synthetic genomes and reads and to produce alignments for downstream analysis. Each pipeline can be configured to (1) simulate one or multiple genomes, (2) simulate one or multiple read datasets from these genomes (currently supported platforms: Illumina, PacBio, and ONT), (3) align the reads, and (4) visualize the alignments at the simulated SV sites. See WDL for more information.

Documentation

For detailed information about insilicoSV features, along with usage examples, please refer to the following documentation sections:

Authors

Nick Jiang - nickj@berkeley.edu

Chris Rohlicek - crohlice@broadinstitute.org

Ilya Shlyakhter - ilya@broadinstitute.org

Enzo Battistella - ebattist@broadinstitute.org

Victoria Popic - vpopic@broadinstitute.org
