
Structural variation

Nicolas Dierckxsens edited this page Jun 30, 2022 · 13 revisions

Sim-it can simulate 7 types of structural variation (SV) in a FASTA file. SVs can be generated randomly, or their breakpoints can be defined in an external variant file.

1. Supported Structural Variation

Sim-it supports Deletions, Insertions, Inversions, Duplications, Chromosomal Translocations, Inverted duplications and Complex substitutions.

2. Variant file input

The variant file has 6 fields; the first 4 are required.

#CHR    POS    SVLENGTH    TYPE    VARHAP    SEQ

CHR      : Chromosome (when the reference FASTA file is not GRCh38 or hg19, use the id of each contig as given in the FASTA file).
POS      : Start position of the SV.
SVLENGTH : Length of the SV.
TYPE     : Type of the SV (DEL, INS, INV, DUP, CSUB, IDUP, TRA).
VARHAP   : Genotype of the SV (0/1, 1/0, 1/1).
SEQ      : Source of the inserted sequence; can be used with INS, DUP and IDUP. Give the coordinates of the reference region to copy (chrom,startpos-endpos) or, for INS, the header id of a sequence in the 'foreign sequences' file.
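
The field layout above can be checked with a small reader before a file is handed to Sim-it. The following Python sketch is not part of Sim-it; the names `VariantRecord` and `parse_variant_line` are hypothetical, and it assumes fields are separated by whitespace and that lines starting with `#` are header lines. POS and SVLENGTH are kept as text because the examples on this page use non-numeric values such as `.` and `63x12`.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values as listed in the field descriptions on this page.
SV_TYPES = {"DEL", "INS", "INV", "DUP", "CSUB", "IDUP", "TRA"}
GENOTYPES = {"0/1", "1/0", "1/1"}


@dataclass
class VariantRecord:
    """One row of a variant file (hypothetical helper type)."""
    chrom: str
    pos: str        # text: the TRA example uses "."
    svlength: str   # text: the DUP example uses "63x12"
    svtype: str
    varhap: Optional[str] = None
    seq: Optional[str] = None


def parse_variant_line(line: str) -> Optional[VariantRecord]:
    """Parse one line; return None for blank lines and '#' header lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()  # assumption: whitespace-separated fields
    if not 4 <= len(fields) <= 6:
        raise ValueError(f"expected 4 to 6 fields, got {len(fields)}: {line!r}")
    chrom, pos, svlength, svtype = fields[:4]
    if svtype not in SV_TYPES:
        raise ValueError(f"unknown SV type: {svtype!r}")
    varhap = fields[4] if len(fields) > 4 else None
    if varhap is not None and varhap not in GENOTYPES:
        raise ValueError(f"unknown genotype: {varhap!r}")
    seq = fields[5] if len(fields) > 5 else None
    return VariantRecord(chrom, pos, svlength, svtype, varhap, seq)
```

Calling `parse_variant_line("1\t1543\t345\tDEL\t1/1")` returns a record with `svtype == "DEL"` and `seq is None`, while a header line returns `None`.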

Below you can find an example for each type of SV:

2.1. Deletions (DEL)
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1	1543	345	DEL	1/1
2.2.a Insertions (INS)
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1	52	30	INS	0/1	5,822152-822182
2.2.b Insertions (INS) with foreign sequence
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1	52	30	INS	0/1	foreign_sequence1*
*Must be the header id of the sequence in the 'foreign sequences' file.
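
The two insertion examples show that the SEQ field can hold either reference coordinates (chrom,startpos-endpos) or a foreign sequence id. A minimal Python sketch of telling the two apart follows; `RefRegion` and `parse_seq_field` are hypothetical names, and treating every value that does not match the coordinate pattern as a foreign sequence id is an assumption, not documented Sim-it behaviour.

```python
import re
from typing import NamedTuple, Union


class RefRegion(NamedTuple):
    """Reference coordinates taken from a SEQ field (hypothetical helper type)."""
    chrom: str
    start: int
    end: int


# Matches the "chrom,startpos-endpos" form, e.g. "5,822152-822182".
_REGION = re.compile(r"^([^,\s]+),(\d+)-(\d+)$")


def parse_seq_field(seq: str) -> Union[RefRegion, str]:
    """Return a RefRegion for coordinates, otherwise the value as a foreign id."""
    seq = seq.strip()
    match = _REGION.match(seq)
    if match is None:
        # Assumption: anything else is a header id from the foreign sequences file.
        return seq
    start, end = int(match.group(2)), int(match.group(3))
    if start > end:
        raise ValueError(f"start position after end position: {seq!r}")
    return RefRegion(match.group(1), start, end)
```

With this sketch, `parse_seq_field("5,822152-822182")` yields a `RefRegion` for chromosome `5`, and `parse_seq_field("foreign_sequence1")` returns the id unchanged.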
2.3. Inversions (INV)
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1	78543	45	INV	0/1
2.4. Duplications (DUP)
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1	45200	63x12	DUP	1/1	4,64641508-64641570
2.5. Complex substitutions (CSUB)
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1	640638	147	CSUB	1/0
2.6. Inverted duplications (IDUP)
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1	640638	120	IDUP	1/1	6,81110355-81110475
2.7. Chromosomal translocations (TRA)
#CHR	POS	SVLENGTH	TYPE	VARHAP	SEQ
1:3	.	35482400	TRA	0/1
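
Rows like the ones above can also be written programmatically. The sketch below is a hedged illustration rather than a Sim-it utility: `write_variant_file` is a hypothetical name, and it assumes tab-separated fields with the header line used in the examples on this page.

```python
import io

HEADER = ["#CHR", "POS", "SVLENGTH", "TYPE", "VARHAP", "SEQ"]


def write_variant_file(rows, handle):
    """Write the header and one tab-separated line per row to an open text handle.

    Each row holds 4 to 6 values in the order CHR, POS, SVLENGTH, TYPE,
    VARHAP, SEQ; trailing optional fields may be left out.
    """
    handle.write("\t".join(HEADER) + "\n")
    for row in rows:
        fields = [str(value) for value in row]
        if not 4 <= len(fields) <= 6:
            raise ValueError(f"expected 4 to 6 fields, got {len(fields)}: {row!r}")
        handle.write("\t".join(fields) + "\n")


# Usage: reproduce the deletion and translocation examples from this page.
buffer = io.StringIO()
write_variant_file(
    [
        ("1", 1543, 345, "DEL", "1/1"),
        ("1:3", ".", 35482400, "TRA", "0/1"),
    ],
    buffer,
)
print(buffer.getvalue(), end="")
```

To write to disk instead, pass a handle from `open(path, "w")` in place of the `StringIO` buffer.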