Structural variation
Sim-it can simulate 7 types of structural variation (SV) and introduce them into a FASTA file. SVs can either be generated randomly or have their breakpoints defined in an external variant file.
Sim-it supports deletions, insertions, inversions, duplications, chromosomal translocations, inverted duplications and complex substitutions.
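To make concrete what introducing an SV into a sequence means, here is a minimal sketch on a plain Python string. It is not Sim-it's implementation: the function names are hypothetical, POS is assumed to be 1-based with the affected region spanning POS to POS+SVLENGTH-1, and the placement of duplicated copies and of inserted sequence is an assumption. CSUB and TRA are left out.

```python
# Hypothetical illustration only, not Sim-it's code.
# Assumptions: POS is 1-based; the affected region is POS..POS+SVLENGTH-1.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def apply_sv(ref: str, sv_type: str, pos: int, length: int, ins_seq: str = "") -> str:
    """Apply one simple SV to `ref` and return the altered sequence."""
    start = pos - 1          # convert 1-based POS to a 0-based index
    end = start + length     # exclusive end of the affected region
    region = ref[start:end]
    if sv_type == "DEL":     # drop the region
        return ref[:start] + ref[end:]
    if sv_type == "INV":     # replace the region by its reverse complement
        return ref[:start] + reverse_complement(region) + ref[end:]
    if sv_type == "DUP":     # assumption: tandem copy placed right after the region
        return ref[:end] + region + ref[end:]
    if sv_type == "IDUP":    # assumption: inverted copy placed right after the region
        return ref[:end] + reverse_complement(region) + ref[end:]
    if sv_type == "INS":     # assumption: `ins_seq` is inserted just before base POS
        return ref[:start] + ins_seq + ref[start:]
    raise ValueError(f"unsupported SV type in this sketch: {sv_type}")
```

For example, `apply_sv("AAACCCGGGTTT", "DEL", 4, 3)` returns `"AAAGGGTTT"`. A real simulator also has to track haplotypes and shift the coordinates of later SVs, which this sketch ignores.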
The variant file has 6 fields; the first 4 are required.
#CHR POS SVLENGTH TYPE VARHAP SEQ
CHR : Chromosome (if the reference FASTA file is not GRCh38 or hg19, use the contig ID as it appears in the FASTA file).
POS : Start position of the SV.
SVLENGTH : Length of the SV.
TYPE : Type of the SV (DEL, INS, INV, DUP, CSUB, IDUP, TRA).
VARHAP : Genotype of the SV (0/1, 1/0, 1/1)
SEQ : Optional, for INS, DUP and IDUP. The inserted sequence is taken from the given reference coordinates (chrom,startpos-endpos). For INS, it can also be the header ID of a sequence in the 'foreign sequences' file.
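A line of the variant file could be read with a small parser like the one below. This is a sketch under stated assumptions, not Sim-it's own code: fields are assumed to be whitespace-separated, the `Variant` class and the function names are hypothetical, and a length such as `63x12` (from the DUP example below) is assumed to mean a length followed by a copy count.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical parser for one Sim-it variant-file line (not Sim-it's code).

SV_TYPES = {"DEL", "INS", "INV", "DUP", "CSUB", "IDUP", "TRA"}
GENOTYPES = {"0/1", "1/0", "1/1"}


@dataclass
class Variant:
    chrom: str              # e.g. "1", or "1:3" as in the TRA example
    pos: Optional[int]      # None when POS is "." (as in the TRA example)
    length: int             # SV length in bases
    copies: int             # assumption: N in "LxN", 1 if absent
    sv_type: str
    genotype: Optional[str]  # optional VARHAP field
    source: Union[None, str, Tuple[str, int, int]]  # SEQ: coordinates or foreign id


def parse_seq(field: str) -> Union[str, Tuple[str, int, int]]:
    """Parse SEQ as 'chrom,start-end'; otherwise treat it as a foreign-sequence id."""
    chrom, sep, span = field.partition(",")
    if sep and "-" in span:
        start, end = span.split("-", 1)
        if start.isdigit() and end.isdigit():
            return chrom, int(start), int(end)
    return field


def parse_variant_line(line: str) -> Optional[Variant]:
    """Parse one line; return None for blank lines and '#' header lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    fields = line.split()
    if len(fields) < 4:
        raise ValueError(f"expected at least 4 fields, got {len(fields)}: {line!r}")
    chrom, pos_field, len_field, sv_type = fields[:4]
    if sv_type not in SV_TYPES:
        raise ValueError(f"unknown SV type {sv_type!r}")
    length_str, _, copies_str = len_field.partition("x")
    genotype = fields[4] if len(fields) > 4 else None
    if genotype is not None and genotype not in GENOTYPES:
        raise ValueError(f"unexpected genotype {genotype!r}")
    return Variant(
        chrom=chrom,
        pos=None if pos_field == "." else int(pos_field),
        length=int(length_str),
        copies=int(copies_str) if copies_str else 1,
        sv_type=sv_type,
        genotype=genotype,
        source=parse_seq(fields[5]) if len(fields) > 5 else None,
    )
```

For instance, parsing `1 52 30 INS 0/1 5,822152-822182` yields a `Variant` whose `source` is `("5", 822152, 822182)`, while a bare ID such as `foreign_sequence1` is kept as a string for lookup in the foreign sequences file.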
Below are example entries for several SV types:
#CHR POS SVLENGTH TYPE VARHAP SEQ
1 1543 345 DEL 1/1
1 52 30 INS 0/1 5,822152-822182
1 52 30 INS 0/1 foreign_sequence1*
1 78543 45 INV 0/1
1 45200 63x12 DUP 1/1 4,64641508-64641570
1 640638 147 DUP 1/0
1 640638 120 IDUP 1/1 6,81110355-81110475
1:3 . 35482400 TRA 0/1
*Must be the header ID of the sequence in the 'foreign sequences' file.