Output format
SVision writes detected structural variants (SVs) in the standard VCF format and saves the graph representation of detected complex structural variants (CSVs) as additional rGFA-formatted output.
The VCF output follows the standard format with extra INFO fields. Some important columns and fields are listed below:
The SV ID column has the format a_b, where the suffix b indicates that site a contains other types of SVs.
Filters used in the output:
Covered: The entire SV is spanned by long reads, producing the most confident calls.
Uncovered: The SV is only partially spanned by long reads, i.e. the reads span one of the breakpoints.
Clustered: The SV is only partially spanned by individual long reads, but can be spanned by read clusters.
For CSVs detected by SVision, we add the following extra attributes to the INFO column of the VCF:
BKPS: The internal structure of the CSV, as recognized by the CNN through tMOR.
GraphID: The index of the graph structure, obtained by grouping isomorphic graphs. It requires --graph.
The ID for simple SVs is -1.
GFA_FILE_PREFIX: File name prefix of the GFA file that corresponds to the CSV.
GFA_S: Nodes of the CSV graph, following the GFA format.
GFA_L: Links of the CSV graph, following the GFA format.
Example of an SVision CSV call from the demo data:
chr9 74283222 4 N <CSV> 0 Covered END=74283473;SVLEN=251;SVTYPE=INS+INV;SUPPORT=12;BKPS=INS:1803-74283224-74283473,INV:157-74283222-74283379;READS=m54329U_190827_173812/67436872/ccs,m54329U_190615_010947/36833505/ccs,m54329U_190701_222759/20252964/ccs,m54329U_190617_231905/171966566/ccs,m54329U_190629_180018/105841132/ccs,m54329U_190701_222759/158597755/ccs,m54329U_190827_173812/141232071/ccs,m54329U_190617_231905/155256326/ccs,m54329U_190629_180018/77990008/ccs,m54329U_190617_231905/67109556/ccs,m54329U_190701_222759/118031725/ccs,m54329U_190701_222759/126223937/ccs;GraphID=0;GFA_FILE_PREFIX=chr9-74283222-74283473-4-INS+INV;GFA_S=S0,S1,S2,S3,I0,I1;GFA_L=S0+I0,I0+S1,S1-I1,I1+S3
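The fields described above can be pulled out of a record with a few lines of Python. This is a minimal sketch, not part of SVision: the helper names `parse_info` and `parse_record` are hypothetical, the READS list is shortened to a single read for brevity, and the three numbers in each BKPS entry are kept as a plain list because their meaning is not spelled out here.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flags without '=' map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def parse_record(line):
    """Parse the eight fixed columns of one VCF data line."""
    # VCF is tab-delimited; split() also accepts the space-separated display.
    chrom, pos, sv_id, ref, alt, qual, flt, info = line.split()[:8]
    return {
        "chrom": chrom, "pos": int(pos), "id": sv_id, "ref": ref,
        "alt": alt, "qual": qual, "filter": flt, "info": parse_info(info),
    }


# The demo record, with the READS list shortened to one read.
line = (
    "chr9 74283222 4 N <CSV> 0 Covered "
    "END=74283473;SVLEN=251;SVTYPE=INS+INV;SUPPORT=12;"
    "BKPS=INS:1803-74283224-74283473,INV:157-74283222-74283379;"
    "READS=m54329U_190827_173812/67436872/ccs;"
    "GraphID=0;GFA_FILE_PREFIX=chr9-74283222-74283473-4-INS+INV;"
    "GFA_S=S0,S1,S2,S3,I0,I1;GFA_L=S0+I0,I0+S1,S1-I1,I1+S3"
)

rec = parse_record(line)
info = rec["info"]
sv_types = info["SVTYPE"].split("+")   # component SV types of the CSV
nodes = info["GFA_S"].split(",")       # graph nodes
links = info["GFA_L"].split(",")       # graph links

# Each BKPS entry looks like TYPE:n-n-n; keep the numbers as a list of ints.
bkps = []
for entry in info["BKPS"].split(","):
    sv_type, _, numbers = entry.partition(":")
    bkps.append((sv_type, [int(x) for x in numbers.split("-")]))

print(rec["filter"], sv_types, nodes, links, bkps)
```

Keeping only records whose `filter` value is `Covered` would then select the most confident calls.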
The table below lists frequent CSV types detected by SVision. In addition, SVision identifies complex insertions of different structures that contain more than two insertion nodes. These complex insertion events are not included in the table because they are difficult to describe biologically.
| Nodes | Links | Biological description |
|---|---|---|
| S:2,I:1,D:1 | S0+I0-,I0-S1+ | Inverted duplicate of a genomic segment represented by the insertion node |
| S:4 | S0+S2-,S2-S3+ | Deletion associated with 3' or 5' inversion |
| S:4,I:1 | S0+S2-,S2-I0+,I0+S3+ | Deletion associated with 5' inversion and insertion |
| S:5 | S0+S2-,S2-S4+ | Two deletions with inverted or non-inverted spacer segment |
| S:3,I:1,D:1 | S0+I0-,I0-S2+ | Deletion associated with insertion, where the inserted sequence is a distal inverted duplicated genomic segment |
| S:3,I:1,D:1 | S0+I0+,I0+S2+ | Deletion associated with insertion, where the inserted sequence is a distal duplicated genomic segment |
| S:2,I:2,D:2 | S0+I0-,I0-I1+,I1+S1+ | A complex insertion consisting of an inverted duplication and a dispersed duplication |
| S:2,I:2,D:1 | S0+I0+,I0+I1-,I1-S1+ | A complex insertion containing a tandem inverted duplication at the 3' end |
| S:2,I:2,D:1 | S0+I0-,I0-I1+,I1+S1+ | A complex insertion containing a tandem inverted duplication at the 5' end |
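The Nodes and Links columns of the table are compact strings that are easy to decode. The sketch below is illustrative only: the function names are hypothetical, and reading each link as "source node, source orientation, target node, target orientation" is an assumption based on the GFA link convention. It uses the row "Deletion associated with 5' inversion and insertion" as input.

```python
import re

# One link such as "S0+S2-": node name, orientation, node name, orientation.
LINK_RE = re.compile(r"([A-Z]\d+)([+-])([A-Z]\d+)([+-])")


def parse_links(text):
    """Turn 'S0+S2-,S2-I0+' into a list of (src, src_orient, dst, dst_orient)."""
    links = []
    for item in text.split(","):
        match = LINK_RE.fullmatch(item.strip())
        if match is None:
            raise ValueError(f"unrecognized link: {item!r}")
        links.append(match.groups())
    return links


def parse_node_counts(text):
    """Turn 'S:4,I:1' into {'S': 4, 'I': 1}."""
    counts = {}
    for item in text.split(","):
        kind, _, number = item.partition(":")
        counts[kind.strip()] = int(number)
    return counts


def reversed_nodes(links):
    """Names of nodes that are traversed in reverse ('-') orientation."""
    found = set()
    for src, src_orient, dst, dst_orient in links:
        if src_orient == "-":
            found.add(src)
        if dst_orient == "-":
            found.add(dst)
    return found


row_links = parse_links("S0+S2-,S2-I0+,I0+S3+")
row_counts = parse_node_counts("S:4,I:1")
print(row_counts, row_links, reversed_nodes(row_links))
```

For this row the only node traversed in reverse is S2, which matches the inversion named in the description.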
SVision classifies the graph of each CSV instance by comparing graph topologies. This requires the --graph and --qname parameters to be set.
It creates two text (.txt) files alongside the VCF output:
- sample.graph_exactly_match.txt: Unique graphs for all CSV instances, i.e. isomorphic graphs.
- sample.graph_symmetry_match.txt: Graphs with symmetric topologies that are classified as isomorphic.
Examples of two isomorphic graphs, representing different CSV events of the same type.
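To make "isomorphic" concrete: two CSV graphs are isomorphic when their nodes can be renamed so that the links, including orientations, coincide. The following brute-force check is a sketch for small graphs and is not SVision's implementation; the function name `isomorphic` and the rule that a node keeps its kind (the first letter of its name) are assumptions made for illustration.

```python
from itertools import permutations


def isomorphic(links_a, links_b):
    """Check whether two small oriented graphs match up to node renaming.

    Each graph is a list of (src, src_orient, dst, dst_orient) tuples.
    """
    nodes_a = sorted({n for s, _, d, _ in links_a for n in (s, d)})
    nodes_b = sorted({n for s, _, d, _ in links_b for n in (s, d)})
    if len(nodes_a) != len(nodes_b) or len(links_a) != len(links_b):
        return False
    target = sorted(links_b)
    # Try every renaming of graph A's nodes onto graph B's nodes.
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # Assumption: a node may only map to a node of the same kind (S or I).
        if any(a[0] != b[0] for a, b in mapping.items()):
            continue
        renamed = sorted(
            (mapping[s], so, mapping[d], do) for s, so, d, do in links_a
        )
        if renamed == target:
            return True
    return False


graph_1 = [("S0", "+", "I0", "-"), ("I0", "-", "S1", "+")]
graph_2 = [("S5", "+", "I3", "-"), ("I3", "-", "S6", "+")]
graph_3 = [("S0", "+", "I0", "+"), ("I0", "+", "S1", "+")]

print(isomorphic(graph_1, graph_2))
print(isomorphic(graph_1, graph_3))
```

Here `graph_1` and `graph_2` differ only in node names, while `graph_3` has a different orientation pattern. The permutation search grows factorially with the number of nodes, so it is only suitable for graphs of the size shown on this page.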

The example below shows a CSV in rGFA format (node sequences are omitted for display purposes), detected by SVision at chr11:99,819,283-99,820,576 in HG00733. The graph output is saved in a separate file for each CSV event.
S S1 SN:Z:chr11 SO:i:99819338 SR:i:0 LN:i:2990
S I0 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:15813 SR:i:0 LN:i:1113
S I1 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:16927 SR:i:0 LN:i:466
S I2 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:17400 SR:i:0 LN:i:377 DP:S:S1:99820198
S I3 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:17778 SR:i:0 LN:i:838
S I4 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:18617 SR:i:0 LN:i:61 DP:S:S0:99819276
L S0 + I0 + 0M SR:i:0
L I0 + I1 + 0M SR:i:0
L I1 + I2 - 0M SR:i:0
L I2 - I3 + 0M SR:i:0
L I3 + I4 + 0M SR:i:0
L I4 + S1 + 0M SR:i:0
In addition to the information included in the standard rGFA format, we add a DP:S column that indicates the origin of a node sequence, as detected via local realignment. For example, node I2 is duplicated from node S1.
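A small reader for this output, including the extra DP:S column, could look as follows. This is a sketch under assumptions: `parse_rgfa` is a hypothetical helper, tags are recognized by their `XX:T:value` shape so that a sequence column (omitted here) would be skipped, and only three node lines from the example are used to keep the input short.

```python
import re

# Optional GFA-style tag: two-character name, one-letter type, then the value.
TAG_RE = re.compile(r"^[A-Za-z][A-Za-z0-9]:[A-Za-z]:")


def parse_rgfa(lines):
    """Collect segment tags and links from rGFA text lines."""
    segments, links = {}, []
    for line in lines:
        cols = line.split()
        if not cols:
            continue
        if cols[0] == "S":
            tags = {}
            for col in cols[2:]:
                if not TAG_RE.match(col):
                    continue  # e.g. the node sequence, when present
                name, typ, value = col.split(":", 2)
                tags[name] = int(value) if typ == "i" else value
            segments[cols[1]] = tags
        elif cols[0] == "L":
            # from node, from orientation, to node, to orientation
            links.append(tuple(cols[1:5]))
    return segments, links


example = """\
S S1 SN:Z:chr11 SO:i:99819338 SR:i:0 LN:i:2990
S I2 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:17400 SR:i:0 LN:i:377 DP:S:S1:99820198
S I4 SN:Z:m54329U_190827_173812/140708091/ccs SO:i:18617 SR:i:0 LN:i:61 DP:S:S0:99819276
L I1 + I2 - 0M SR:i:0
L I2 - I3 + 0M SR:i:0
"""

segments, links = parse_rgfa(example.splitlines())

# Map each node with a DP:S entry to the node named as its origin.
origins = {
    name: tags["DP"].split(":")[0]
    for name, tags in segments.items()
    if "DP" in tags
}
print(origins, links)
```

The value after the origin node name in the DP:S column is left unparsed, since its meaning is not described here.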
Note: The following steps are a post-processing procedure that tries to validate the detected CSVs.
Step 1: Extract HiFi raw reads
samtools view -b HG00733.ngmlr.sorted.bam chr11:99810000-99830000 > tmp.bam
samtools fasta tmp.bam > tmp.fasta
Step 2: Align with GraphAligner
Please check the GraphAligner documentation for detailed usage.
GraphAligner -g chr11-99819283-99820576.gfa -f tmp.fasta -a aln.gaf -x vg
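When many CSVs need to be validated, the two steps above can be scripted. The sketch below only reassembles the exact commands shown above; `validation_commands` and `run_all` are hypothetical helper names, and the execution call is left commented out because it needs samtools, GraphAligner, and an indexed BAM file.

```python
import subprocess


def validation_commands(bam, region, gfa,
                        reads_bam="tmp.bam", reads_fasta="tmp.fasta",
                        gaf="aln.gaf"):
    """Return (argv, stdout_path) pairs mirroring the shell commands above."""
    return [
        # Step 1: extract reads overlapping the region, then convert to FASTA.
        (["samtools", "view", "-b", bam, region], reads_bam),
        (["samtools", "fasta", reads_bam], reads_fasta),
        # Step 2: align the reads to the CSV graph.
        (["GraphAligner", "-g", gfa, "-f", reads_fasta,
          "-a", gaf, "-x", "vg"], None),
    ]


def run_all(commands):
    """Run each command, redirecting stdout to a file where one is given."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "wb") as out:
                subprocess.run(argv, check=True, stdout=out)


cmds = validation_commands(
    "HG00733.ngmlr.sorted.bam",
    "chr11:99810000-99830000",
    "chr11-99819283-99820576.gfa",
)
for argv, stdout_path in cmds:
    print(" ".join(argv) + (f" > {stdout_path}" if stdout_path else ""))

# run_all(cmds)  # uncomment once the tools are installed and on PATH
```

Passing argument lists instead of a shell string avoids quoting problems with file names, and `check=True` stops the pipeline at the first failing step.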
Example of reads supporting the CSV path:
m54329U_190827_173812/140708091/ccs 21668 0 21668 + >S0>I0>I1<I2>I3>I4>S1
m54329U_190617_231905/88145984/ccs 13612 0 13612 + >S0>I0>I1<I2>I3>I4>S1
m54329U_190617_231905/88145984/ccs 13612 0 13612 + >S0>I0>I1<I2>I3>I4>S1
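Reads that support the CSV can be picked out of the GAF output by comparing each alignment path with the expected walk through the graph. The sketch below is an illustration, not an SVision tool: the function names are hypothetical, only the first six GAF columns shown above are used, and a read is counted as supporting when its path equals the expected walk in either direction.

```python
import re

# One step of a GAF path: '>' (forward) or '<' (reverse) followed by a node name.
STEP_RE = re.compile(r"([<>])([^<>]+)")


def parse_gaf_path(path):
    """Turn '>S0<I2' into [('S0', '+'), ('I2', '-')]."""
    return [(name, "+" if sign == ">" else "-")
            for sign, name in STEP_RE.findall(path)]


def reverse_path(steps):
    """The same walk read from the other end, with orientations flipped."""
    return [(name, "-" if orient == "+" else "+")
            for name, orient in reversed(steps)]


def supporting_reads(gaf_lines, expected):
    """Sorted names of reads whose path equals the expected walk."""
    names = set()
    for line in gaf_lines:
        cols = line.split()
        if len(cols) < 6:
            continue
        steps = parse_gaf_path(cols[5])
        if steps == expected or reverse_path(steps) == expected:
            names.add(cols[0])  # a read may appear on several lines
    return sorted(names)


gaf = """\
m54329U_190827_173812/140708091/ccs 21668 0 21668 + >S0>I0>I1<I2>I3>I4>S1
m54329U_190617_231905/88145984/ccs 13612 0 13612 + >S0>I0>I1<I2>I3>I4>S1
m54329U_190617_231905/88145984/ccs 13612 0 13612 + >S0>I0>I1<I2>I3>I4>S1
"""

expected = parse_gaf_path(">S0>I0>I1<I2>I3>I4>S1")
reads = supporting_reads(gaf.splitlines(), expected)
print(len(reads), reads)
```

Read names are collected in a set, so a read that is reported on more than one line is counted once.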