Sim-it/config_Sim-it.txt at master · ndierckx/Sim-it · GitHub

Project:
-----------------------------
Project name             = Test
Reference sequence       = /path/to/reference.fasta
Replace ambiguous nts(N) =
Max threads              = 8


Structural variation:
-----------------------------
VCF input                =
Foreign sequences        =

Deletions                = 0
Length (bp)              = 30-150000

Insertions               = 0
Length (bp)              = 30-100000

Tandem duplications      = 0
Length (bp)              = 50-10000
Copies                   = 1-20

Inversions               = 0
Length (bp)              = 150-1300000

Complex substitutions    = 0
Length (bp)              = 30-100000

Inverted duplications    = 0
Length (bp)              = 150-350000

Heterozygosity           = 60%


Long Read simulation:
-----------------------------
Sequencing depth         = 0
Median length            = 25000
Length range             = 500-150000
Accuracy                 = 88%
Error profile            = error_profile_ONT.txt


Project:
-----------------------------
Project name             = Choose a name for your project, it will be used for the output files.
Reference sequence       = /path/to/reference.fasta  #it can be a gzipped file
Replace ambiguous nts(N) = If the reference contains regions with ambiguous nucleotides, these can be replaced by
                           random nucleotides to avoid reads consisting only of Ns (yes/no) (Default: no)
Max threads              = The maximum number of parallel threads that can be used
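
The "key = value" layout above can be read with a few lines of code. The following is a minimal Python sketch, not the parser Sim-it itself uses; the function name parse_config and the "Parent / Length (bp)" naming are assumptions made for illustration. It shows one way to handle the fact that "Length (bp)" appears several times in the file: each repeated key is attached to the SV type listed just before it.

```python
def parse_config(text):
    """Parse 'key = value' lines grouped under 'Section:' headers.

    Hypothetical helper: repeated sub-keys ('Length (bp)', 'Copies') are
    qualified with the preceding key, e.g. 'Deletions / Length (bp)'.
    """
    sub_keys = ("Length (bp)", "Copies")
    sections = {}
    current = None   # dict of the section being filled
    parent = None    # last key that was not a sub-key
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the dashed rule under each header.
        if not line or set(line) == {"-"}:
            continue
        if "=" not in line:
            if line.endswith(":"):
                current = sections.setdefault(line[:-1], {})
                parent = None
            continue
        if current is None:
            # A setting before any section header: keep it under "".
            current = sections.setdefault("", {})
        key, value = (part.strip() for part in line.split("=", 1))
        if key in sub_keys and parent is not None:
            key = f"{parent} / {key}"
        else:
            parent = key
        current[key] = value
    return sections
```

With this sketch, the deletion length range would be found under sections["Structural variation"]["Deletions / Length (bp)"]. Values are kept as strings, so empty settings such as "VCF input" come back as "".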


Structural variation:
-----------------------------
VCF input                = A list of SV positions can be given as input; see the wiki for the format of
                           the VCF input. This option can be combined with random SVs
Foreign sequences        = If foreign sequences have to be inserted in the reference, they need to be listed in a FASTA file

Deletions                = Deletions count (Default: 0)
Length (bp)              = The range for the length of the Deletions (Default: 30-150000)

Insertions               = Insertions count (Default: 0)
Length (bp)              = The range for the length of the Insertions (Default: 30-100000)

Tandem duplications      = Tandem duplications count (Default: 0)
Length (bp)              = The range for the length of a single duplication (Default: 50-10000)
Copies                   = The range of how many times the duplicated sequence is repeated (Default: 1-20)

Inversions               = Inversions count (Default: 0)
Length (bp)              = The range for the length of the Inversions (Default: 150-1300000)

Complex substitutions    = Complex substitutions count (Default: 0)
Length (bp)              = The range for the length of the Complex substitutions (Default: 30-100000)

Inverted duplications    = Inverted duplications count (Default: 0)
Length (bp)              = The range for the length of the Inverted duplications (Default: 150-350000)

Heterozygosity           = The percentage of heterozygous SVs (Default: 60%)
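
To make the range and percentage settings above concrete, here is a small Python sketch. It is an illustration under stated assumptions, not Sim-it's own logic: the helper names parse_range and assign_zygosity are hypothetical, and the choice to round the heterozygous count and shuffle the labels is an assumption about how a percentage could be applied.

```python
import random


def parse_range(text):
    """Turn a 'min-max' setting such as '30-150000' into a (min, max) tuple."""
    low, high = (int(part) for part in text.split("-"))
    if low > high:
        raise ValueError(f"range minimum exceeds maximum: {text}")
    return low, high


def assign_zygosity(n_svs, heterozygosity_pct, seed=None):
    """Label n_svs variants so that about heterozygosity_pct % are heterozygous.

    Assumption: the heterozygous count is rounded to the nearest integer
    and the labels are shuffled with a seedable random generator.
    """
    rng = random.Random(seed)
    n_het = round(n_svs * heterozygosity_pct / 100)
    labels = ["heterozygous"] * n_het + ["homozygous"] * (n_svs - n_het)
    rng.shuffle(labels)
    return labels
```

For example, assign_zygosity(10, 60) labels 6 of 10 SVs as heterozygous and the remaining 4 as homozygous, in random order.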


Long Read simulation:
-----------------------------
Sequencing depth         = The genome coverage of the long reads (Default: 0).
                           Besides a fixed value, the path to a sequencing depth profile obtained with samtools can be given.
Median length            = The median length of the reads (Default: 25000)
Length range             = The range for the lengths of the reads; lengths follow a normal distribution around the
                           median length (Default: 500-150000)
Accuracy                 = The average accuracy of the reads (Default: 88%)
Error profile            = This is a file that contains the error profile of the sequencing technology; the GitHub page
                           provides ONT, PacBio RS2, Sequel II & Sequel CCS/HiFi. These error profiles provide the
                           ratios, which will be kept when adjusting the average accuracy to your liking
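
The relation between accuracy, error ratios and sequencing depth can be sketched in Python. This is a hedged illustration only: the format of the error profile files is not described here, so the dictionary of error types below is hypothetical, and both function names are assumptions. The first function keeps the relative ratios fixed while scaling them to a chosen accuracy; the second uses the usual definition of coverage (total sequenced bases divided by genome length) to estimate how many reads a given depth implies.

```python
import math


def scale_error_rates(ratios, accuracy_pct):
    """Split the error budget (100% - accuracy) over error types.

    ratios: hypothetical mapping of error type -> relative weight.
    The relative weights are preserved; only their sum is rescaled.
    """
    total = sum(ratios.values())
    if total <= 0:
        raise ValueError("ratios must sum to a positive value")
    error_budget = 1.0 - accuracy_pct / 100.0
    return {kind: error_budget * weight / total
            for kind, weight in ratios.items()}


def estimate_read_count(depth, genome_length, typical_read_length):
    """Approximate number of reads needed to reach the requested depth.

    Assumption: typical_read_length stands in for the mean read length,
    so the result is an estimate, not an exact count.
    """
    return math.ceil(depth * genome_length / typical_read_length)
```

With an accuracy of 88%, the error budget is 12%; weights of 2:1:1 for three error types would then give rates of 6%, 3% and 3%.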