config_Sim-it.txt
Project:
-----------------------------
Project name = Test
Reference sequence = /path/to/reference.fasta
Replace ambiguous nts(N) =
Max threads = 8
Structural variation:
-----------------------------
VCF input =
Foreign sequences =
Deletions = 0
Length (bp) = 30-150000
Insertions = 0
Length (bp) = 30-100000
Tandem duplications = 0
Length (bp) = 50-10000
Copies = 1-20
Inversions = 0
Length (bp) = 150-1300000
Complex substitutions = 0
Length (bp) = 30-100000
Inverted duplications = 0
Length (bp) = 150-350000
Heterozygosity = 60%
Long Read simulation:
-----------------------------
Sequencing depth = 0
Median length = 25000
Length range = 500-150000
Accuracy = 88%
Error profile = error_profile_ONT.txt
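The template uses one "key = value" pair per line, grouped under section headers, and the "Length (bp)" key is reused for every SV type. Below is a minimal Python sketch of how such a file could be read. It assumes that each "Length (bp)" and "Copies" line belongs to the SV type directly above it. parse_config and parse_range are hypothetical helpers for illustration, not Sim-it's own parser.

```python
def parse_range(value: str) -> tuple[int, int]:
    """Turn a range such as "30-150000" into (30, 150000)."""
    low, high = value.split("-")
    return int(low), int(high)


def parse_config(text: str) -> dict:
    """Hypothetical reader: returns {section: {key: value}} for the template above.

    Assumption: "Length (bp)" and "Copies" lines belong to the SV type listed
    directly above them, so they are nested under that SV type instead of
    overwriting each other.
    """
    sv_settings = {"VCF input", "Foreign sequences", "Heterozygosity"}
    sub_keys = {"Length (bp)", "Copies"}
    sections: dict = {}
    current = None  # current section, e.g. "Project"
    last_sv = None  # most recent SV type seen in "Structural variation"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and "-----" separators
        if line.endswith(":") and "=" not in line:
            current, last_sv = line[:-1], None  # section header such as "Project:"
            sections[current] = {}
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key in sub_keys and last_sv is not None:
            sections[current][last_sv][key] = value  # attach to the SV type above
        elif current == "Structural variation" and key not in sv_settings:
            sections[current][key] = {"count": value}  # a new SV type with its count
            last_sv = key
        else:
            sections[current][key] = value
    return sections


template = """Structural variation:
-----------------------------
Deletions = 0
Length (bp) = 30-150000
Heterozygosity = 60%
"""
cfg = parse_config(template)
print(parse_range(cfg["Structural variation"]["Deletions"]["Length (bp)"]))  # (30, 150000)
```

A reader like this should only be given the template part. The explanation section that follows reuses the same section headers and keys, with descriptions as values.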
Project:
-----------------------------
Project name = Choose a name for your project; it will be used to name the output files.
Reference sequence = Path to the reference sequence in FASTA format, e.g. /path/to/reference.fasta (the file may be gzipped)
Replace ambiguous nts(N) = If the reference contains regions with ambiguous nucleotides (N), these can be replaced by
random nucleotides to avoid reads consisting of Ns (yes/no) (Default: no)
Max threads = The maximum number of parallel threads that can be used (Default: no)
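Setting "Replace ambiguous nts(N) = yes" swaps ambiguous positions for random nucleotides. Below is a minimal Python sketch of that idea. It assumes that only N/n counts as ambiguous and that every replacement base is drawn uniformly. replace_ambiguous is a hypothetical name, and Sim-it may handle other IUPAC codes or the sampling differently.

```python
import random


def replace_ambiguous(seq: str, rng: random.Random) -> str:
    """Hypothetical helper: swap each N/n for a uniformly random A, C, G or T.

    Assumption: only N is treated as ambiguous; other IUPAC codes are left unchanged.
    """
    return "".join(rng.choice("ACGT") if base in "Nn" else base for base in seq)


rng = random.Random(42)  # fixed seed so the result is reproducible
print(replace_ambiguous("ACGTNNNNACGT", rng))
```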
Structural variation:
-----------------------------
VCF input = A list of SV positions can be given as input in VCF format; see the wiki for the required format.
This option can be combined with randomly generated SVs.
Foreign sequences = If foreign sequences have to be inserted into the reference, they need to be listed in a FASTA file.
Deletions = Deletions count (Default: 0)
Length (bp) = The range for the length of the deletions (Default: 30-150000)
Insertions = Insertions count (Default: 0)
Length (bp) = The range for the length of the Insertions (Default: 30-100000)
Tandem duplications = Tandem duplications count (Default: 0)
Length (bp) = The range for the length of a single duplication (Default: 50-10000)
Copies = The range for how many times the duplicated sequence is repeated (Default: 1-20)
Inversions = Inversions count (Default: 0)
Length (bp) = The range for the length of the Inversions (Default: 150-1300000)
Complex substitutions = Complex substitutions count (Default: 0)
Length (bp) = The range for the length of the Complex substitutions (Default: 30-100000)
Inverted duplications = Inverted duplications count (Default: 0)
Length (bp) = The range for the length of the Inverted duplications (Default: 150-350000)
Heterozygosity = The percentage of heterozygous SVs (Default: 60%)
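Heterozygosity is a percentage over all simulated SVs. Below is a minimal Python sketch of one way to apply it. It assumes that each SV is independently heterozygous with that probability; Sim-it may instead assign an exact fraction. assign_genotypes and the VCF-style "0/1" (heterozygous) and "1/1" (homozygous) labels are illustrative assumptions.

```python
import random


def assign_genotypes(n_svs: int, heterozygosity: str, rng: random.Random) -> list[str]:
    """Hypothetical helper: return a VCF-style genotype for each SV.

    heterozygosity is written as in the config, e.g. "60%".
    Assumption: each SV is heterozygous (0/1) with that probability, else homozygous (1/1).
    """
    p_het = float(heterozygosity.rstrip("%")) / 100
    return ["0/1" if rng.random() < p_het else "1/1" for _ in range(n_svs)]


genotypes = assign_genotypes(1000, "60%", random.Random(7))
print(genotypes.count("0/1") / len(genotypes))  # roughly 0.6; exact value depends on the seed
```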
Long Read simulation:
-----------------------------
Sequencing depth = The genome coverage of the long reads (Default: 0).
Instead of a fixed value, the path to a sequencing depth profile obtained with samtools can also be given.
Median length = The median length of the reads (Default: 25000)
Length range = The range of the read lengths; the lengths follow a normal distribution around the
median length (Default: 500-150000)
Accuracy = The average accuracy of the reads (Default: 88%)
Error profile = A file containing the error profile of the sequencing technology. The GitHub page provides
profiles for ONT, PacBio RS2, Sequel II and Sequel CCS/HiFi. These error profiles provide the error ratios,
which are kept when the average accuracy is adjusted to your liking.
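The read settings reduce to two calculations. The first is how many bases to simulate for a given depth (depth times genome length). The second is how to split the error rate left by the accuracy (100% minus accuracy) while keeping the profile's ratios. Below is a minimal Python sketch of both. It assumes normally distributed lengths with an arbitrary standard deviation and treats the error profile as a plain dictionary of ratios. sample_read_lengths, scale_error_rates and the error-type names are hypothetical, not Sim-it's code or file format, and the samtools depth-profile option is not covered.

```python
import random


def sample_read_lengths(depth: float, genome_size: int, median: int,
                        length_range: tuple[int, int], rng: random.Random,
                        sd: float = 10000.0) -> list[int]:
    """Hypothetical sketch: draw read lengths until depth * genome_size bases are covered.

    Assumptions (not taken from Sim-it): lengths are normally distributed around
    `median` with an arbitrary standard deviation `sd`, and draws outside
    `length_range` are rejected and redrawn.
    """
    low, high = length_range
    target_bases = depth * genome_size  # depth = total read bases / genome length
    lengths: list[int] = []
    total = 0
    while total < target_bases:
        length = round(rng.gauss(median, sd))
        if low <= length <= high:  # keep only lengths inside the configured range
            lengths.append(length)
            total += length
    return lengths


def scale_error_rates(ratios: dict[str, float], accuracy: str) -> dict[str, float]:
    """Hypothetical sketch: split the total error rate (1 - accuracy) by fixed ratios.

    `ratios` stands in for the proportions read from an error profile; the keys
    used below are illustrative and do not reflect the error profile file format.
    """
    error_rate = 1 - float(accuracy.rstrip("%")) / 100  # "88%" gives about 0.12
    total = sum(ratios.values())
    return {kind: error_rate * share / total for kind, share in ratios.items()}


rng = random.Random(0)
reads = sample_read_lengths(10, 1_000_000, 25000, (500, 150000), rng)
print(len(reads), sum(reads))  # expected count is about 10 Mb / 25 kb = 400 reads
print(scale_error_rates({"mismatch": 0.3, "insertion": 0.3, "deletion": 0.4}, "88%"))
```

In this sketch, the default depth of 0 gives a target of 0 bases, so no read lengths are drawn.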