
Commit c328aaf

Merge branch 'release/v0.9' into develop
2 parents: 79859f6 + c34556e

2 files changed: +54, -31 lines changed

2 files changed

+54
-31
lines changed

README.md

Lines changed: 53 additions & 30 deletions
@@ -5,21 +5,35 @@ Small library for parsing vcf files. Based on [PyVCF](https://github.com/jamesca
 
 ```python3
 from vcf_parser import parser
-my_parser = parser.VCFParser('infile.vcf')
+my_parser = parser.VCFParser(infile='infile.vcf')
 for variant in my_parser:
     print(variant)
 ```
 
-**vcf_parser also works on streams now.**
+**vcf_parser can now split multi-allelic calls in VCF files.**
 
 Vcf parser is really a lightweight version of [PyVCF](https://github.com/jamescasbon/PyVCF) with most of its code borrowed and modified from there.
 
-The idea was to make a faster and more flexible tool that mostly work with python dictionarys.
-The drawback is inacurracy, while **PyVCF** tests if each row in the vcf is on the correct format vcf_parser is much more sloppier.
+The idea was to make a faster and more flexible tool that mostly works with Python dictionaries.
+The drawback is lower accuracy: while **PyVCF** checks that each row in the VCF is in the correct format, vcf_parser is much sloppier.
 
 It is easy to access information for each variant, edit the information and edit the headers.
 
+## Basic function ##
+
+
 Returns a dictionary with the VCF info for each variant.
+To split multi-allelic calls (with accurate splitting of the INFO field, including the VEP CSQ field), use:
+
+my_parser = parser.VCFParser(infile='infile.vcf', split_variants=True)
+
+The standard VCF columns are stored under their header names, like
+
+variant['CHROM']
+variant['ALT']
+
+etc.
+
 The genotype information is converted to a genotype object and stored in a dictionary:
 
 variant['genotypes']
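The README lines above describe records keyed by their header names and an option to split multi-allelic calls. Below is a minimal, stdlib-only sketch of those two ideas, assuming a tab-separated VCF data line with only the eight fixed columns. The helpers `line_to_dict` and `split_alt` are hypothetical illustrations, not part of vcf_parser, and unlike `split_variants=True` they do not split INFO, CSQ or genotype data.

```python
# Hypothetical helpers illustrating the idea; NOT vcf_parser's implementation.
FIXED_COLUMNS = ('CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO')


def line_to_dict(line, header=FIXED_COLUMNS):
    """Map each tab-separated column of a VCF data line to its header name."""
    return dict(zip(header, line.rstrip('\n').split('\t')))


def split_alt(variant):
    """Return one copy of the variant per comma-separated ALT allele.

    Only ALT is split; every other column is copied unchanged, and the
    input dict is not modified.
    """
    return [dict(variant, ALT=allele) for allele in variant['ALT'].split(',')]


variant = line_to_dict('1\t879537\t.\tT\tC,G\t100\tPASS\tDP=10')
print(variant['CHROM'], variant['ALT'])  # 1 C,G
for record in split_alt(variant):
    print(record['POS'], record['REF'], record['ALT'])
# 879537 T C
# 879537 T G
```

Copying with `dict(variant, ALT=allele)` keeps the original record intact, so the unsplit and split views can coexist.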
@@ -53,7 +67,8 @@ Vep information, if present, is parsed into
 
 and looks like:
 
-'vep_info': {'NOC2L': {'Allele': 'G',
+'vep_info': {<alternative_allele>: {
+'Allele': 'G',
 'Amino_acids': '',
 'CDS_position': '',
 'Codons': '',
@@ -74,7 +89,8 @@ and looks like:
 'SYMBOL': 'NOC2L',
 'SYMBOL_SOURCE': '',
 'cDNA_position': ''},
-'SAMD11': {'Allele': 'G',
+<alternative_allele>: {
+'Allele': 'G',
 'Amino_acids': '',
 'CDS_position': '',
 'Codons': '',
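The `vep_info` layout in this README groups VEP annotations by allele and collects gene symbols in `gene_ids`. A minimal sketch of that grouping follows, assuming the CSQ field names are read from the `Format:` part of the CSQ INFO header line; `csq_format_from_header` and `parse_csq` are hypothetical names and the annotation values are made up. One deliberate difference: the sketch stores a list of annotations per allele, so two genes annotated on the same allele (NOC2L and SAMD11 on G in the example) do not overwrite each other. Whether vcf_parser does the same is not stated in this commit.

```python
# Hypothetical sketch of grouping VEP CSQ annotations by allele;
# NOT vcf_parser's implementation.


def csq_format_from_header(header_line):
    """Extract the CSQ field names from a '##INFO=<ID=CSQ,...>' header line."""
    description = header_line.split('Format: ', 1)[1]
    return description.split('"', 1)[0].split('|')


def parse_csq(csq_value, csq_format):
    """Group CSQ annotations by allele and collect gene symbols.

    Each comma-separated annotation becomes a dict keyed by field name and
    is appended to a per-allele list, so no annotation is overwritten.
    """
    vep_info = {}
    gene_ids = set()
    for annotation in csq_value.split(','):
        entry = dict(zip(csq_format, annotation.split('|')))
        vep_info.setdefault(entry['Allele'], []).append(entry)
        if entry.get('SYMBOL'):
            gene_ids.add(entry['SYMBOL'])
    return vep_info, gene_ids


header = ('##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
          'annotations from Ensembl VEP. Format: Allele|SYMBOL|STRAND">')
fmt = csq_format_from_header(header)
vep_info, gene_ids = parse_csq('G|NOC2L|-1,G|SAMD11|1', fmt)
print(fmt)                                   # ['Allele', 'SYMBOL', 'STRAND']
print([e['SYMBOL'] for e in vep_info['G']])  # ['NOC2L', 'SAMD11']
print(sorted(gene_ids))                      # ['NOC2L', 'SAMD11']
```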
@@ -94,36 +110,43 @@ and looks like:
 'STRAND': '1',
 'SYMBOL': 'SAMD11',
 'SYMBOL_SOURCE': '',
-'cDNA_position': ''}}
+'cDNA_position': ''
+},
+'gene_ids': set(['SAMD11', 'NOC2L'])
+}
 
-INFO field is parsed into
+The INFO field is parsed into the following dictionary, keyed by INFO field name. Values are lists; if a field has no value in the VCF, its value in info_dict is False.
 
 variant['info_dict']
 
 and looks like:
 
-'info_dict': {'AC': '1',
-'AF': '0.167',
-'AN': '6',
-'BaseQRankSum': '2.286',
-'DB': True,
-'DP': '1306',
-'FS': '1.539',
-'InbreedingCoeff': '0.1379',
-'MQ': '39.83',
-'MQ0': '0',
-'MQRankSum': '-2.146',
-'POSITIVE_TRAIN_SITE': True,
-'QD': '29.57',
-'ReadPosRankSum': '0.897',
-'VQSLOD': '4.52',
-'culprit': 'FS',
-'set': 'variant'}
-
-
-###Print a variant in it´s original format:###
-
-print '\t'.join([[variant[head] for head in my_parser.header])
+'info_dict': {'AC': ['1'],
+'AF': ['0.167'],
+'AN': ['6'],
+'BaseQRankSum': ['2.286'],
+'DB': False,
+'DP': ['1306'],
+'FS': ['1.539'],
+'InbreedingCoeff': ['0.1379'],
+'MQ': ['39.83'],
+'MQ0': ['0'],
+'MQRankSum': ['-2.146'],
+'POSITIVE_TRAIN_SITE': False,
+'QD': ['29.57'],
+'ReadPosRankSum': ['0.897'],
+'VQSLOD': ['4.52'],
+'culprit': ['FS'],
+'set': ['variant']}
+
+
+### Print a VCF in its original format: ###
+
+my_parser = parser.VCFParser(infile='infile.vcf')
+for line in my_parser.metadata.print_header():
+    print(line)
+for variant in my_parser:
+    print('\t'.join([variant[head] for head in my_parser.header]))
 
 ###Add metadata information:###
 
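The updated README describes `info_dict` values as lists, with fields that carry no value mapped to False, and prints a record by joining its columns in header order. A minimal stdlib sketch of both steps follows. `parse_info` and `to_vcf_line` are hypothetical helpers, and treating an empty INFO column (`.`) as having no fields is an assumption of the sketch, not something the README specifies. Printing the `##` metadata lines, which the README does with `my_parser.metadata.print_header()`, is not covered.

```python
# Hypothetical helpers; NOT vcf_parser's implementation.
HEADER = ('CHROM', 'POS', 'ID', 'REF', 'ALT', 'QUAL', 'FILTER', 'INFO')


def parse_info(info_field):
    """Parse a raw INFO string into {key: [values]}.

    Keys without a value (flags such as DB) map to False, following the
    README's description. '.' is treated as an empty INFO column (assumption).
    """
    info_dict = {}
    if info_field in ('', '.'):
        return info_dict
    for entry in info_field.split(';'):
        key, sep, value = entry.partition('=')
        info_dict[key] = value.split(',') if sep else False
    return info_dict


def to_vcf_line(variant, header=HEADER):
    """Rebuild a tab-separated VCF data line from a header-keyed dict."""
    return '\t'.join(str(variant[head]) for head in header)


line = '1\t879537\t.\tT\tC\t100\tPASS\tAC=1;AF=0.167;DB;culprit=FS'
variant = dict(zip(HEADER, line.split('\t')))
print(parse_info(variant['INFO']))
# {'AC': ['1'], 'AF': ['0.167'], 'DB': False, 'culprit': ['FS']}
print(to_vcf_line(variant) == line)  # True
```

Wrapping each value in `str()` lets `to_vcf_line` cope with columns that were edited into non-string values, which a bare `'\t'.join` would reject.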
setup.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 long_description = 'Tool for parsing Variant Call Format (VCF) files. Works like a lightweight version of PyVCF.'
 
 setup(name='vcf_parser',
-version='0.8.3',
+version='0.9',
 description='Parsing vcf files',
 author = 'Mans Magnusson',
 author_email = 'mans.magnusson@scilifelab.se',
