Convert Nirvana/Illumina Connected Annotations JSON output to VCF 4.2 format.
- Pure Python, zero external dependencies
- Streaming pipeline — processes one position at a time, no full-file load
- Reads `.json` and `.json.gz` input
- Supports GRCh37 and GRCh38 assemblies (auto-detected from header)
- Allele normalization — trims shared prefix/suffix to minimal VCF representation (enabled by default)
- Multi-allelic decomposition: splits multi-allelic sites into biallelic rows, like `bcftools norm -m-` (`--decompose`)
- VEP-style CSQ field with per-transcript annotations
- Annotations: gnomAD, ClinVar, SpliceAI, REVEL, DANN, GERP, phyloP, 1000 Genomes, TOPMed
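
The normalization and decomposition features listed above can be illustrated with a short sketch. This is not the project's actual code: `trim_alleles` and `decompose` are hypothetical names, and the sketch assumes normalization means trimming the shared suffix first and then the shared prefix while keeping at least one base per allele. It does not left-align indels (that would need the reference sequence) and it ignores sample genotypes.

```python
def trim_alleles(pos, ref, alt):
    """Trim bases shared by REF and ALT, keeping at least one base in each.

    Hypothetical helper for illustration; returns (pos, ref, alt).
    """
    # Trim the shared suffix first; this does not move the position.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix, advancing the position once per base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def decompose(pos, ref, alts):
    """Split a multi-allelic site into one trimmed biallelic row per ALT."""
    return [trim_alleles(pos, ref, alt) for alt in alts]


if __name__ == "__main__":
    # A site with a substitution and an insertion sharing one padded REF.
    for row in decompose(100, "GCAT", ["GCGT", "GCATT"]):
        print(row)
```

With the padded site `GCAT` at position 100, the substitution becomes `A>G` at position 102 and the insertion becomes `A>AT` at position 102, which is the kind of minimal per-allele representation the two options are described as producing.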
pip install -e .

# Basic conversion
json2vcf -i input.json.gz -o output.vcf
# VEP-style CSQ only (no flat INFO fields)
json2vcf -i input.json -o output.vcf --csq-only
# Omit sample/genotype columns
json2vcf -i input.json.gz -o output.vcf --no-samples
# Override genome assembly
json2vcf -i input.json.gz -o output.vcf --assembly GRCh37
# Disable allele normalization (keep raw Nirvana alleles)
json2vcf -i input.json.gz -o output.vcf --no-normalize
# Decompose multi-allelic sites into biallelic rows
json2vcf -i input.json.gz -o output.vcf --decompose
# Output to stdout
json2vcf -i input.json.gz

# Install development dependencies and run the tests
pip install -e ".[dev]"
python3 -m pytest -v

Streaming pipeline: parse → map → write
- `json2vcf/parser.py`: Streams Nirvana's line-based JSON format, yielding `(NirvanaHeader, Position)` tuples
- `json2vcf/mapper.py`: Transforms positions into VCF record dicts (per-allele fields, CSQ, INFO escaping)
- `json2vcf/vcf_writer.py`: Writes VCF 4.2 plain text
- `json2vcf/models.py`: Dataclass contracts between parser and mapper
- `json2vcf/constants.py`: VCF header definitions, contig maps, CSQ field names
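
To make the parse → map → write flow concrete, here is a minimal sketch of streaming a line-based JSON file one position at a time. It is an illustration, not the code in `json2vcf/parser.py`: the function names are hypothetical, it uses plain dicts instead of the project's dataclasses, and it assumes the first line has the form `{"header":{...},"positions":[`, that each following line holds one position object, and that positions carry `chromosome`, `position`, `refAllele` and `altAlleles` keys.

```python
import gzip
import json


def open_text(path):
    """Open plain or gzip-compressed JSON as text, chosen by file extension."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def iter_positions(path):
    """Yield (header, position) pairs without loading the whole file."""
    with open_text(path) as handle:
        first = handle.readline().strip()
        # Assumed layout of the first line: {"header":{...},"positions":[
        suffix = ',"positions":['
        if not first.endswith(suffix):
            raise ValueError("unexpected first line in input")
        # Drop the array opener and close the object so it parses on its own.
        header = json.loads(first[: -len(suffix)] + "}")["header"]
        for line in handle:
            line = line.strip().rstrip(",")
            if not line.startswith("{"):
                break  # end of the positions array
            yield header, json.loads(line)


def to_vcf_line(position):
    """Map one position dict to the eight fixed VCF columns (no INFO yet)."""
    fields = [
        position["chromosome"],
        str(position["position"]),
        ".",  # ID
        position["refAllele"],
        ",".join(position["altAlleles"]),
        ".",  # QUAL
        ".",  # FILTER
        ".",  # INFO
    ]
    return "\t".join(fields)
```

Because `iter_positions` is a generator, a writer can consume it in a loop and emit each record immediately, so memory use stays bounded by the size of a single position rather than the whole file.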