Convert Nirvana/Illumina Connected Annotations JSON to VCF format

Abrar-Abir/json2vcf

json2vcf

Convert Nirvana/Illumina Connected Annotations JSON output to VCF 4.2 format.

Features

  • Pure Python, zero external dependencies
  • Streaming pipeline — processes one position at a time, no full-file load
  • Reads .json and .json.gz input
  • Supports GRCh37 and GRCh38 assemblies (auto-detected from header)
  • Allele normalization — trims shared prefix/suffix to minimal VCF representation (enabled by default)
  • Multi-allelic decomposition — splits multi-allelic sites into biallelic rows, like bcftools norm -m- (--decompose)
  • VEP-style CSQ field with per-transcript annotations
  • Annotations: gnomAD, ClinVar, SpliceAI, REVEL, DANN, GERP, phyloP, 1000 Genomes, TOPMed
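
To make the allele normalization feature concrete, here is a minimal sketch of prefix/suffix trimming. It is an illustration only, not the code in json2vcf/mapper.py: the function name normalize_allele and its signature are hypothetical, and the sketch trims against the alleles alone (no left-alignment against a reference sequence).

```python
def normalize_allele(pos, ref, alt):
    """Trim bases shared by ref and alt, keeping at least one base in each.

    Hypothetical helper for illustration; pos is a 1-based VCF position.
    """
    # Trim the shared suffix first; this does not move the position.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Then trim the shared prefix; each removed base shifts pos right by one.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# A deletion written with redundant flanking bases collapses to a
# single anchor base plus the deleted base.
print(normalize_allele(100, "CAGT", "CAT"))  # (101, 'AG', 'A')
```

Trimming the suffix before the prefix keeps the leftmost possible anchor base, which matches the usual VCF convention for indels.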

Installation

pip install -e .

Usage

# Basic conversion
json2vcf -i input.json.gz -o output.vcf

# VEP-style CSQ only (no flat INFO fields)
json2vcf -i input.json -o output.vcf --csq-only

# Omit sample/genotype columns
json2vcf -i input.json.gz -o output.vcf --no-samples

# Override genome assembly
json2vcf -i input.json.gz -o output.vcf --assembly GRCh37

# Disable allele normalization (keep raw Nirvana alleles)
json2vcf -i input.json.gz -o output.vcf --no-normalize

# Decompose multi-allelic sites into biallelic rows
json2vcf -i input.json.gz -o output.vcf --decompose

# Output to stdout
json2vcf -i input.json.gz
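
As a rough illustration of what --decompose does conceptually, the sketch below splits one multi-allelic record into biallelic rows. The record layout (a dict with "alt" as a list and "info" as a dict) and the function name decompose are assumptions made for this example and may not match the tool's internal record dicts. Genotype remapping is omitted.

```python
def decompose(record):
    """Yield one biallelic row per ALT allele of a record.

    Hypothetical sketch: INFO values that are lists with one entry per
    ALT allele are split; every other value is copied unchanged.
    """
    alts = record["alt"]
    n = len(alts)
    if n <= 1:
        # Already biallelic: pass through untouched.
        yield record
        return
    for i, alt in enumerate(alts):
        info = {
            key: (value[i] if isinstance(value, list) and len(value) == n else value)
            for key, value in record["info"].items()
        }
        yield {**record, "alt": [alt], "info": info}


site = {"chrom": "chr1", "pos": 100, "ref": "A", "alt": ["C", "T"],
        "info": {"AF": [0.1, 0.2], "DP": 50}}
for row in decompose(site):
    print(row["alt"], row["info"])
```

The length check is a simplification: a per-site list that happens to have as many entries as there are ALT alleles would be split by mistake. A fuller implementation would decide this from each field's declared Number in the VCF header.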

Development

pip install -e ".[dev]"
python3 -m pytest -v

Architecture

Streaming pipeline: parse → map → write

  • json2vcf/parser.py — Streams Nirvana's line-based JSON format, yielding (NirvanaHeader, Position) tuples
  • json2vcf/mapper.py — Transforms positions into VCF record dicts (per-allele fields, CSQ, INFO escaping)
  • json2vcf/vcf_writer.py — Writes VCF 4.2 plain text
  • json2vcf/models.py — Dataclass contracts between parser and mapper
  • json2vcf/constants.py — VCF header definitions, contig maps, CSQ field names
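
The parse stage can be pictured with the sketch below. It is not the code in json2vcf/parser.py. It assumes the line-based layout Nirvana is generally described as writing: a first line holding the header and opening the "positions" array, then one position object per line with a trailing comma, then a line starting with "]" that closes the array. The function names open_text and iter_positions are hypothetical, and positions are yielded as plain dicts rather than the dataclasses in json2vcf/models.py.

```python
import gzip
import json


def open_text(path):
    """Open .json or .json.gz as text (hypothetical helper)."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_positions(lines):
    """Yield (header, position) pairs one line at a time.

    Assumes the first line ends with ',"positions":[' and that each
    following line is a single JSON object, optionally comma-terminated.
    """
    marker = ',"positions":['
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if header is None:
            if line.endswith(marker):
                # Close the outer object so the header parses on its own.
                header = json.loads(line[: -len(marker)] + "}")["header"]
            continue
        if line.startswith("]"):
            break  # end of the positions array; later sections are ignored
        yield header, json.loads(line.rstrip(","))


sample = [
    '{"header":{"genomeAssembly":"GRCh38"},"positions":[',
    '{"chromosome":"chr1","position":100},',
    '{"chromosome":"chr1","position":200}',
    '],"genes":[]}',
]
for header, position in iter_positions(sample):
    print(header["genomeAssembly"], position["position"])
```

Because iter_positions is a generator, only one position is held in memory at a time; passing it the file object returned by open_text would stream a real file the same way.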
