Commit 7c7eaed

ameynert and claude committed
feat: add create_gnomad_sites_vcf tool
Port create_gnomad_sites_vcf.py from human-diversity-reference/scripts as a defopt-compatible toolkit tool. Exports gnomAD variant sites above a given population frequency threshold as a VCF file with per-population frequency fields in the INFO column.

Renames positional args path/vcf_path to sites_table_path/output_vcf_path for clarity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent 3c51270 commit 7c7eaed

1 file changed, 40 additions and 0 deletions

@@ -0,0 +1,40 @@
+"""Tool to export gnomAD variants above a frequency threshold as a VCF file."""
+
+import hail as hl
+
+from divref.haplotype import HailPath
+
+
+def create_gnomad_sites_vcf(
+    *,
+    sites_table_path: HailPath,
+    output_vcf_path: HailPath,
+    min_popmax: float,
+) -> None:
+    """
+    Export gnomAD variant sites above a population frequency threshold as VCF.
+
+    Filters the gnomAD variant annotations table to sites where at least one
+    population's allele frequency meets or exceeds min_popmax, restructures
+    per-population frequency fields into VCF INFO format, and exports as VCF.
+
+    Args:
+        sites_table_path: Path to the gnomAD variant annotations Hail table
+            (from extract_gnomad_afs).
+        output_vcf_path: Output path for the VCF file.
+        min_popmax: Minimum allele frequency in any population to include a site.
+    """
+    hl.init()
+
+    ht = hl.read_table(sites_table_path)
+    pops: list[str] = ht.pops.collect()[0]
+
+    filt = ht.filter(hl.max(ht.pop_freqs[0].map(lambda x: x.AF)) >= min_popmax)
+    filt = filt.annotate(
+        info=hl.struct(**{
+            f"{pop}_{fd}": filt.pop_freqs[i][fd]
+            for i, pop in enumerate(pops)
+            for fd in filt.pop_freqs[i]
+        })
+    )
+    hl.export_vcf(filt, output_vcf_path)
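
The filter-then-flatten behaviour described in the docstring above can be illustrated without Hail. The following is a minimal plain-Python sketch, not the tool's implementation: it assumes each site carries one frequency record per population, in the same order as `pops`. The population labels, the `AC` and `AN` field names, the `locus` strings, and the helper names `passes_popmax` and `flatten_info` are all hypothetical; only `AF`, `pops`, `pop_freqs`, `min_popmax`, and the `{pop}_{field}` naming come from the diff.

```python
from typing import Any


def passes_popmax(pop_freqs: list[dict[str, Any]], min_popmax: float) -> bool:
    """Return True when the highest per-population AF meets or exceeds the threshold."""
    # default=0.0 keeps a site with no population records from raising.
    return max((entry["AF"] for entry in pop_freqs), default=0.0) >= min_popmax


def flatten_info(pops: list[str], pop_freqs: list[dict[str, Any]]) -> dict[str, Any]:
    """Flatten per-population records into INFO-style keys named '<pop>_<field>'."""
    return {
        f"{pop}_{field}": value
        for pop, entry in zip(pops, pop_freqs)
        for field, value in entry.items()
    }


# Hypothetical data: two populations and two sites.
pops = ["afr", "nfe"]
sites = [
    {
        "locus": "chr1:1000",
        "pop_freqs": [
            {"AC": 5, "AF": 0.05, "AN": 100},
            {"AC": 1, "AF": 0.01, "AN": 100},
        ],
    },
    {
        "locus": "chr1:2000",
        "pop_freqs": [
            {"AC": 0, "AF": 0.0, "AN": 100},
            {"AC": 1, "AF": 0.001, "AN": 1000},
        ],
    },
]

# Keep sites whose maximum population AF is at least 0.01, then attach INFO keys.
kept = [
    {"locus": site["locus"], "info": flatten_info(pops, site["pop_freqs"])}
    for site in sites
    if passes_popmax(site["pop_freqs"], min_popmax=0.01)
]

for row in kept:
    print(row["locus"], row["info"])
```

With this sample data only the first site survives the filter, because the second site's highest population frequency (0.001) is below the 0.01 threshold. The comparison is inclusive, matching the "meets or exceeds" wording in the docstring.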
