I am using the call command to generate BAF and allele-specific copy numbers, and ran into the negative BAF values described in #601. Following the guidance there, I ran call with the tumor and normal sample IDs from a Strelka VCF and got the following error:
Selected test sample TUMOR and control sample NORMAL
Skipping NC_072790.1:221367 G @ TUMOR; 'invalid FORMAT: GT'
Traceback (most recent call last):
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/bin/cnvkit.py", line 10, in <module>
    sys.exit(main())
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/cnvkit.py", line 10, in main
    args.func(args)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/commands.py", line 1178, in _cmd_call
    varr = load_het_snps(
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/cnvlib/cmdutil.py", line 30, in load_het_snps
    varr = tabio.read(
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/__init__.py", line 75, in read
    dframe = reader(infile, **kwargs)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 62, in read_vcf
    table = pd.DataFrame.from_records(rows, columns=columns)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/pandas/core/frame.py", line 2450, in from_records
    first_row = next(data)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 233, in _parse_records
    depth, zygosity, alt_count = _extract_genotype(sample, record)
  File "/Users/brandonhastings/opt/miniconda3/envs/cnvkit/lib/python3.10/site-packages/skgenome/tabio/vcfio.py", line 303, in _extract_genotype
    gts = set(sample["GT"])
  File "pysam/libcbcf.pyx", line 3541, in pysam.libcbcf.VariantRecordSample.__getitem__
  File "pysam/libcbcf.pyx", line 813, in pysam.libcbcf.bcf_format_get_value
KeyError: 'invalid FORMAT: GT'
Looking at the Strelka VCF, the GT field is not present, which appears to be deliberate on Strelka's part (Illumina/strelka#16). I have pasted the FILTER, FORMAT, and INFO header lines of my VCF below, along with the first record. Could support for Strelka be added?
##FILTER=<ID=LowDepth,Description="Tumor or normal sample read depth at this locus is below 2">
##FILTER=<ID=LowEVS,Description="Somatic Empirical Variant Score (SomaticEVS) is below threshold">
##FORMAT=<ID=AU,Number=2,Type=Integer,Description="Number of 'A' alleles used in tiers 1,2">
##FORMAT=<ID=CU,Number=2,Type=Integer,Description="Number of 'C' alleles used in tiers 1,2">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read depth for tier1 (used+filtered)">
##FORMAT=<ID=FDP,Number=1,Type=Integer,Description="Number of basecalls filtered from original read depth for tier1">
##FORMAT=<ID=GU,Number=2,Type=Integer,Description="Number of 'G' alleles used in tiers 1,2">
##FORMAT=<ID=SDP,Number=1,Type=Integer,Description="Number of reads with deletions spanning this site at tier1">
##FORMAT=<ID=SUBDP,Number=1,Type=Integer,Description="Number of reads below tier1 mapping quality threshold aligned across this site">
##FORMAT=<ID=TU,Number=2,Type=Integer,Description="Number of 'T' alleles used in tiers 1,2">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Combined depth across samples">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=NT,Number=1,Type=String,Description="Genotype of the normal in all data tiers, as used to classify somatic variants. One of {ref,het,hom,conflict}.">
##INFO=<ID=PNOISE,Number=1,Type=Float,Description="Fraction of panel containing non-reference noise at this site">
##INFO=<ID=PNOISE2,Number=1,Type=Float,Description="Fraction of panel containing more than one non-reference noise obs at this site">
##INFO=<ID=QSS,Number=1,Type=Integer,Description="Quality score for any somatic snv, ie. for the ALT allele to be present at a significantly different frequency in the tumor and normal">
##INFO=<ID=QSS_NT,Number=1,Type=Integer,Description="Quality score reflecting the joint probability of a somatic variant and NT">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref read-position in the tumor">
##INFO=<ID=SGT,Number=1,Type=String,Description="Most likely somatic genotype excluding normal noise states">
##INFO=<ID=SNVSB,Number=1,Type=Float,Description="Somatic SNV site strand bias">
##INFO=<ID=SOMATIC,Number=0,Type=Flag,Description="Somatic mutation">
##INFO=<ID=SomaticEVS,Number=1,Type=Float,Description="Somatic Empirical Variant Score (EVS) expressing the phred-scaled probability of the call being a false positive observation.">
##INFO=<ID=TQSS,Number=1,Type=Integer,Description="Data tier used to compute QSS">
##INFO=<ID=TQSS_NT,Number=1,Type=Integer,Description="Data tier used to compute QSS_NT">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NORMAL TUMOR
NC_072790.1 221367 . G C . LowEVS DP=49;MQ=30.81;MQ0=15;NT=ref;QSS=1;QSS_NT=1;ReadPosRankSum=-0.16;SGT=CG->CG;SNVSB=0.00;SOMATIC;SomaticEVS=0.11;TQSS=1;TQSS_NT=1 DP:FDP:SDP:SUBDP:AU:CU:GU:TU 5:0:0:0:0,0:1,2:4,16:0,0 17:1:0:0:0,0:2,2:14,29:0,0
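For reference, here is a minimal sketch of how per-sample allele counts could be recovered from the fields Strelka does provide. As far as I understand the Strelka user guide, for somatic SNVs the reference and alternate counts are the tier-1 (first) values of the FORMAT fields named after the REF and ALT bases plus "U" (GU and CU for the record above). The function names are hypothetical and this is not how CNVkit currently reads VCFs; it applies to SNV records only and uses only the values pasted above.

```python
# Sketch: tier-1 allele counts from a Strelka somatic SNV record without GT.
# FORMAT and sample strings are copied from the record at NC_072790.1:221367.
# Function names are hypothetical, not part of CNVkit or Strelka.

FMT = "DP:FDP:SDP:SUBDP:AU:CU:GU:TU"
SAMPLES = {
    "NORMAL": "5:0:0:0:0,0:1,2:4,16:0,0",
    "TUMOR": "17:1:0:0:0,0:2,2:14,29:0,0",
}


def tier1_counts(ref, alt, fmt, sample):
    """Return (ref_count, alt_count) from Strelka's <BASE>U FORMAT fields."""
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    # Each <BASE>U value is "tier1,tier2"; keep the tier-1 count.
    ref_count = int(fields[ref + "U"].split(",")[0])
    alt_count = int(fields[alt + "U"].split(",")[0])
    return ref_count, alt_count


def alt_fraction(ref_count, alt_count):
    """Alt allele fraction among ref+alt reads; NaN when there is no coverage."""
    total = ref_count + alt_count
    return alt_count / total if total else float("nan")


if __name__ == "__main__":
    # The KeyError above comes from this key being absent.
    print("GT present:", "GT" in FMT.split(":"))
    for name, sample in SAMPLES.items():
        r, a = tier1_counts("G", "C", FMT, sample)
        print(name, "ref =", r, "alt =", a, "alt fraction =", alt_fraction(r, a))
```

Something along these lines could either fill in alt counts inside the reader when GT is missing, or be used in a preprocessing step before passing the VCF to call. Zygosity would still have to come from somewhere else, for example the NT value in INFO.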