VCF header error #314

@drtconway

Thanks again for developing a terrific library!

I am parsing a VCF that contains the following header lines (with the bulk of each removed for clarity):

...
##GATKCommandLine=<ID=CombineGVCFs,CommandLine="CombineGVCFs --output variants/24PS01319-RDN0259-00.merge.dragen.fe24071a.family.gvcf.gz ....",Version="4.2.6.1",Date="October 10, 2024 at 9:27:10 PM AEDT">
##GATKCommandLine=<ID=CombineGVCFs,CommandLine="CombineGVCFs --output variants/24PS01319-RDN0259-00.merge.dragen.g.vcf.gz ...",Version="4.2.6.1",Date="October 10, 2024 at 7:08:29 PM AEDT">
...

The VCF reader complains with the following error:

Error: Custom { kind: InvalidData, error: InvalidRecordValue(DuplicateId("CombineGVCFs")) }

I've just been reading through the VCF specification, and I can't see that it requires the ID values for the predefined structured metadata lines to be unique, and it doesn't say anything about the semantics of non-predefined structured metadata lines in this respect.

I can see why it makes sense to complain given the semantics of the predefined metadata types (INFO, FILTER, etc), because the uniqueness of the ID is a natural consequence of how those metadata types are used, even though it is not explicitly stated.

However, I can't see anything that forbids duplicate ID values for non-predefined metadata types.
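To make the distinction concrete, here is a rough sketch of the behaviour I have in mind. It is not the library's code: `UNIQUE_ID_KEYS`, `key_and_id` and `check_ids` are hypothetical names, and the choice of which keys count as "predefined" is my own assumption. The idea is simply to reject a repeated ID only when the line's key is one of the predefined types, and to let other structured lines (such as `GATKCommandLine`) repeat.

```rust
use std::collections::HashSet;

/// Keys whose IDs are treated as unique in this sketch.
/// Assumption: this list is illustrative, not quoted from the specification.
const UNIQUE_ID_KEYS: [&str; 5] = ["INFO", "FILTER", "FORMAT", "ALT", "contig"];

/// Extracts `(key, id)` from a structured header line such as `##INFO=<ID=DP,...>`.
/// Returns `None` for unstructured lines or when the first field is not `ID`.
fn key_and_id(line: &str) -> Option<(&str, &str)> {
    let rest = line.strip_prefix("##")?;
    let (key, value) = rest.split_once("=<")?;
    let fields = value.strip_prefix("ID=")?;
    // The ID ends at the first comma, or at the closing `>` if it is the only field.
    let end = fields.find(|c: char| c == ',' || c == '>')?;
    Some((key, &fields[..end]))
}

/// Reports a duplicate ID only for keys listed in `UNIQUE_ID_KEYS`;
/// any other structured line may reuse an ID.
fn check_ids(lines: &[&str]) -> Result<(), String> {
    let mut seen = HashSet::new();
    for line in lines {
        if let Some((key, id)) = key_and_id(line) {
            if UNIQUE_ID_KEYS.contains(&key) && !seen.insert((key, id)) {
                return Err(format!("duplicate {} ID: {}", key, id));
            }
        }
    }
    Ok(())
}
```

With this, two `##GATKCommandLine=<ID=CombineGVCFs,...>` lines pass `check_ids`, while two `##INFO` lines with the same ID are still reported as an error.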

Have I missed something? (It's quite possible!)

Tom.
