Labels
epic: A major task that should be broken down into smaller tasks
Description
As discussed in many issues, the current format for mapping metadata entries to sequences in the multi-segmented case is suboptimal. Here we proceed with the approach voted for in the microbioinfo Slack: https://microbial-bioinfo.slack.com/archives/CB0HYT53M/p1760961465729399
Users can add an additional column, fastaId, to the metadata TSV. It contains a space-separated list of all FASTA headers that should be linked to that entry. If no fastaId is supplied, we fall back to the submissionId and assume it is the same as the FASTA header ID.
Preprocessing will now assign each sequence to its segment.
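A minimal sketch of how the fastaId column and its submissionId fallback could be resolved into a mapping from each metadata entry to its FASTA headers. It assumes a tab-separated file whose header row includes `submissionId` and, optionally, `fastaId`; the function name `fasta_ids_by_entry` is hypothetical, and rejecting a FASTA header that is linked to more than one entry is an assumption, not something the issue states.

```python
import csv
import io


def fasta_ids_by_entry(metadata_tsv: str) -> dict[str, list[str]]:
    """Map each submissionId to the FASTA header IDs linked to it.

    Hypothetical helper: assumes a `submissionId` column and an optional
    `fastaId` column holding space-separated FASTA header IDs.
    """
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    mapping: dict[str, list[str]] = {}
    seen: dict[str, str] = {}  # FASTA header -> submissionId that claimed it
    for row in reader:
        submission_id = row["submissionId"]
        # A missing column or an empty cell both mean "no fastaId supplied".
        raw = (row.get("fastaId") or "").strip()
        # Fallback: treat the submissionId itself as the FASTA header ID.
        headers = raw.split() if raw else [submission_id]
        for header in headers:
            # Assumption: a FASTA header may be linked to only one entry.
            if header in seen:
                raise ValueError(
                    f"FASTA header {header!r} linked to both "
                    f"{seen[header]!r} and {submission_id!r}"
                )
            seen[header] = submission_id
        mapping[submission_id] = headers
    return mapping


if __name__ == "__main__":
    example = "submissionId\tfastaId\ncchf_1\tseqL seqM seqS\ncchf_2\t\n"
    print(fasta_ids_by_entry(example))
    # prints {'cchf_1': ['seqL', 'seqM', 'seqS'], 'cchf_2': ['cchf_2']}
```

Treating an empty cell the same as a missing column lets a single file mix entries with and without explicit fastaId values.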
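A minimal sketch, under stated assumptions, of how preprocessing could derive a fastaHeader-to-segment mapping from classifier hits and then check one metadata entry for consistency. It assumes each FASTA header has been scored against one reference dataset per segment, which is what classifiers such as nextclade sort do; the `SortHit` type, the dataset names, and the `dataset_to_segment` config are hypothetical, and the real nextclade sort output format is not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortHit:
    """One classifier hit (hypothetical fields, not nextclade's output schema)."""
    fasta_header: str
    dataset: str
    score: float


def assign_segments(hits: list[SortHit], dataset_to_segment: dict[str, str]) -> dict[str, str]:
    """Return fastaHeader -> segment, keeping the best-scoring hit per header."""
    best: dict[str, SortHit] = {}
    for hit in hits:
        if hit.dataset not in dataset_to_segment:
            continue  # ignore hits against datasets not configured for this organism
        current = best.get(hit.fasta_header)
        if current is None or hit.score > current.score:
            best[hit.fasta_header] = hit
    return {header: dataset_to_segment[hit.dataset] for header, hit in best.items()}


def segments_for_entry(headers: list[str], assignment: dict[str, str]) -> dict[str, str]:
    """Invert the assignment for one entry (segment -> fastaHeader), rejecting conflicts."""
    by_segment: dict[str, str] = {}
    for header in headers:
        segment = assignment.get(header)
        if segment is None:
            raise ValueError(f"no segment assigned to FASTA header {header!r}")
        if segment in by_segment:
            raise ValueError(
                f"segment {segment!r} assigned to both {by_segment[segment]!r} and {header!r}"
            )
        by_segment[segment] = header
    return by_segment
```

Keeping the header-level mapping separate from the per-entry check mirrors the issue's plan of returning a fastaHeader-to-segment mapping rather than relying on segment names in the headers.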
Steps:
- Migrate the sequence compression format in the backend, because original unaligned sequences do not have an assigned segment: Store compression dictionaries in database to make database self-sufficient #4769
- Refactor how the backend joins sequences and metadata entries; preprocessing will now receive originalData with the sequences as a record keyed by fastaHeader: feat!(backend): refactor multi-segment submission (2/n) #5398
- Have preprocessing assign the segment using nextclade sort (with a config refactor) and also return a mapping from fastaHeaderId to segment: feat!(prepro, config): assign segment with nextclade sort #4783
- Update the edit page to use the fastaHeader mapping so that it works correctly
- Migrate older data so that it has a fastaHeader mapping
- Before releasing: update CCHF example data
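The backend join step can be sketched as follows: each metadata entry is paired with the sequences of its linked FASTA headers, and the sequences stay keyed by FASTA header because segments are only assigned later, in preprocessing. This is a minimal sketch, not the backend's implementation; the function name `build_original_data` and its inputs are hypothetical, the output field names are only modeled loosely on originalData, and rejecting FASTA records that no entry references is an assumption.

```python
def build_original_data(
    metadata_by_id: dict[str, dict[str, str]],
    fasta_ids: dict[str, list[str]],
    sequences: dict[str, str],
) -> dict[str, dict]:
    """Join metadata entries with their sequences (hypothetical helper).

    metadata_by_id: submissionId -> metadata fields
    fasta_ids:      submissionId -> linked FASTA headers (entries absent here
                    fall back to using the submissionId as the header)
    sequences:      FASTA header -> unaligned nucleotide sequence
    """
    unused = set(sequences)
    result: dict[str, dict] = {}
    for submission_id, metadata in metadata_by_id.items():
        headers = fasta_ids.get(submission_id, [submission_id])
        missing = [h for h in headers if h not in sequences]
        if missing:
            raise ValueError(f"{submission_id}: no sequence for FASTA header(s) {missing}")
        unused.difference_update(headers)
        result[submission_id] = {
            "metadata": metadata,
            # Keyed by FASTA header, not by segment: no segment is known yet.
            "unalignedNucleotideSequences": {h: sequences[h] for h in headers},
        }
    # Assumption: every uploaded FASTA record must belong to some entry.
    if unused:
        raise ValueError(f"FASTA header(s) not linked to any metadata entry: {sorted(unused)}")
    return result
```

Failing on both missing and unreferenced headers surfaces mapping mistakes at submission time instead of leaving orphaned sequences for preprocessing.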