Description
At the moment, preprocessing generates each metadata field independently, from the input metadata submitted by the user or obtained from the nextclade run. It is not set up to use an already generated metadata field as input when creating another metadata field.
For example, the concatenate function takes country, date and accession as input and joins them to create a displayName field. The country field itself is processed by process_options, which checks that it is a valid country. concatenate does not perform this check, so it accepts invalid countries. Currently the only way to add this validation is to write a new function that takes the inputs of both concatenate and process_options and does both steps; this is what we had to do for build_display_name and my new assign_custom_lineage function.
Should we instead restructure the pipeline so that metadata fields can be created from processed metadata fields? That is, allow concatenate to take the processed country field as input instead of the unprocessed input field. This raises the question of how many layers of fields we can/should allow. Given input metadata fields I1 and I2, we currently support an output field A produced from I1 and I2. If we allow B to be produced from A and I2, we might still want to create C from B and I1...
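
One possible way to avoid fixing a number of layers is to treat the field definitions as a dependency graph and process them in topological order. The sketch below is only an illustration of that idea, not the current preprocessing code: `FieldSpec`, `process_all`, the `raw_inputs`/`processed_inputs` split, the `VALID_COUNTRIES` set and the simplified `process_options` and `concatenate` bodies are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable

# Placeholder list; the real set of valid countries would come from config.
VALID_COUNTRIES = {"Germany", "Kenya"}


def process_options(args: dict) -> str:
    """Simplified stand-in: accept the country only if it is in the allowed set."""
    country = args["country"]
    if country not in VALID_COUNTRIES:
        raise ValueError(f"invalid country: {country!r}")
    return country


def concatenate(args: dict) -> str:
    """Simplified stand-in: join country, accession and date with '/'."""
    return "/".join(str(args[key]) for key in ("country", "accession", "date"))


@dataclass
class FieldSpec:
    """Hypothetical spec for one output field."""
    function: Callable[[dict], str]
    # argument name -> name of an unprocessed input field
    raw_inputs: dict = field(default_factory=dict)
    # argument name -> name of another *output* field (already processed)
    processed_inputs: dict = field(default_factory=dict)


def process_all(specs: dict, raw: dict) -> tuple:
    """Process fields in dependency order; returns (processed, errors)."""
    graph = {}
    for name, spec in specs.items():
        deps = set(spec.processed_inputs.values())
        unknown = deps - set(specs)
        if unknown:
            raise KeyError(f"{name} depends on undefined fields: {sorted(unknown)}")
        graph[name] = deps

    processed, errors = {}, {}
    # static_order() yields dependencies before dependents and raises
    # graphlib.CycleError if the specs depend on each other in a cycle.
    for name in TopologicalSorter(graph).static_order():
        spec = specs[name]
        failed = [dep for dep in spec.processed_inputs.values() if dep in errors]
        if failed:
            # Do not build a field on top of an upstream field that failed.
            errors[name] = f"upstream field(s) failed: {failed}"
            continue
        args = {arg: raw.get(src) for arg, src in spec.raw_inputs.items()}
        args.update({arg: processed[src] for arg, src in spec.processed_inputs.items()})
        try:
            processed[name] = spec.function(args)
        except ValueError as exc:
            errors[name] = str(exc)
    return processed, errors


specs = {
    "country": FieldSpec(process_options, raw_inputs={"country": "country"}),
    "displayName": FieldSpec(
        concatenate,
        raw_inputs={"accession": "accession", "date": "date"},
        processed_inputs={"country": "country"},
    ),
}
```

With this shape, displayName would only be built from a country that already passed validation, and an invalid country would surface as an error on both fields. Chains like C from B from A need no special handling, since the depth is whatever the graph implies; the only hard restriction is that there are no cycles.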