Description
Summary:
The current storage format for prediction and target sequences in preds_and_fdr_metrics is a stringified Python list (e.g., '[C, P, Q, ...]').
This adds friction to downstream analysis: after loading the CSV output, each sequence must be post-processed (e.g., with ast.literal_eval) to turn the string back into a usable list of residue tokens.
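To illustrate the friction, here is a minimal sketch using only the standard library. The column names `preds` and `targets` and the example tokens are assumptions for illustration, not the actual layout of the output file.

```python
import ast
import csv
import io

# Hypothetical row; the column names "preds" and "targets" are assumptions.
rows = [{"preds": ["C", "P", "Q"], "targets": ["C", "P", "K"]}]

# Writing a list to CSV stores its str() form, e.g. "['C', 'P', 'Q']".
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["preds", "targets"])
writer.writeheader()
writer.writerows(rows)

# Reading the file back yields plain strings, not lists.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_pred = loaded[0]["preds"]

# Extra step that every consumer of the CSV has to remember.
parsed_pred = ast.literal_eval(raw_pred)
```

Here `raw_pred` is a string that only looks like a list, so any length, slicing, or comparison logic silently operates on characters (including brackets and quotes) until the `ast.literal_eval` step is applied.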
Proposed Solution
Change the default storage format to a single concatenated string (e.g., 'CPQ...'). The value can then be read and used directly as a standard sequence string, with no parsing step after loading the CSV.
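A rough sketch of the proposed behaviour, again with assumed column names and tokens: the residue tokens are joined before writing, so the value read back is already the sequence.

```python
import csv
import io

# Hypothetical tokens and column names, for illustration only.
pred_tokens = ["C", "P", "Q"]
target_tokens = ["C", "P", "K"]

# Join the tokens into one sequence string before writing.
row = {"preds": "".join(pred_tokens), "targets": "".join(target_tokens)}

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["preds", "targets"])
writer.writeheader()
writer.writerow(row)

# The value read back is usable as-is; no literal_eval step is needed.
buffer.seek(0)
loaded_row = next(csv.DictReader(buffer))
pred_sequence = loaded_row["preds"]
```

One trade-off to keep in mind: joining discards the explicit token boundaries, which matters if tokens can be longer than one character (for example, residues carrying modifications).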
Optional Extension
Consider adding a column that formats predictions to be compatible with InstaNovo's _split_peptide function to re-obtain tokenised residues.
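As a sketch of what re-tokenising a concatenated string could look like, the hypothetical helper below splits a sequence into residues with an optional bracketed modification. It is not InstaNovo's `_split_peptide`, and the modification notation shown is an assumption; the format that `_split_peptide` actually expects should be checked against the InstaNovo source before settling on the column format.

```python
import re

# Hypothetical pattern: one uppercase residue letter, optionally followed by
# a modification in square brackets or parentheses. This is an assumption,
# not the pattern used by InstaNovo.
RESIDUE_PATTERN = re.compile(r"[A-Z](?:\[[^\]]*\]|\([^)]*\))?")


def split_residues_sketch(sequence: str) -> list[str]:
    """Split a concatenated sequence string back into residue tokens."""
    return RESIDUE_PATTERN.findall(sequence)


plain_tokens = split_residues_sketch("CPQ")
modified_tokens = split_residues_sketch("AM[UNIMOD:35]K")
```

This sketch does not handle modifications that precede the first residue (e.g., N-terminal modifications), so a real compatibility column would need to follow whatever convention InstaNovo itself uses.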