refactor(backend): move revocation versionComment from dedicated column to metadata#6198
refactor(backend): move revocation versionComment from dedicated column to metadata#6198theosanderson-agent wants to merge 3 commits intomainfrom
Conversation
…mn to metadata Store revocation versionComment in originalData metadata instead of the dedicated version_comment column, unifying how versionComment is handled for both revisions and revocations. - When creating a revocation, store versionComment in originalData.metadata and compress using the compression service - Update the database view to automatically construct joint_metadata for revocations from originalData, making it pipeline-version-independent - Remove versionCommentColumn from SequenceEntriesTable and SequenceEntriesView - Remove versionComment field from RawProcessedData (now comes through processedData metadata via the view) - Remove conditional versionComment addition in ReleasedDataModel - Add migration V1.26 to move existing version_comment values into originalData metadata and drop the column Closes #3135 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@codex review |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 30a12fcd2a
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| conditionalMetadata( | ||
| earliestReleaseDate != null, | ||
| { |
There was a problem hiding this comment.
Keep revocation versionComment out of schema filtering
Removing the dedicated revocation versionComment merge here makes the value depend entirely on processedData.metadata, but that path is normalized by ProcessedMetadataPostprocessor.filterOutExtraFieldsAndAddNulls, which drops metadata keys not present in the organism schema. In environments/configs where versionComment is not part of schema metadata (for example the backend test configs), revocation comments will silently disappear from released data after this change. Previously this code guaranteed the field by injecting it after postprocessing.
Useful? React with 👍 / 👎.
Summary
This PR unifies how
versionCommentis stored for revisions and revocations, addressing #3135.Problem: Previously,
versionCommentwas stored differently depending on the type of version entry:originalData, processed through the preprocessing pipeline, and ended up inprocessedData.metadataversion_commentcolumn on thesequence_entriestable, then manually merged into released data via special-case code inReleasedDataModelThis inconsistency made the codebase harder to understand and maintain.
Solution: Following the approach suggested in #3135, this PR:
Stores revocation
versionCommentinoriginalData.metadata— When creating a revocation with a version comment, the comment is stored in theoriginalDataJSONB column as a metadata entry, compressed using the standard compression serviceAutomatically constructs
processedDatafor revocations in the database view — Thesequence_entries_viewis updated to buildjoint_metadatafor revocations directly fromoriginalData.metadata. This approach is pipeline-version-independent, meaning the versionComment survives processing pipeline version changes without needing to reprocess revocationsRemoves the dedicated
version_commentcolumn — A migration (V1.26) copies existingversion_commentvalues intooriginalData.metadatafor all affected revocations, then drops the columnRemoves special-case code — The
versionCommentfield is removed fromRawProcessedData, and the conditional metadata merging inReleasedDataModelis removed. The versionComment now flows through the same path for both revisions and revocationsFiles changed
V1.26: Migrates existing data and recreates the viewSubmissionDatabaseService.kt:revoke()now stores versionComment inoriginalData;streamReleasedSubmissions()no longer readsversionCommentColumn;RawProcessedDatano longer hasversionCommentfieldSequenceEntriesTable.kt/SequenceEntriesView.kt: RemoveversionCommentColumnReleasedDataModel.kt: Remove conditional versionComment addition for revocationsEarliestReleaseDateFinderTest.kt: RemoveversionCommentfield from test dataKey design decision
Rather than inserting into
sequence_entries_preprocessed_datafor revocations (which would be tied to a specific pipeline version and lost when the pipeline version changes), the view constructsjoint_metadatadirectly fromoriginalDatafor revocations. This means the versionComment is always available regardless of pipeline version changes, without needing to reprocess revocations.Test plan
(currently on main we also see None if this is empty - I think this is probably an issue with the website I assume it submits revocations with an empty string and not None, this isnt too important atm)
🤖 Generated with Claude Code
🚀 Preview: https://refactor-version-comment.loculus.org