Ingest should edit sequences in PROCESSED (erroring) state when upstream data changes #6072

> **Note:** This issue was drafted by an LLM (Claude) based on a Slack conversation and a read of the relevant codebase. It has only been lightly reviewed by a human — please check the details before acting on it.

## Background

When ingest runs and a sequence already exists in Loculus with a non-`APPROVED_FOR_RELEASE` status, the current code in `ingest/scripts/compare_hashes.py` skips updating it entirely:

```python
# compare_hashes.py ~line 112
if status != "APPROVED_FOR_RELEASE":
    update_manager.blocked[status][metadata_id] = corresponding_loculus_accession
    return update_manager
```

This means that if upstream data is corrected (e.g. author name formatting changed), sequences that are stuck with preprocessing errors will **never** receive the fix — they remain in their broken state indefinitely, even across preprocessing version bumps.

This was discovered during a PPX rollout: sequences had author names in the old comma-separated format (`A. Marcello, B.M. Marycelin, ...`) rather than the semicolon-separated format required by current validation. These sequences had been erroring since a formatting change over a year ago and were never updated because ingest skipped them.

## The statuses

Sequences can be in four states (`SubmissionTypes.kt`):
- `RECEIVED` — submitted, not yet sent to preprocessing
- `IN_PROCESSING` — currently being preprocessed
- `PROCESSED` — preprocessing complete; this includes sequences with errors awaiting user correction
- `APPROVED_FOR_RELEASE` — released

Sequences stuck with errors live in `PROCESSED` status.
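
As a rough illustration of the list above, the four statuses can be mirrored in Python together with the edit precondition mentioned under Related. The `Status` enum and the `is_editable` helper are hypothetical names for this sketch; they are not taken from `SubmissionTypes.kt` or from the ingest code.

```python
from enum import Enum


class Status(str, Enum):
    """Illustrative Python mirror of the four statuses listed in this issue."""

    RECEIVED = "RECEIVED"
    IN_PROCESSING = "IN_PROCESSING"
    PROCESSED = "PROCESSED"
    APPROVED_FOR_RELEASE = "APPROVED_FOR_RELEASE"


def is_editable(status: Status) -> bool:
    # Per this issue, the edit endpoint requires PROCESSED status.
    return status is Status.PROCESSED


# Collect the statuses in which an edit would be accepted.
editable = [s.value for s in Status if is_editable(s)]
print(editable)
```

Only `PROCESSED` passes the check, which is why the proposed fix below targets that status alone.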

## Proposed fix

The backend already has a `/submit-edited-data` endpoint (`SubmissionController.kt`) specifically for editing sequences in `PROCESSED` status — this is what users use to correct their own errors. Ingest should use this same endpoint for sequences where:

1. The sequence is in `PROCESSED` status (i.e. it has errors or is awaiting release)
2. The upstream hash has changed
3. The sequence has **not** been curated

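In `compare_hashes.py`, this means adding an `edit` path alongside the existing `submit`/`revise`/`noop`/`blocked` paths. The new check has to run before the existing non-`APPROVED_FOR_RELEASE` check shown under Background, otherwise the early return into `blocked` would still win:

```python
# Proposed addition (update_manager.edit does not exist yet).
# Assumes this point is only reached once the upstream hash is known to differ.
if status == "PROCESSED" and not previously_submitted_entry.curated:
    update_manager.edit[metadata_id] = corresponding_loculus_accession
    return update_manager
```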

Sequences in `RECEIVED` or `IN_PROCESSING` don't need special handling — they will be reprocessed with the latest data naturally.
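
To make the intended routing easier to review, here is a minimal sketch of the decision as a pure function. `choose_ingest_path` and its string return values are hypothetical and are not the real structure of `process_hashes()`. It also assumes that an unchanged hash always results in a no-op and that curated, released sequences go to `CURATION_ISSUE` as described in this issue.

```python
def choose_ingest_path(status: str, hash_changed: bool, curated: bool) -> str:
    """Return the name of the path a previously submitted sequence would take.

    Hypothetical helper for illustration; the real code mutates an update
    manager instead of returning a label.
    """
    if not hash_changed:
        # Assumption: nothing to do when upstream data is unchanged.
        return "noop"
    if curated and status in ("PROCESSED", "APPROVED_FOR_RELEASE"):
        # Curated sequences are never overwritten automatically.
        return "blocked:CURATION_ISSUE"
    if status == "PROCESSED":
        # New behaviour proposed in this issue.
        return "edit"
    if status == "APPROVED_FOR_RELEASE":
        return "revise"
    # RECEIVED and IN_PROCESSING keep today's behaviour: blocked by status.
    return f"blocked:{status}"


for case in [("PROCESSED", True, False), ("PROCESSED", True, True), ("IN_PROCESSING", True, False)]:
    print(case, "->", choose_ingest_path(*case))
```

Writing the decision as a pure function keeps the three conditions from the list above testable in isolation, independent of how the update manager is wired.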

## Curated sequence safety

The codebase already detects curation in `compare_hashes.py`:

```python
# A sequence is considered curated if it has ever been submitted by anyone
# other than insdc_ingest_user
latest["curated"] = {v["submitter"] for v in sorted_versions} != {"insdc_ingest_user"}
```

Curated sequences in `PROCESSED` state should instead go through the existing `blocked["CURATION_ISSUE"]` path and trigger a notification, as curated `APPROVED_FOR_RELEASE` sequences do today. This is important to avoid the problem described in #3084.
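
The curation check quoted above can be tried out in isolation. In this sketch, `is_curated`, `INGEST_USER`, and the sample version records are made up for illustration; only the set comparison itself comes from the snippet quoted from `compare_hashes.py`.

```python
INGEST_USER = "insdc_ingest_user"


def is_curated(versions: list[dict]) -> bool:
    """True unless every version was submitted by the ingest user."""
    return {v["submitter"] for v in versions} != {INGEST_USER}


# Hypothetical version histories for two accessions.
ingest_only = [
    {"version": 1, "submitter": INGEST_USER},
    {"version": 2, "submitter": INGEST_USER},
]
with_curator = ingest_only + [{"version": 3, "submitter": "curator_a"}]

print(is_curated(ingest_only))   # False
print(is_curated(with_curator))  # True
```

Note that under this expression an empty version list also counts as curated, because the empty set differs from `{"insdc_ingest_user"}`. That errs on the safe side for the edit path, since such a sequence would be blocked rather than overwritten.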

## Workaround used

Sequences with errors due to the old author format were deleted from staging and production so that ingest would re-ingest them fresh. This wastes accessions and requires manual intervention.

## Related

- #3084 — Ingest should not treat curator revisions as latest and revise (the curated-sequence caveat in our fix directly addresses this)
- #3085 — How to maintain curation changes across ingest revisions (broader context on curated sequence handling)
- Original author formatting change: #2986
- Ingest skip logic: `ingest/scripts/compare_hashes.py`, `process_hashes()` function
- Edit endpoint: `backend/.../SubmissionController.kt` → `submitEditedData()`
- Edit status precondition: requires `Status.PROCESSED` (`SubmissionDatabaseService.kt`)
