Feature: Add APIS integration to SFA preprocessing workflow #173

@fiver-watson

Description

NOTE: this issue ticket description was updated 2026-02-26 to reflect updates from SFA


Background and context

SFA's long-term solution for digital preservation requires coordination among many disparate systems across multiple departments and agencies. However, the Archival Information System (AIS) is intended to be the source of truth for all archival metadata. Because users will create and submit metadata that may at times conflict with existing records or with other submissions, SFA has developed an intermediary application, called APIS, to negotiate the review and synchronization of metadata prior to import into the AIS.

To support this new integration, ASI will need to modify the existing SFA preprocessing child workflow so that it can interact with APIS via API and confirm there are no outstanding conflicts before proceeding with preservation processing.

Proposed implementation

After meetings with SFA and third-party developers, we propose the following integration plan for APIS:

[Image: proposed APIS integration plan]

First, preprocessing continues normally until validation has been successful. Any content failures occurring during validation will mean that APIS is never contacted, and the workflow will end as it currently does.

If validation succeeds, Enduro will:

  • Make a copy of the SIP's eCH-0160 metadata.xml file
  • Deliver that copy to APIS via API, using the importTasks POST endpoint
  • This endpoint has 3 required parameters:
    • file: the eCH-0160 metadata.xml file from the SIP
    • sipType: the type of SIP identified earlier in the SFA preprocessing workflow. The API's enum has been aligned to use the same SIP types as Enduro:
      • DigitizedAIP
      • DigitizedSIP
      • BornDigitalAIP
      • BornDigitalSIP
    • username: the logged-in user who is submitting the request. String; can be an email address, for example.
  • APIS should return a request ID as part of the response. This ID needs to be kept in memory, both for status updates on the current process and to initiate the importRun API call later in the related post-storage workflow.
  • As it waits for a final outcome from the request, Enduro can query the status of the previous importTasks request using the returned ID
    • A GET /api/ImportTasks/{id}/status request using the returned ID can return the following elements:
       "status": "Neu",
       "analysisResult": "AlleGleich",
       "analysisProgressInPercent": 0,
       "importResult": "Erfolgreich",
       "importProgressInPercent": 0,
       "processedDocumentCount": 0,
       "totalDocumentCount": 0

Note that additional properties may be returned in some cases, some of which may be useful to us, such as "analysisErrorMessage" in the case of an error. See the API specification for further details.
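The submission and status steps above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: the POST path `/api/ImportTasks` is inferred from the status URL, sending all three parameters as multipart form fields is a guess, and the function and type names (`newImportTaskRequest`, `ImportTaskStatus`, `decodeStatus`) and the numeric field types are hypothetical. The API specification remains the authority on all of these.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"mime/multipart"
	"net/http"
)

// ImportTaskStatus mirrors the status elements listed in this ticket.
// Field types are assumptions; check them against the APIS specification.
type ImportTaskStatus struct {
	Status                    string  `json:"status"`
	AnalysisResult            string  `json:"analysisResult"`
	AnalysisProgressInPercent float64 `json:"analysisProgressInPercent"`
	ImportResult              string  `json:"importResult"`
	ImportProgressInPercent   float64 `json:"importProgressInPercent"`
	ProcessedDocumentCount    int     `json:"processedDocumentCount"`
	TotalDocumentCount        int     `json:"totalDocumentCount"`
	AnalysisErrorMessage      string  `json:"analysisErrorMessage,omitempty"`
}

// decodeStatus parses the body of a status response.
func decodeStatus(data []byte) (ImportTaskStatus, error) {
	var st ImportTaskStatus
	err := json.Unmarshal(data, &st)
	return st, err
}

// newImportTaskRequest builds the POST that delivers the metadata.xml copy.
// Assumed, not confirmed: the path and the multipart encoding of the
// three required parameters (file, sipType, username).
func newImportTaskRequest(baseURL string, metadataXML []byte, sipType, username string) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	part, err := w.CreateFormFile("file", "metadata.xml")
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(metadataXML); err != nil {
		return nil, err
	}
	if err := w.WriteField("sipType", sipType); err != nil {
		return nil, err
	}
	if err := w.WriteField("username", username); err != nil {
		return nil, err
	}
	// Close writes the trailing multipart boundary.
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/ImportTasks", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// Placeholder host, payload, and user; none of these come from SFA.
	req, err := newImportTaskRequest("https://apis.example.org", []byte("<paket/>"), "BornDigitalSIP", "archivist@example.org")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path)

	st, err := decodeStatus([]byte(`{"status":"InAnalyse","analysisProgressInPercent":40,"processedDocumentCount":4,"totalDocumentCount":10}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %d of %d documents\n", st.Status, st.ProcessedDocumentCount, st.TotalDocumentCount)
}
```

The request ID returned by APIS is deliberately left out of the sketch, since the shape of that response is not described in this ticket.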

  • There are 4 main outcomes for a status query:
    • IN PROGRESS:
      • "status": "InAnalyse",
      • If the status is still in progress, Enduro can use the status, analysisProgressInPercent, and document count response parameters to provide status updates in the Enduro user interface periodically
    • ANALYSIS ERROR
      • "status": "Analysiert",
      • "analysisResult": "Fehler",
      • If the response reports an error or cancellation, Enduro should update the SIP status to ERROR, fail the workflow, and clean up.
    • SUCCESS - CONTINUE - New or Same
      • "status": "Analysiert",
      • "analysisResult": "AlleNeu" = all new metadata, i.e. this metadata is not yet in the AIS; continue as normal
      • "analysisResult": "AlleGleich" = all SIP metadata is the same as what is in the AIS; continue as normal
      • With either of these analysis outcomes (new or same), Enduro can continue with preprocessing to create the PIP and deliver it to the preservation engine.
      • During post-storage, the ImportRun API call will use the default "importBehavior": "AppendOnly" option.
    • METADATA CONFLICT
      • "status": "Analysiert",
      • "analysisResult": "Konflikte" = SIP and AIS metadata are NOT the same; some of the records differ
      • In rare cases, APIS may report a metadata conflict. This occurs when the metadata in the delivered SIP XML file conflicts in some way with what is currently held in the system of record. At this point human intervention is required: an archivist must review both sources and decide whether to cancel the ingest and resolve the issue upstream before re-ingesting, or to treat the SIP as the source of truth and have it overwrite any conflicting metadata currently held in the AIS.
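The four outcomes above amount to a small decision table over "status" and "analysisResult", which might be sketched in Go as below. The `Outcome` type and `classify` function are hypothetical names. Two behaviours are assumptions rather than anything SFA has confirmed: any status other than "Analysiert" or "Abgebrochen" (for example "Neu") is treated as still in progress, and an unrecognised analysisResult is treated conservatively as an error.

```go
package main

import "fmt"

// Outcome is a hypothetical name for the four cases described in this ticket.
type Outcome int

const (
	InProgress Outcome = iota
	AnalysisError
	Continue
	Conflict
)

// classify maps the "status" and "analysisResult" values of a status
// response onto one of the four outcomes.
func classify(status, analysisResult string) Outcome {
	switch status {
	case "Abgebrochen":
		// Cancellation is handled like an error: fail the workflow and clean up.
		return AnalysisError
	case "Analysiert":
		switch analysisResult {
		case "AlleNeu", "AlleGleich":
			return Continue
		case "Konflikte":
			return Conflict
		default:
			// "Fehler", plus (by assumption) any value not listed in the ticket.
			return AnalysisError
		}
	default:
		// Assumption: "InAnalyse", "Neu", and other states mean "keep polling".
		return InProgress
	}
}

func main() {
	fmt.Println(classify("InAnalyse", "") == InProgress)
	fmt.Println(classify("Analysiert", "Konflikte") == Conflict)
}
```

Keeping the mapping in one pure function would make it easy to unit test separately from the polling and user interface code.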

When a conflict is returned in the status query response, Enduro should:

  • Pause the ingest workflow
  • Update the SIP status to PENDING
  • Display a decision dialog in the user interface and wait for input

SFA will determine the exact wording of the decision dialog, but the three options presented should essentially be:

  • CANCEL INGEST:
    • Update the SIP and the ingest workflow status to CANCELED, end the workflow, and perform any required cleanup.
    • As part of cleanup, send a PATCH request to /api/ImportTasks/{id} with "status": "Abgebrochen" to cancel the import task.
  • CONTINUE & OVERWRITE:
    • Continue with the preprocessing workflow.
    • During post-storage, submit the ImportRun request using "importBehavior": "OverwriteAndAppend".
  • CONTINUE & APPEND:
    • Continue with the preprocessing workflow.
    • During post-storage, submit the ImportRun request using "importBehavior": "AppendOnly".
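As a rough Go illustration of how the three dialog options could translate into workflow actions, the sketch below maps each decision to either a cancellation PATCH body or the importBehavior value to use later. The `Decision` and `Plan` types and the `planFor` function are hypothetical names, and the decision labels are placeholders, since SFA will determine the final dialog wording.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Decision is a hypothetical type for the archivist's choice in the dialog.
type Decision string

const (
	CancelIngest      Decision = "CANCEL INGEST"
	ContinueOverwrite Decision = "CONTINUE & OVERWRITE"
	ContinueAppend    Decision = "CONTINUE & APPEND"
)

// Plan describes what the workflow should do next (hypothetical structure).
type Plan struct {
	Continue       bool   // resume preprocessing?
	ImportBehavior string // value for the later ImportRun request; empty on cancel
	CancelBody     []byte // JSON body for PATCH /api/ImportTasks/{id}; nil unless cancelling
}

// planFor translates a decision into the follow-up actions listed above.
func planFor(d Decision) (Plan, error) {
	switch d {
	case CancelIngest:
		body, err := json.Marshal(map[string]string{"status": "Abgebrochen"})
		if err != nil {
			return Plan{}, err
		}
		return Plan{CancelBody: body}, nil
	case ContinueOverwrite:
		return Plan{Continue: true, ImportBehavior: "OverwriteAndAppend"}, nil
	case ContinueAppend:
		return Plan{Continue: true, ImportBehavior: "AppendOnly"}, nil
	default:
		return Plan{}, fmt.Errorf("unknown decision %q", d)
	}
}

func main() {
	for _, d := range []Decision{CancelIngest, ContinueOverwrite, ContinueAppend} {
		p, err := planFor(d)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: continue=%t importBehavior=%q cancelBody=%s\n", d, p.Continue, p.ImportBehavior, p.CancelBody)
	}
}
```

Because the ImportRun call only happens in the post-storage workflow, the chosen importBehavior would need to be stored alongside the request ID until then.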

Additional context and resources

A copy of the most recent APIS API specification can be found here in Drive. It can be loaded into a site like SwaggerEditor for a human-readable version if desired.

Meeting minutes about the APIS integration with SFA can be found here in Drive.

Status

🛠 Refining
