HafeezOJ commented Aug 7, 2025

Description

This PR enhances ReBACH to check the local storage path for duplicate bags. Where it looks for duplicates depends on the workflow setting, that is, whether or not bags are uploaded to remote storage. The configuration fields that hold the storage locations are updated to the current specifications, in line with the OAIS model.
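As a rough illustration of the local duplicate check described above, the sketch below lists entries in a storage directory whose names begin with a given bag name. The function name `find_duplicate_bags`, its parameters, and the prefix-matching rule are assumptions made for this example; the way ReBACH actually names and matches bags is not stated in this conversation.

```python
from pathlib import Path


def find_duplicate_bags(storage_path: str, bag_prefix: str) -> list[str]:
    """Return the names of entries in storage_path that start with bag_prefix.

    Hypothetical helper: prefix matching is an assumption, not ReBACH's rule.
    """
    root = Path(storage_path)
    if not root.is_dir():
        # A location that does not exist cannot hold a duplicate.
        return []
    # Sort so the result does not depend on directory listing order.
    return sorted(p.name for p in root.iterdir() if p.name.startswith(bag_prefix))
```

A caller could treat a non-empty result as "already preserved" and skip bagging that item.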

See #118

Documentation Update

  • I have updated README.md and other relevant documentation

Implementation Notes

  • Special functions in Utils.py check paths that are local to ReBACH for duplicate bags.
  • The count of already-preserved items in archival staging storage is reported in log messages.
  • A flag (--check-staging-storage) can now be set to check Wasabi for duplicate bags. Wasabi is also checked for duplicate bags if the workflow is configured to upload to a remote storage location.
  • A package's folder name is reused if it already exists in the ingest staging storage from a previous run and no new version is generated.
  • Configuration fields for storage locations are updated to the current specifications, in line with the OAIS model.
  • AP Trust configuration has moved from the .env.ini configuration file in ReBACH to the .toml configuration file in bagger.
  • Terms for storage locations are updated in log messages.
  • The ReBACH and Bagger READMEs are updated.
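The rule for when Wasabi is checked can be sketched as follows. Only the flag name `--check-staging-storage` comes from this PR; treating it as a boolean switch, and the names `build_parser`, `should_check_remote`, and `upload_to_remote`, are assumptions for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with only the duplicate-check flag (illustrative)."""
    parser = argparse.ArgumentParser(description="Sketch of the duplicate-check flag")
    parser.add_argument(
        "--check-staging-storage",
        action="store_true",  # assumption: the flag takes no value
        help="Check remote staging storage for duplicate bags",
    )
    return parser


def should_check_remote(check_flag: bool, upload_to_remote: bool) -> bool:
    """Remote storage is checked if the flag is set or the workflow uploads."""
    return check_flag or upload_to_remote


args = build_parser().parse_args(["--check-staging-storage"])
print(should_check_remote(args.check_staging_storage, upload_to_remote=False))  # prints True
```

With neither the flag nor an upload workflow, `should_check_remote` returns `False`, and only the local paths would be checked.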

HafeezOJ linked an issue Aug 7, 2025 that may be closed by this pull request
HafeezOJ changed the title from "Feature check if a bag exists in a local storage" to "Feature: Check if a bag exists in a local storage" Aug 7, 2025
HafeezOJ self-assigned this Aug 7, 2025
HafeezOJ marked this pull request as ready for review August 7, 2025 06:27
HafeezOJ requested a review from zoidy August 7, 2025 06:29
zoidy left a comment

There is a bug in the bagger code that erroneously detects that a bag is already in archival staging storage or archival storage. Integration.py L123

HafeezOJ commented Aug 14, 2025

The bug was caused by using the same storage location for both ingest_staging_storage and archival_staging_storage. I now enforce separate locations for the two settings.
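A minimal sketch of such a guard, assuming both settings are filesystem paths: it resolves the two locations and rejects a configuration where they point at the same directory. The function name `ensure_separate_locations` is hypothetical, and the actual check merged in this PR may differ.

```python
from pathlib import Path


def ensure_separate_locations(ingest_staging_storage: str,
                              archival_staging_storage: str) -> None:
    """Raise ValueError if both settings resolve to the same directory.

    Hypothetical guard; only the two setting names come from this PR.
    """
    ingest = Path(ingest_staging_storage).resolve()
    archival = Path(archival_staging_storage).resolve()
    if ingest == archival:
        raise ValueError(
            "ingest_staging_storage and archival_staging_storage "
            f"must be different locations (both resolve to {ingest})"
        )
```

Running a check like this at startup would surface the misconfiguration before any bag in ingest staging could be mistaken for an already-preserved one.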

zoidy commented Aug 19, 2025

> There is a bug in the bagger code that erroneously detects that a bag is already in archival staging storage or archival storage. Integration.py L123

Resolved #121 (comment)

zoidy self-requested a review August 19, 2025 22:37
zoidy previously approved these changes Aug 22, 2025
zoidy dismissed their stale review August 22, 2025 14:05

#121 (comment) remains unresolved

zoidy self-requested a review August 22, 2025 15:54
HafeezOJ merged commit 81e9ab6 into main Aug 22, 2025
1 check passed