@jannistsiroyannis
Contributor

Adds history archiving (of deleted records) to librisXL. A change to secret.properties (setting a base path for the archive) is necessary. Until that path is set, the feature does nothing, and it is not destructive in any way.

The history archive consists of gzipped JSON-lines files, each around 100 MB in size and named/tagged with the time of archiving. The files can easily be searched for individual records, typically using "zcat | grep". Each line consists of an 'original' version and a set of diffs for each following version.
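
For cases where "zcat | grep" is not convenient, the same search can be sketched in Python. This is a minimal illustration rather than part of the PR: the function name `find_archived_records`, the `*.gz` file pattern, and the flat directory layout are assumptions, and the sketch relies only on the archive being gzipped JSON-lines as described above.

```python
import gzip
import json
from pathlib import Path


def find_archived_records(archive_root, needle):
    """Yield (file name, parsed line) for each archive line containing needle.

    Assumes archive_root holds gzipped JSON-lines files directly (hypothetical
    layout) and that needle is a plain substring, e.g. a record identifier.
    """
    for path in sorted(Path(archive_root).glob("*.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as lines:
            for line in lines:
                # Cheap substring test first, like grep; parse only the hits.
                if needle in line:
                    yield path.name, json.loads(line)
```

Like grep, this matches the needle anywhere in the line, so a hit may come from the 'original' version or from one of the diffs. The files are streamed line by line, so a 100 MB archive never has to fit in memory decompressed.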

Member

@andersju left a comment

Nice, LGTM but see comments! Additionally:

  • I suggest moving cutoff time (line 71), batch size (73), and runtime (210) to constants at the top. Maybe also the checkpoint threshold (155).
  • logger.info walPath on trigger, or at least archiveRoot on startup?

@jannistsiroyannis
Contributor Author

> Nice, LGTM but see comments! Additionally:
>
> * I suggest moving cutoff time (line 71), batch size (73), and runtime (210) to constants at the top. Maybe also the checkpoint threshold (155).
>
> * logger.info walPath on trigger, or at least archiveRoot on startup?

Thank you! Excellent feedback! ⭐ I agree with and have fixed all of the issues, except for moving the values to the top. I find that useful only if the values are used more than once; otherwise I prefer them where they are used.

Contributor

@olovy left a comment

Requested change: We shouldn't delete the tombstone in lddb. We should be able to tell that the resource has existed even though we delete the history (possibly including the last version in lddb).

@olovy
Contributor

olovy commented Jan 30, 2026

Is the added complexity of generating diffs worth it now that we archive the history to disk instead of trying to compact the history table?
