Feature/lxl 4524 #1686
base: develop
andersju
left a comment
Nice, LGTM but see comments! Additionally:
- I suggest moving cutoff time (line 71), batch size (73), and runtime (210) to constants at the top. Maybe also the checkpoint threshold (155).
- logger.info walPath on trigger, or at least archiveRoot on startup?
housekeeping/src/main/groovy/whelk/housekeeping/HistoryArchiver.java
Thank you! Excellent feedback! ⭐ I agree with and have fixed all of the issues, except for moving the values to the top. I find that useful only if the values are used more than once; otherwise I prefer to keep them where they are used.
olovy
left a comment
Requested change: We shouldn't delete the tombstone in lddb. We should be able to tell that the resource has existed even though we delete the history (possibly including the last version in lddb).
Is the added complexity of generating diffs worth it now that we archive the history to disk instead of trying to compact the history table?
Adds history archiving (of deleted records) to librisXL. A change to secret.properties (setting a base path for the archive) is necessary. Until it is set, this does nothing (but also isn't destructive in any way).
The history archive consists of gzipped JSON-lines files, each around 100 MB in size and named/tagged with the time of archiving. These can easily be searched for individual records, typically using "zcat | grep". Each line consists of an 'original' version and a set of diffs for each following version.
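For cases where a shell is not at hand, the same "zcat | grep" style lookup can be sketched in Java. This is only an illustration, not code from this PR: the class name `ArchiveSearch`, its methods, and the JSON keys in the sample lines (`original`, `@id`, `diffs`) are assumptions made up for the example, and the search is a plain substring match just like grep.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of the PR.
public class ArchiveSearch {

    /** Returns every line of a gzipped JSON-lines file that contains the needle. */
    public static List<String> findLines(Path gzFile, String needle) {
        List<String> hits = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)), StandardCharsets.UTF_8))) {
            String line;
            // Stream line by line so a ~100 MB archive is never held in memory at once.
            while ((line = reader.readLine()) != null) {
                if (line.contains(needle)) {
                    hits.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    /** Writes lines to a temporary gzipped file; only used to build demo input. */
    public static Path writeSample(List<String> lines) {
        try {
            Path file = Files.createTempFile("archive-sample", ".jsonl.gz");
            try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
                    new GZIPOutputStream(Files.newOutputStream(file)), StandardCharsets.UTF_8))) {
                for (String line : lines) {
                    writer.write(line);
                    writer.newLine();
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Sample lines with an assumed (hypothetical) shape: one record per line.
        Path sample = writeSample(List.of(
                "{\"original\":{\"@id\":\"abc123\"},\"diffs\":[]}",
                "{\"original\":{\"@id\":\"def456\"},\"diffs\":[]}"));
        findLines(sample, "abc123").forEach(System.out::println);
    }
}
```

Because each record occupies a single line, a substring match on the record id is enough to pull out its original version together with all of its diffs.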