geonetwork datadir checker useless ressources #35

Merged
jeanmi151 merged 39 commits into georchestra:master from jeanmi151:datadir_gn_checker
Oct 30, 2025
@jeanmi151
Collaborator

Following #11 ...

The aim of this PR is to add a checker that spots files that are no longer needed because GeoNetwork forgot to delete them.

It reads the database records (metadata table) and looks in /mnt/geonetwork_datadir/data/metadata_data/ for resources that no longer match any of them
(path value taken from https://github.com/georchestra/datadir/blob/docker-master/geonetwork/geonetwork.properties#L2)
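
A minimal sketch of the idea, not the code from this PR. It assumes the on-disk layout `<base>/<start>-<end>/<metadata id>/`, as suggested by the path `101932200-101932299/101932275` quoted further down in this conversation, and that the ids still present in the metadata table have already been loaded into a set. The names `find_leftover_dirs`, `dir_size` and `known_ids` are hypothetical.

```python
from pathlib import Path


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files below path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def find_leftover_dirs(base: Path, known_ids: set[int]) -> list[dict]:
    """List metadata directories on disk whose id is not in known_ids.

    Assumed layout: <base>/<start>-<end>/<metadata id>/...
    known_ids is assumed to come from the metadata table (query not shown).
    """
    leftovers = []
    for range_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        for md_dir in sorted(p for p in range_dir.iterdir() if p.is_dir()):
            if not md_dir.name.isdigit():
                continue  # not a metadata id directory, skip it
            if int(md_dir.name) not in known_ids:
                leftovers.append(
                    {
                        "path": f"{range_dir.name}/{md_dir.name}",
                        "size": dir_size(md_dir),  # raw bytes
                    }
                )
    return leftovers
```

Calling `find_leftover_dirs(Path("/mnt/geonetwork_datadir/data/metadata_data"), known_ids)` would then return one entry per directory that has no matching record.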

@jeanmi151
Collaborator Author

jeanmi151 commented Oct 23, 2025

It looks like this now:
[image]

@jeanmi151 jeanmi151 requested a review from landryb October 23, 2025 16:16
Member

@landryb landryb left a comment

I've tested it again and it looks better, there are still a bunch of cleanups to do..

now i know why i didn't like the size being 'humanized' in the display (versus raw bytes).. you can't easily sort to figure out which leftover dir takes the most space. filtering on 'MB' and sorting doesn't help either, because 44 sorts between 1 and 9, as those are strings..
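
To illustrate the sorting problem described above, here is a small sketch with made-up sizes (the values are illustrative, not taken from a real task result):

```python
# Illustrative values only: the same three sizes, humanized and raw.
humanized = ["1.0 MB", "44.0 MB", "9.0 MB"]
raw_bytes = [1_048_576, 46_137_344, 9_437_184]

# Strings compare character by character, so "44.0 MB" lands between
# "1.0 MB" and "9.0 MB" instead of at the end.
sorted_as_strings = sorted(humanized)

# Integers compare by value, so the largest directory comes first.
sorted_as_numbers = sorted(raw_bytes, reverse=True)

print(sorted_as_strings)
print(sorted_as_numbers)
```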

@landryb
Member

landryb commented Oct 29, 2025

ah, and can you also run black on all files? i try to keep the code black-formatted..

@landryb
Member

landryb commented Oct 29, 2025

now that we use a bootstrap table to display the values, i think we can use a formatter to humanize the numbers on the fly, like:

- the existing formatters, e.g. function urlFormatter(value, row) {
- some js like https://gist.github.com/lanqy/5193417

this way, there's no need for the jinja filesizeformat filter in the backend, and raw numbers are stored in the task results (which makes them easier to reuse if you build something that parses the json from the task result). eventually that might also allow sorting based on the raw value ?

another note.. i don't think we need to store the path/url twice, nor the problem type in the results:

path	"101932200-101932299/101932275"
problem	"5.0 MB"
type	"UnusedFileRes"
url	"101932200-101932299/101932275"

unless you plan to tackle the other cases i had listed in #11 (comment) - for now the type is always the same value, and i don't think we call GetPbStr with the current problem, since we display it straight in a table ?
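
As a rough sketch of what a slimmer result entry could look like: the path stored once, the size as a raw integer, and no constant type field. The helper `slim_result`, the `size` key, and the byte count are assumptions for illustration, not what the PR actually stores.

```python
import json


def slim_result(path: str, size_bytes: int) -> dict:
    """Hypothetical minimal record: one path, one raw size in bytes."""
    return {"path": path, "size": size_bytes}


# The byte count is illustrative; the thread only shows the humanized "5.0 MB".
results = [slim_result("101932200-101932299/101932275", 5_000_000)]

# Raw integers survive a JSON round trip and can be summed or sorted directly
# by anything that parses the task result.
payload = json.dumps(results)
total_bytes = sum(entry["size"] for entry in json.loads(payload))
```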

@jeanmi151
Collaborator Author

Done in my last commit 0b9e744

@jeanmi151 jeanmi151 merged commit 79a5b97 into georchestra:master Oct 30, 2025
@jeanmi151 jeanmi151 deleted the datadir_gn_checker branch October 30, 2025 12:47