Duplicates of JPG derivatives not detected #13

@DominicBM

I am observing some problems with the script's duplicate detection for JPG derivatives. Possible causes: conversion does not always produce exactly identical outputs, some previous bot runs used slightly different settings, or the script is simply not looking for duplicates of the JPGs after a conversion from TIFF.

This is fine when the names are the same: overwriting an older version of a file with another does no harm (and is arguably good, because it enforces consistency).

But when the bot is also moving page names at the same time, it will move the TIFF, leave the old JPG in place because it did not detect it, and then upload a new JPG (now not linked from the TIFF), creating a bit of a mess.

Example:

https://commons.wikimedia.org/wiki/File:Grand_Canyon._Same_locality_as_433._Old_Nos._470,_473,_500_-_NARA_-_517801.tif
https://commons.wikimedia.org/wiki/File:Grand_Canyon._Same_locality_as_433._Old_Nos._470,_473,_500_-_NARA_-_517801.jpg
https://commons.wikimedia.org/wiki/File:Grand_Canyon._Same_locality_as_433._Old_Nos._470,_473,_500,_1871_-_1878_-_NARA_-_517801.jpg
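
One possible way to catch cases like the example above without relying on file hashes is to match derivatives by the NARA identifier at the end of the title. The following is only a sketch under that assumption, not the bot's actual logic: the helper names `nara_id` and `mismatched_jpg_derivatives` are hypothetical, the title pattern is inferred from the three example titles, and the list of existing titles would have to come from whatever lookup the script already performs.

```python
import re

# Assumed title convention, inferred from the example links:
# "... - NARA - <digits>.<ext>", with either spaces or underscores.
NARA_ID = re.compile(r"NARA[ _]-[ _](\d+)\.(?:tiff?|jpe?g)$", re.IGNORECASE)


def nara_id(title):
    """Return the trailing NARA identifier of a file title, or None."""
    match = NARA_ID.search(title)
    return match.group(1) if match else None


def mismatched_jpg_derivatives(tiff_title, existing_titles):
    """Return JPG titles that share the TIFF's NARA ID but not its base name.

    These are candidates for a JPG that was left behind (or newly created)
    under a different name than the TIFF it derives from.
    """
    target_id = nara_id(tiff_title)
    if target_id is None:
        return []
    tiff_base = tiff_title.rsplit(".", 1)[0]
    return [
        title
        for title in existing_titles
        if title.lower().endswith((".jpg", ".jpeg"))
        and nara_id(title) == target_id
        and title.rsplit(".", 1)[0] != tiff_base
    ]
```

Run against the three example titles, this flags the JPG whose base name differs from the TIFF's, regardless of whether the two JPGs are byte-identical. It only narrows down candidates; deciding which file to move, overwrite, or delete is left to the script.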
