Replacing book files causes duplicate DB entries #221

secondsabre · 2025-10-20T17:39:28Z

Not really a bug or a feature request, but putting this in here by request. Hopefully can spitball some solutions.

First, the caveats:
-No editing of book files (CBR/CBZ/etc) allowed. This includes renaming, embedding meta, etc. Moving of files to different folders/paths is fine, but the file itself remains untouched (same filename and hash).
-This whole thing is related to static Lists, not SmartLists: SmartLists solve the problem, but come with their own quirks and issues, and are outside scope
-This is a deliberately obtuse edge case example to illustrate the behaviour in question
-Personally, I don't use add-ons like Database Manager or Library Organizer, so I don't know if/how they approach problems like this. Maybe there's already a solution out there for this.
-In my case I'm using an SQL database, so there's a little bit more fragmentation involved.

Situation:
Let's say we have a series/volume: we'll call it SERIES X. It's a 6-issue run, sorted into a watched folder that matches the series name. Example filenames:
Series X [V2020]/Series X - 01 (Digital) (Scanner).cbz
Series X [V2020]/Series X - 02 (Digital) (Scanner).cbz
Etc

In CRCE, these books have been added to the library via the scanning/watched folder process (so assigned GUIDs) and metadata tagged via ComicVine. For the sake of example, let's also say the GUIDs are sequential, so Issue 1 is ID1, Issue 2 is ID2, etc.

The books are also added to Lists (Static/Dumb List, not SmartList).
-Issues 1-6 are in a List corresponding to the Volume: Series X [V2020]
-Issues 2-4 are also part of a crossover/event, so are on another List for that event: Event Y [2019]
-All 6 issues are also on a List for a reading order for the universe as a whole: Publisher Z Chronological

So let's break that all down: <Series #> (<lists it's included on>) <GUID>
Series X 1 (Series X [V2020], Publisher Z Chronological) <ID1>
Series X 2 (Series X [V2020], Publisher Z Chronological, Event Y [2019]) <ID2>
Series X 3 (Series X [V2020], Publisher Z Chronological, Event Y [2019]) <ID3>
Series X 4 (Series X [V2020], Publisher Z Chronological, Event Y [2019]) <ID4>
Series X 5 (Series X [V2020], Publisher Z Chronological) <ID5>
Series X 6 (Series X [V2020], Publisher Z Chronological) <ID6>

As these are all static lists, they're held in ComicDB.xml as just a list of GUIDs: no other reference point.
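
To make the setup concrete, here's a minimal sketch in plain Python. The dictionary layout and the names `static_lists` and `lists_containing` are purely illustrative assumptions, not how ComicRack actually stores things; the only part taken from the description above is that a static list holds nothing but GUIDs.

```python
# Hypothetical stand-in for static lists: each one is only an ordered list of GUIDs.
static_lists = {
    "Series X [V2020]": ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6"],
    "Event Y [2019]": ["ID2", "ID3", "ID4"],
    "Publisher Z Chronological": ["ID1", "ID2", "ID3", "ID4", "ID5", "ID6"],
}


def lists_containing(guid, lists):
    """Return the names of every static list that references the given GUID."""
    return sorted(name for name, guids in lists.items() if guid in guids)


# Issue 2 is referenced by all three lists; a GUID that was never added is in none.
print(lists_containing("ID2", static_lists))
print(lists_containing("ID50", static_lists))
```

Since the lists know nothing except the GUID, a freshly scanned file with a new GUID has no connection to any of them.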

Now, say we get a new file: a fixed version of Issue 2. It has filename Series X - 02 (Digital) (Fixed) (Scanner).cbz, and we want to replace the original.

So you (or some utility like Library Organizer):
-Go into the folder: Series X [V2020]
-Delete the original file: Series X - 02 (Digital) (Scanner).cbz
-Paste in the new Fixed file: Series X - 02 (Digital) (Fixed) (Scanner).cbz

On the next scan, the new file is added to the library and assigned a GUID (let's say ID50). This step could be done before or after the actual file move, doesn't matter.

Upon loading any of the lists that contained the previous file, it'll show as FILE NOT FOUND. If you want to update the book in each of those lists, you can:
(A) Identify which lists the original file was in: right-click>Show In List, then drag-and-drop the new book from the library into those lists and do whatever sorting/organizing you need to.
(B) Hacky option: use the Change File Link script to point ID2 to the new file

Option (A) could be time-consuming (depending on the number of Lists it's contained in) and still requires fiddling with order if you have a custom sort, but would fully replace the old file with the new one.
Option (B) just replaces the file across all Lists (sorcery), but the problem is that ID2 and ID50 both still exist in the database: they're just pointing to the same file.
You could remove ID2 from the Library, but that would also remove the book from its Lists, defeating the purpose.
You could remove ID50, but it would re-add to the library on the next scan unless you have the 'Files manually removed from the Library will not be added again' option enabled in the scan settings, which you might not want for whatever reason.

Now, as to possible solutions.
The most direct one I can think of is literally just a find->replace of the GUID in ComicDB.xml. This could technically even be done externally in a text editor regardless of DB state, since lists are plaintext in the XML file.

Ideally, this would use some kind of GUI to identify the files to be replaced (Original) and be replaced with (Fixed) using the GUIDs, and would copy all the existing metadata from Original to Fixed, but that feels maybe out of scope. Either way, I could envision the UI working the same way as Change File Link:

1. Right-click-hook the book to be replaced (Original) and grab its GUID
2. Call a file browser dialog
3. Choose the replacement file (Fixed)
   3.1 Check to see if the Fixed file is already in the DB
   3.2 If yes, grab its GUID
   3.3 If no, add it to the DB and grab its GUID
4. DB call to copy all metadata (excluding filepath, I guess?) from Original and paste to Fixed
5. Search ComicDB.xml for the Original GUID
   5.1 If found, replace found entries with the Fixed GUID
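
Here's how I picture those steps, as a sketch only. `Book`, `Library` and `replace_book` are hypothetical stand-ins written in plain Python, not ComicRack classes or API calls. The file browser dialog is replaced by a plain `fixed_path` argument, and the Original entry is left in the library because the steps above don't say what should happen to it.

```python
import uuid


class Book(object):
    """Hypothetical library entry: an id, a file path, and a metadata dict."""

    def __init__(self, path, metadata=None, book_id=None):
        self.id = book_id or str(uuid.uuid4())
        self.path = path
        self.metadata = dict(metadata or {})


class Library(object):
    """Hypothetical library: books keyed by id, static lists as lists of ids."""

    def __init__(self):
        self.books = {}
        self.lists = {}

    def add(self, book):
        self.books[book.id] = book
        return book

    def find_by_path(self, path):
        for book in self.books.values():
            if book.path == path:
                return book
        return None


def replace_book(library, original_id, fixed_path):
    """Swap every static-list reference from the Original entry to the Fixed one."""
    original = library.books[original_id]
    # Reuse the Fixed entry if it was already scanned in, otherwise create it.
    fixed = library.find_by_path(fixed_path)
    if fixed is None:
        fixed = library.add(Book(fixed_path))
    # Copy the metadata but not the path: the path is what distinguishes the files.
    fixed.metadata = dict(original.metadata)
    # Rewrite the ids in place so any custom list order is preserved.
    for guids in library.lists.values():
        guids[:] = [fixed.id if guid == original_id else guid for guid in guids]
    return fixed


lib = Library()
lib.add(Book("Series X - 02 (Digital) (Scanner).cbz",
             {"Series": "Series X", "Number": "2"}, "ID2"))
lib.lists["Event Y [2019]"] = ["ID2", "ID3", "ID4"]
fixed = replace_book(lib, "ID2", "Series X - 02 (Digital) (Fixed) (Scanner).cbz")
print(lib.lists["Event Y [2019]"][0] == fixed.id)
```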

I'm like 90% certain this could be done through an add-on, unless there's scripting limitations on what meta can be copied or what changes can be made to ComicDB.xml? I'm no expert though, and it's beyond my current abilities but I'm trying to poke around and learn what I can.

Either way, replacing files is not uncommon within the digital comic world, be it (F)ixed files or changing from an old C2C to a NoAds or Digital with better quality, so a way to seamlessly replace a book across the whole Library with a better copy would be a welcome addition (at least, for me). Thanks for reading all this, and I'm looking forward to seeing what people come up with (but pls no "Just use SmartLists" thnx). 😁

maforget · 2025-10-20T18:28:55Z

Maintainer

First, you seem to have the idea that book entries are all IDs that point into a database. They are not. They are ComicBook objects with properties, one of which is the Id. The Id is a good way to track entries and is what lists use, but in the end it is just a property of an object.

ComicDb.xml is just the database saved in XML form. You don't go into the file and replace things by hand; you change the object's properties or remove the object completely. On save, the internal representation of the database is written out as XML.

> On the next scan, the new file is added to the library and assigned a GUID (let's say ID50). This step could be done before or after the actual file move, doesn't matter.

This is where you are making a crucial mistake. The idea of Change File Link is to replace the file BEFORE the new one is in the database. Don't add it via scanning. If you replace the file without adding it, you don't get 2 entries. It replaces the ComicBook object's FilePath property; everything else is unchanged.
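
As an illustration only, here is that idea in plain Python. The `ComicBook` class below is a hypothetical stand-in with a handful of properties, and `change_file_link` is a simplified imitation of what the script does, not its actual code.

```python
class ComicBook(object):
    """Hypothetical stand-in for a book object: the Id is just one more property."""

    def __init__(self, book_id, file_path, series, number):
        self.Id = book_id
        self.FilePath = file_path
        self.Series = series
        self.Number = number


def change_file_link(book, new_path):
    """Repoint an existing entry at another file; nothing else is touched."""
    book.FilePath = new_path


book = ComicBook("ID2", "Series X - 02 (Digital) (Scanner).cbz", "Series X", "2")
event_list = ["ID2", "ID3", "ID4"]  # a static list only stores Ids

change_file_link(book, "Series X - 02 (Digital) (Fixed) (Scanner).cbz")

# Same object, same Id, so every list that referenced ID2 still resolves.
print(book.Id in event_list)
print(book.FilePath)
```

Because no second object is ever created, there is nothing to deduplicate afterwards.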

Your way of doing things means that you have imported a duplicate. Some users who keep duplicates might actually want that. They are 2 distinct files, so in that regard it's expected behavior. But you may have just explained how using Change File Link ends up causing duplicates: the files were added before using it. Thanks for providing an explanation.

What people usually do is have some kind of staging folder where you import files. They go into the Folder tab and scrape them, add to the library etc. Then they use Library Organizer to move them to the required folder.

Yes the solution would be an Add-on. Probably updating Change File Link to search the Library and check that the chosen book isn't already in the library. Library Organizer already does something like that.

https://github.com/Stonepaw/comicrack-library-organizer/blob/54d3882e2bd94b7ff3f26637cd1df3dea1c4b96b/lobookmover.py#L639-L646

def find_duplicate_book(self, path):
    """
    Trys to find a book in the CR library via a path
    """
    for book in ComicRack.App.GetLibraryBooks():
        if book.FilePath == path:
            return book
    return None

You would need a dialog asking whether to remove the new entry, and you would need to be careful: what if that new file is also in another list? Removing it would leave a file not found there.

secondsabre · 2025-10-20T19:41:46Z

Author

Ah, these are great points. I know from experience that moving a file that's already in the DB will usually update its path on the next scan, so yea, changing the link before adding to the DB should bypass the duplicate entry being created.

I do use a staging folder of sorts where freshly downloaded files go, and it doesn't get scanned.
From there I'll do all the physical moves to my watched Library folders and let CR scan it all in automatically. The advantage to this (or so I thought) is that it lets me filter the duplicates in CR so I can see which files I need to replace/Change File Link on, but you're right, this is what's inadvertently creating duplicate entries with no real way to "replace" in the DB.

I use the Folders tab a lot to create Lists, since I usually find the file browser interface easier to navigate instead of running searches on the Library as a whole.

Side note, possibly unrelated: how does scraping work for files not yet added to the Library but done through the Folders tab? I'd assume (I might be wrong) those files would still be written to the DB (as an Object, thank you for clarifying), but maybe have some flag that sets them apart? Just curious.
[EDIT: Never mind, answered my own question. The scraped info doesn't seem to be persistent if they're not in the Library, and it's lost once the program is closed and opened]

And of course, thanks for the response and the help and the clarification. I don't have enough knowledge to dig into the codebase and really understand what's happening under the hood, so most of this was based on what little I could glean from the process itself and digging into the XML (esp since I don't know how to dig into the SQL DB manually either 😅). So lots of assumptions, but I'm glad it's starting to piece together in my brain.

maforget · 2025-10-20T20:37:10Z

Maintainer

> Side note, possibly unrelated: how does scraping work for files not yet added to the Library but done through the Folders tab? I'd assume (I might be wrong) those files would still be written to the DB (as an Object, thank you for clarifying), but maybe have some flag that sets them apart? Just curious.
>
> [EDIT: Never mind, answered my own question. The scraped info doesn't seem to be persistent if they're not in the Library, and it's lost once the program is closed and opened]

They are still ComicBook objects; they just live in the TemporaryBooks collection instead. When added to the library they are simply moved from one collection to the other. If the files aren't in the database and you didn't write the info to the files, then yes, it is lost upon exit.
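
A toy model of that, in plain Python. The class and method names are hypothetical and this is not how the program is actually written; it only mirrors the description above: one object, two collections, and only one of them saved.

```python
class ComicBook(object):
    """Hypothetical stand-in for a book object with a path and scraped info."""

    def __init__(self, file_path):
        self.FilePath = file_path
        self.Series = None


class App(object):
    """Hypothetical stand-in: two in-memory collections, only one is saved."""

    def __init__(self):
        self.library = []
        self.temporary_books = []

    def open_from_folder(self, file_path):
        book = ComicBook(file_path)
        self.temporary_books.append(book)
        return book

    def add_to_library(self, book):
        # The same object moves between collections; nothing is copied.
        self.temporary_books.remove(book)
        self.library.append(book)

    def books_saved_on_exit(self):
        # Assumption for this sketch: only library books are persisted.
        return list(self.library)


app = App()
book = app.open_from_folder("Staging/Series X - 02 (Digital) (Fixed) (Scanner).cbz")
book.Series = "Series X"  # scraped while still temporary
saved_before = book in app.books_saved_on_exit()
app.add_to_library(book)
saved_after = book in app.books_saved_on_exit()
print(saved_before, saved_after)
```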

What you are describing is exactly what the Duplicate Manager plugin does. It already has mentions of c2c, noads, etc. It seems a little hard to configure, but it is what you want, or at least something to use as inspiration. I am not sure it will fix your list issue, but it might be worth a look.

The problem is always how to decide which files are duplicates, because they might have different read progress, some might be newly imported with no metadata, etc. Automating that is the difficult part. Issue #153 has the same problem.

The fix could be easy: have Change File Link check for a duplicate using the code above and, if the chosen file is already in the database, remove that entry (ComicRack.App.RemoveBook) before updating the file path as it already does.
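
Something along these lines, as an untested sketch. `FakeApp` and `ComicBook` are hypothetical stand-ins so it runs outside ComicRack; only the names GetLibraryBooks and RemoveBook come from this thread, and their real behaviour may differ from what is assumed here.

```python
class ComicBook(object):
    """Hypothetical stand-in exposing only the Id and FilePath properties."""

    def __init__(self, book_id, file_path):
        self.Id = book_id
        self.FilePath = file_path


class FakeApp(object):
    """Hypothetical stand-in for ComicRack.App, limited to two calls."""

    def __init__(self, books):
        self._books = list(books)

    def GetLibraryBooks(self):
        return list(self._books)

    def RemoveBook(self, book):
        self._books.remove(book)


def find_duplicate_book(app, path):
    """Same idea as the Library Organizer snippet: match on exact FilePath."""
    for book in app.GetLibraryBooks():
        if book.FilePath == path:
            return book
    return None


def change_file_link(app, book, new_path):
    """Drop any other entry already pointing at new_path, then repoint book."""
    duplicate = find_duplicate_book(app, new_path)
    if duplicate is not None and duplicate is not book:
        app.RemoveBook(duplicate)
    book.FilePath = new_path
    return duplicate


original = ComicBook("ID2", "Series X - 02 (Digital) (Scanner).cbz")
scanned = ComicBook("ID50", "Series X - 02 (Digital) (Fixed) (Scanner).cbz")
app = FakeApp([original, scanned])

removed = change_file_link(app, original, scanned.FilePath)
print([b.Id for b in app.GetLibraryBooks()])
```

Returning the removed entry leaves room for a confirmation dialog, since that entry could itself be on other lists.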

Maybe there is a use for ComicRack.App.GetBook(file). It uses the program's BookFactory and returns either the ComicBook object from the library or a temporary one if the file isn't in it. You then have access to all of that ComicBook's properties.

See the wiki page for a list of APIs that plugins can use, including IApplication (ComicRack.App).

> esp since I don't know how to dig into the SQL DB manually either

If you use the backup database option while connected to a SQL db, it now outputs the complete ComicDB.xml, including the books. It can be used to switch to a regular XML database or to restore that SQL db. You can use it just to see how things are formatted. Underneath, it's still all just XML, even in SQL.

secondsabre · 2025-10-20T21:09:33Z

Author

Yo, this is all such great info. Really, thanks for taking the time to dig into this with me.

I did take a look at Duplicates Manager, but you're right: it doesn't seem to do anything with Lists, and the rulesets don't seem to be quite broad enough. There are only a few properties that it can catch, so while it can work for some situations, it wouldn't for others.

I'm taking a closer look at Library Organizer as well, but based on the logic I see, it also wouldn't do anything with Lists (I could be wrong). It definitely has a trigger for finding duplicate books and gives an option to replace in the DB, but I don't think it replaces all references to the Object across everything? Maybe I'm wrong, still testing that out.

Still, this has given me a starting point to work on digging myself out of the hole, and hopefully implementing a way to avoid it in the future? I'm still learning Python and programming in general, but this could be a project to learn with.

maforget · 2025-10-20T22:23:32Z

Maintainer

> I'm taking a closer look at Library Organizer as well, but based on the logic I see, it also wouldn't do anything with Lists (I could be wrong).

I am not sure either, but based on what I see it seems so. It removes the old file before moving the new one and removes that old file from the db. But that is probably only when the file already exists; in your case, using it only to move files to a folder without renaming would probably not work.

https://github.com/Stonepaw/comicrack-library-organizer/blob/54d3882e2bd94b7ff3f26637cd1df3dea1c4b96b/lobookmover.py#L1024-L1036

elif self.duplicate_action is DuplicateAction.Overwrite:
    try:
        if self.profile.CopyReadPercentage and type(oldbook) is not FileInfo:
            book.LastPageRead = oldbook.LastPageRead
        FileIO.FileSystem.DeleteFile(undo_path, FileIO.UIOption.OnlyErrorDialogs, FileIO.RecycleOption.SendToRecycleBin)

    except Exception, ex:
        self.logger.Add("Failed", self.report_book_name, "Failed to overwrite " + undo_path + ". The error was: " + str(ex))
        return MoveResult.Failed

    #Since we are only working with images there is no need to remove a book from the library
    if type(oldbook) is not FileInfo:
            ComicRack.App.RemoveBook(oldbook)

Also, a tip if you are new to Python: this isn't Python as such, it is IronPython, a mix of .NET and Python, which is why you see things like FileIO.FileSystem.DeleteFile. You can usually use either Python functions or .NET functions, but .NET is very specific about types and Python isn't.
