Replies: 2 comments
Someone has previously attempted a similar project using the Lidarr API with what appears to have been a Meilisearch-based solution. You should try searching through the Discord Servarr channel for that old discussion, which happened roughly six months ago.

The fundamental challenge you'll face is deduplication. When merging multiple data sources, deduplication becomes absolutely critical and will consume most of your effort. If you need guidance on this aspect, consider reaching out to the Chaptarr Discord community. While their codebase is entirely Claude-generated and I'm not familiar with their implementation details or part of their community, they continue active development on similar deduplication problems for book metadata. For database systems that excel at deduplication, the Mangabaka Discord community would be worth contacting since they handle this for comic book data.

Before diving deeper, clarify your objectives. What specific goal are you trying to accomplish? How many users would potentially use this system? What specific frustrations with Lidarr are driving you toward this solution? The Lidarr metadata issue exists primarily because it lacks sufficient track-level data, forcing it to operate mainly at the album and artist level rather than incorporating track-based functionality. What specific limitations are you experiencing with a standard MusicBrainz mirror?

Maintaining this system solo will be extremely challenging. Music industry datasets are massive and lack standardized patterns, making them inherently messy. While creating a unified relational database sounds ideal in theory, MusicBrainz already provides extensive cross-referencing between services. However, countless releases remain unlinked across platforms, which circles back to the matching and deduplication problem of identifying identical releases across different services.

The real issue isn't MusicBrainz lacking data. As free databases go, MusicBrainz is well-established, open, and thoroughly populated. The actual problem is that Lidarr lacks active development and desperately needs modernization including new features, architectural improvements, faster file processing, and enhanced matching algorithms. When Picard successfully maps all tracks but Lidarr fails to organize them properly, the problem lies with Lidarr's implementation. While an improved metadata server could help, building one from a single source remains a massive undertaking. The more practical approach would be enhancing how Lidarr processes the metadata it already receives. If both Beets and Picard achieve good results while Lidarr struggles, the metadata source is in my opinion not the problem.
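To make the matching and deduplication problem a bit more concrete, here is a minimal sketch of one common first pass: build a normalized key from artist, title, and year, then group records from different sources that share a key. The record layout (`source`, `artist`, `title`, `year`) and the function names are assumptions made for illustration, not anything Lidarr, MusicBrainz, or Discogs defines, and real-world matching would need far more than this (reissues, editions, differing release dates, fuzzy titles).

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text):
    """Fold case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.casefold()
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def match_key(artist, title, year):
    """Hypothetical blocking key; deliberately crude."""
    return (normalize(artist), normalize(title), year)


def group_candidates(records):
    """Return groups of records that share a key (possible duplicates)."""
    groups = defaultdict(list)
    for rec in records:
        key = match_key(rec["artist"], rec["title"], rec.get("year"))
        groups[key].append(rec)
    return [group for group in groups.values() if len(group) > 1]


# Illustrative records only; the field names are assumed.
records = [
    {"source": "discogs", "artist": "Björk", "title": "Homogenic", "year": 1997},
    {"source": "musicbrainz", "artist": "Bjork", "title": "Homogenic.", "year": 1997},
    {"source": "discogs", "artist": "Björk", "title": "Post", "year": 1995},
]
candidates = group_candidates(records)
```

Groups found this way are only candidates. Anything that shares a key still needs a second check (track count, durations, barcode, label) before being treated as the same release.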
Apologies in advance for any weird word choices - auto-correct seems to be operating at drunk-ninja level lately and I've been really bad (lazy) at proofreading. :)

One goal is to obviate the need for Lidarr to maintain my (own personal) "completeness" metric for artists' releases. I follow between 1300 and 1400 right now.

IMO, without diving at all into Lidarr's code base, but just looking at the way it works and the people I see managing their GitHub, fixing it would need all those people gone from the project and the existing project completely buried, never to be seen again. "It sucks" is about my summation. But it is what it is, and there's nothing else out there right now that does what it does with a web UI.

The solution only needs to be "good enough", as I'm not trying to compete for public use with anyone else. I'm not too worried about deduplication, as I'm not going to try to tackle this from all sides at once, nor am I trying to integrate every source completely.

Musicbrainz, for all it has, still misses a lot. I've added plenty to it, and frankly that's not something I enjoy doing. Their site is a grind, and everything they've done creates only more friction to populate, not less. Even little things like not clicking an 8x8 pixel square next to a field can cause it to stop ingesting your data, and then when they tell you which tab to go to in order to address the issue, they do nothing to highlight what's missing or un-clicked. Fek me. I've stopped adding things there, so this DB will, if nothing else, help me alone.

Discogs contains a wealth of releases and artists not included in Musicbrainz at all. Between the two, I think they still miss some physical media I have, but Discogs covers most of it alone, while Musicbrainz doesn't. So in this respect, this again doesn't relate to Lidarr, as the same issue with missing artists and releases can be seen using only Picard.

My impetus was to get fast queries on Discogs data. But why also have to query an MB mirror? Might as well pull the data I want from there into the same DB. And then there's that Apple Music thing I mentioned to you before... Might as well renew my developer account and pull from them too, especially since I won't have to release an app with a supported and authenticated API consumer for anyone else.

So we'll see what I can cobble together. I don't have that much time to dedicate to this, as I really have a lot of other things pressing for my attention. If I can get something together with about 1-2 weeks of work spread over a couple of months (if need be), that'll make me content.
I'm really curious if anyone else has bothered to do something that goes this wide/deep...
In a recent discussion about a Discogs-related issue causing releases to appear within the wrong category (Album, EP, Single), I started playing with the API, and while doing that had the thought of populating my own streamlined database.
There were two main contributing factors for this.

1. The Discogs API requires a lot of record fetching to get the various bits of information needed to categorize a release along with its basic/core properties.
2. The Discogs data has a lot (a LOT) of different ways to tag certain properties, many of which I simply don't care about.

And 3... the BIG one: RATE LIMITS. With the large number of requests, the rate limits were killing me, even during simple testing with a single artist.
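For anyone hitting the same wall while experimenting, a rough sketch of the usual stopgap is to space requests out client-side and cache every response so that no record is fetched twice. Everything here is an assumption for illustration: `Throttle`, `CachedFetcher`, and the `fetch` callable are hypothetical names, and the 60-per-minute figure is only a placeholder value, so check the service's current documentation for the real limit.

```python
import time
from typing import Callable, Dict


class Throttle:
    """Keeps consecutive calls at least 60 / per_minute seconds apart."""

    def __init__(self, per_minute: int,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 60.0 / per_minute
        self._clock = clock
        self._sleep = sleep
        self._next_ok = 0.0  # earliest time the next call may proceed

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay used."""
        now = self._clock()
        delay = max(0.0, self._next_ok - now)
        if delay:
            self._sleep(delay)
        self._next_ok = max(now, self._next_ok) + self.interval
        return delay


class CachedFetcher:
    """Wraps any fetch(key) callable with throttling and an in-memory cache."""

    def __init__(self, fetch: Callable[[str], object], throttle: Throttle) -> None:
        self._fetch = fetch
        self._throttle = throttle
        self._cache: Dict[str, object] = {}

    def get(self, key: str) -> object:
        # Only uncached keys cost a request (and a throttle slot).
        if key not in self._cache:
            self._throttle.wait()
            self._cache[key] = self._fetch(key)
        return self._cache[key]
```

In use, `fetch` would be whatever function performs the actual HTTP request, and `CachedFetcher(fetch, Throttle(60)).get(url)` replaces direct calls to it. This only softens the problem during testing. It does not remove the need for a local copy of the data.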
Long story longer, I've seen some scripts to populate various DBs with Discogs XML, but I won't be using any of those. They're all missing something, do something I don't need, are out of date, or have some combination of factors that prevents me from just using them outright and calling it a day.
I also want more. Much more. Such as integrating all of Musicbrainz and all the back-links and references that exist between their IDs and Discogs. And the more I thought about it, it also makes sense to gather any other data that might be useful to cross-link, such as linking the IDs for some streaming services. Some of this data is available from the two sources mentioned, some of it not.
So my thought at the moment is to create a relational database that brings together the following sources of release info:
Discogs
Musicbrainz
Tidal
Deezer
Apple Music
Spotify
The aim is to maintain the end result as a source of truth for any (or all) of those sources. In other words, the DB can be queried in place of an API call to any of those services to get the desired information.
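As a very rough illustration of the cross-linking idea (not a proposed final design), the core could be one table of releases plus one table mapping each service's ID onto a release, so that a lookup by any service's ID replaces the API call. The table names, column names, and the IDs in the usage example are all made up for this sketch, and SQLite is used only because it needs no setup.

```python
import sqlite3

# Hypothetical two-table layout: one row per release, plus one row per
# (service, service-specific ID) pointing back at that release.
SCHEMA = """
CREATE TABLE release (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    artist  TEXT NOT NULL,
    kind    TEXT CHECK (kind IN ('Album', 'EP', 'Single'))
);
CREATE TABLE external_id (
    release_id  INTEGER NOT NULL REFERENCES release(id),
    source      TEXT NOT NULL,
    source_id   TEXT NOT NULL,
    PRIMARY KEY (source, source_id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_release(conn, title, artist, kind, ids):
    """Insert a release and its per-service IDs; returns the local id."""
    cur = conn.execute(
        "INSERT INTO release (title, artist, kind) VALUES (?, ?, ?)",
        (title, artist, kind),
    )
    rid = cur.lastrowid
    conn.executemany(
        "INSERT INTO external_id (release_id, source, source_id) VALUES (?, ?, ?)",
        [(rid, source, source_id) for source, source_id in ids.items()],
    )
    return rid


def lookup(conn, source, source_id):
    """Find a release by any service's ID (None if unknown)."""
    return conn.execute(
        "SELECT r.id, r.title, r.artist, r.kind FROM release r "
        "JOIN external_id e ON e.release_id = r.id "
        "WHERE e.source = ? AND e.source_id = ?",
        (source, source_id),
    ).fetchone()


def linked_ids(conn, source, source_id):
    """All known (source, source_id) pairs for the same release."""
    return conn.execute(
        "SELECT o.source, o.source_id FROM external_id e "
        "JOIN external_id o ON o.release_id = e.release_id "
        "WHERE e.source = ? AND e.source_id = ? "
        "ORDER BY o.source, o.source_id",
        (source, source_id),
    ).fetchall()
```

For example, after `add_release(conn, "Example Album", "Example Artist", "Album", {"discogs": "d-1", "musicbrainz": "mb-1"})` with placeholder IDs, `linked_ids(conn, "musicbrainz", "mb-1")` returns both the Discogs and Musicbrainz entries. The primary key on `(source, source_id)` lets several IDs from one service point at the same release, which matters when one side splits editions that the other merges.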
I'm still considering this from a high level at this point, and have only played with the Discogs and Musicbrainz (local mirror) APIs so far. If I go ahead with this, and manage to get something decent working, I'll release everything openly, the db mirror and the tools to create, populate and maintain it.