Skip to content

fix(metadata): fix metadata fetching relying for filename "first" instead of better fields like ISBN in Goodreads/Bookdrop#85

Open
balazs-szucs wants to merge 8 commits intogrimmory-tools:developfrom
balazs-szucs:bookdrop-metadata-fix
Open

fix(metadata): fix metadata fetching relying for filename "first" instead of better fields like ISBN in Goodreads/Bookdrop#85
balazs-szucs wants to merge 8 commits intogrimmory-tools:developfrom
balazs-szucs:bookdrop-metadata-fix

Conversation

@balazs-szucs
Copy link
Member

@balazs-szucs balazs-szucs commented Mar 20, 2026

📝 Description

This pull request introduces improvements to metadata fetching and lookup logic for book files, especially around the handling of insufficient metadata and ISBN-based lookups for Goodreads and bookdrop. Normally, you would wanna query via the "most specific" information possible. This, however, with Goodreads parser wasn't the case. It'd premateraly fallback to filenames which were in no-way accurate way to find bookmetata

Required for develop and main. Your PR title must use Conventional Commit format because maintainers squash-merge with the PR title and stable releases are computed from commit history. Example: fix(reader): prevent blank pages on chapter jump

Linked Issue: Fixes #

Required. Every PR must reference an approved issue. If no issue exists, open one and wait for maintainer approval before submitting a PR. Unsolicited PRs without a linked issue will be closed.

🏷️ Type of Change

  • Bug fix
  • New feature
  • Enhancement to existing feature
  • Refactor (no behavior change)
  • Breaking change (existing functionality affected)
  • Documentation update

🔧 Changes

🧪 Testing (MANDATORY)

PRs without this section filled out will be closed. "Tests pass" or "Tested locally" is not sufficient. You must provide specifics.

Manual testing steps you performed:

Regression testing:

Edge cases covered:

Test output:

Backend test output (./gradlew test)
PASTE OUTPUT HERE
Frontend test output (ng test)
PASTE OUTPUT HERE

📸 Screen Recording / Screenshots (MANDATORY)

Every PR must include a screen recording or screenshots showing the change working end-to-end in a running local instance (both backend and frontend). This means you must have actually built, run, and tested the code yourself. PRs without visual proof will be closed without review.


✅ Pre-Submission Checklist

All boxes must be checked before requesting review. Incomplete PRs will be closed without review. No exceptions.

  • This PR is linked to an approved issue
  • Code follows project backend and frontend conventions
  • Branch is up to date with develop (merge conflicts resolved)
  • I ran the full stack locally (backend + frontend + database) and verified the change works
  • Automated tests added or updated to cover changes (backend and frontend)
  • All tests pass locally and output is pasted above
  • Screen recording or screenshots are attached above proving the change works
  • PR is a single focused change (one bug fix OR one feature, not multiple unrelated changes)
  • PR is reasonably scoped (PRs over 1000+ changed lines will be closed, split into smaller PRs)
  • No unsolicited refactors, cleanups, or "improvements" are bundled in
  • Flyway migration versioning is correct (if schema was modified)
  • Required documentation updates are included in this repo or the current Grimmory docs surface (if user-facing changes)

🤖 AI-Assisted Contributions

If any part of this PR was generated or assisted by AI tools (Copilot, Claude, ChatGPT, etc.), all items below are mandatory. You are fully responsible for every line you submit. "The AI wrote it" is not an excuse, and AI-generated PRs that clearly haven't been reviewed are the #1 reason PRs get closed.

  • I have read and understand every line of this PR and can explain any part of it during review
  • I personally ran the code and verified it works (not just trusted the AI's output)
  • PR is scoped to a single logical change, not a dump of everything the AI suggested
  • Tests validate actual behavior, not just coverage (AI-generated tests often assert nothing meaningful)
  • No dead code, placeholder comments, TODOs, or unused scaffolding left behind by AI
  • I did not submit refactors, style changes, or "improvements" the AI suggested beyond the scope of the issue

💬 Additional Context (optional)

Summary by CodeRabbit

  • New Features

    • Added ISBN-based metadata lookup for enhanced book identification.
  • Improvements

    • Enhanced metadata handling to flag items with insufficient identifiers as pending review, reducing unnecessary online lookups.

dependabot bot and others added 8 commits March 19, 2026 19:46
Dependabot couldn't find the original pull request head commit, ea510f4.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…immory-tools#2)

Dependabot couldn't find the original pull request head commit, faed6bf.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…mory-tools#3)

Dependabot couldn't find the original pull request head commit, f110823.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ools#6)

Dependabot couldn't find the original pull request head commit, 9a8d7a1.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…tead of better fields like ISBN in Goodreads/Bookdrop
- Image builds:
  - pin frontend and backend Docker build stages to the build platform so multi-arch packaging can reuse architecture-independent work
  - add an Angular build cache mount and move dynamic version metadata to the end of the runtime stage so static layers stay reusable across tags
  - reduce image workflow checkout depth and keep preview builds on `linux/amd64` only to avoid unnecessary QEMU and history overhead

- Cache policy:
  - make CI packaging smoke builds consume the shared image cache without writing new BuildKit cache state
  - make normal preview builds consume shared GHA and registry caches without mutating the canonical cache
  - keep nightly and stable release builds as the workflows that refresh the shared image cache in both GHA and registry backends

- Preview override:
  - add a `refresh_shared_cache` input to the preview workflow for an explicit no-cache rebuild that repopulates the shared cache when maintainers need to bust and refresh it
  - keep the default preview behavior optimized for fast disposable builds rather than cache churn

- Validation:
  - keep workflow YAML parsing clean after the cache-policy changes
  - keep `git diff --check` clean for the touched Docker and workflow files
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

Version 2.2.2 release documentation added to CHANGELOG. BookdropMetadataService refactored to conditionally skip online metadata refresh for non-searchable entries. GoodReadsParser enhanced with ISBN-based lookup path and centralized ID extraction helper.

Changes

Cohort / File(s) Summary
Release Documentation
CHANGELOG.md
Version 2.2.2 entry added documenting CI permission forwarding fix, release pipeline rebranding, and README link update.
Metadata Service Logic
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
Added conditional metadata refresh skip logic via hasSearchableMetadata() and hasAnyKnownIdentifier() helpers. Non-searchable metadata now marked PENDING_REVIEW without fetching remote data. Removed primaryFile field from Book construction.
Parser Enhancement
booklore-api/src/main/java/org/booklore/service/metadata/parser/GoodReadsParser.java
Added ISBN-based lookup path in fetchTopMetadata() with new extractGoodreadsIdFromOgUrl() helper. Refactored existing ISBN handling to use centralized ID extraction logic.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop, hop, through metadata we go,
ISBN paths, now shine and glow!
Searchable treasures, pending review,
Code hops forward with logic anew! ✨📚

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description provides a brief explanation of the change but is severely incomplete: no linked issue, type of change unchecked, changes/testing sections unfilled, no test output, and no required screen recording/screenshots. Complete required sections: link an approved issue, check the type of change, fill testing steps with specifics, provide actual test output, and attach screen recording or screenshots proving the change works end-to-end.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly describes the main fix: improving metadata fetching to prioritize ISBN over filenames in Goodreads/Bookdrop lookups, and follows Conventional Commits format.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
CHANGELOG.md (1)

1-12: Move # Changelog back to the top of the file.

The top-level heading should be the first line for consistent changelog structure/readability (and better compatibility with tooling that expects it there).

Proposed reorder
-## [2.2.2](https://github.com/grimmory-tools/grimmory/compare/v2.2.1...v2.2.2) (2026-03-19)
-
-### Bug Fixes
-
-* **ci:** forward checks permission to reusable test jobs ([e3b3e7c](https://github.com/grimmory-tools/grimmory/commit/e3b3e7c4d7aae23f517b6029af56df360d40edf1))
-
-### Chores
-
-* **ci:** rebrand and streamline the release pipeline ([`#66`](https://github.com/grimmory-tools/grimmory/issues/66)) ([1bc45da](https://github.com/grimmory-tools/grimmory/commit/1bc45da013415dc66621dc72eb21be9640028f4d))
-* Update README to use new links, and clean up formatting ([6b0c4aa](https://github.com/grimmory-tools/grimmory/commit/6b0c4aa66c2247de0335340dbae8ec0afc1fbd59))
-
 # Changelog
 
 All notable Grimmory releases are tracked here.
 
 This file is maintained by semantic-release.
+
+## [2.2.2](https://github.com/grimmory-tools/grimmory/compare/v2.2.1...v2.2.2) (2026-03-19)
+
+### Bug Fixes
+
+* **ci:** forward checks permission to reusable test jobs ([e3b3e7c](https://github.com/grimmory-tools/grimmory/commit/e3b3e7c4d7aae23f517b6029af56df360d40edf1))
+
+### Chores
+
+* **ci:** rebrand and streamline the release pipeline ([`#66`](https://github.com/grimmory-tools/grimmory/issues/66)) ([1bc45da](https://github.com/grimmory-tools/grimmory/commit/1bc45da013415dc66621dc72eb21be9640028f4d))
+* Update README to use new links, and clean up formatting ([6b0c4aa](https://github.com/grimmory-tools/grimmory/commit/6b0c4aa66c2247de0335340dbae8ec0afc1fbd59))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@CHANGELOG.md` around lines 1 - 12, Move the top-level heading "# Changelog"
to the very first line of the file so the file begins with that heading,
followed by a single blank line and then the existing release entries (starting
with "## [2.2.2]..."). Ensure the "# Changelog" line is removed from its current
location and placed above the "## [2.2.2]" section so tooling and readability
are consistent.
booklore-api/src/main/java/org/booklore/service/metadata/parser/GoodReadsParser.java (1)

137-148: Good refactor to use the centralized helper.

The refactoring to use extractGoodreadsIdFromOgUrl is consistent with the new approach in fetchTopMetadata.

However, note that unlike fetchTopMetadata (lines 74-86), there's no try-catch around the fetchDoc() call on line 140. If fetchDoc() throws a RuntimeException (which it does on IOException), the entire method fails instead of falling back to the preview search. Consider wrapping this block in a try-catch for consistency.

♻️ Optional: Add error handling for consistency with fetchTopMetadata
         String isbn = ParserUtils.cleanIsbn(fetchMetadataRequest.getIsbn());
         if (isbn != null && !isbn.isBlank()) {
             log.info("Goodreads Query URL (ISBN): {}{}", BASE_ISBN_URL, isbn);
-            Document doc = fetchDoc(BASE_ISBN_URL + isbn);
-            String goodreadsId = extractGoodreadsIdFromOgUrl(doc);
-            if (goodreadsId != null) {
-                BookMetadata metadata = parseBookDetails(doc, goodreadsId);
-                if (metadata != null) {
-                    return List.of(metadata);
+            try {
+                Document doc = fetchDoc(BASE_ISBN_URL + isbn);
+                String goodreadsId = extractGoodreadsIdFromOgUrl(doc);
+                if (goodreadsId != null) {
+                    BookMetadata metadata = parseBookDetails(doc, goodreadsId);
+                    if (metadata != null) {
+                        return List.of(metadata);
+                    }
                 }
+            } catch (Exception e) {
+                log.warn("GoodReads: ISBN lookup failed: {}, falling back to preview search", e.getMessage());
             }
         }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/metadata/parser/GoodReadsParser.java`
around lines 137 - 148, Wrap the ISBN-path in GoodReadsParser so the call to
fetchDoc(BASE_ISBN_URL + isbn) and the subsequent
extractGoodreadsIdFromOgUrl/parseBookDetails logic are enclosed in a try-catch
that catches RuntimeException (the same way fetchTopMetadata does); on exception
log the error (including the ISBN and exception) and allow the method to
continue so it falls back to the preview/search flow instead of propagating the
RuntimeException. Ensure you reference the existing helper methods
extractGoodreadsIdFromOgUrl and parseBookDetails when adding the try-catch so
behavior remains identical on success.
booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java (1)

118-127: Maintenance note: Keep synchronized with BookMetadata.

The comment on line 119 is helpful. The explicit list is clear and readable, though it does require manual updates when new provider IDs are added to BookMetadata.

For future consideration, an annotation-based approach or a dedicated method on BookMetadata (e.g., hasAnyIdentifier()) could reduce the maintenance burden, but this is fine for now.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`
around lines 118 - 127, Replace the manual list check in hasAnyKnownIdentifier
with a delegated method on the BookMetadata model: implement a public boolean
hasAnyIdentifier() in BookMetadata that checks all provider ID fields (same list
currently used: getIsbn13, getIsbn10, getAsin, getGoodreadsId, getGoogleId,
getHardcoverId, getHardcoverBookId, getComicvineId, getDoubanId,
getLubimyczytacId, getRanobedbId, getAudibleId) and then update
BookdropMetadataService.hasAnyKnownIdentifier(BookMetadata m) to simply return
m.hasAnyIdentifier(); this centralizes the identifier-maintenance surface and
keeps the comment in sync.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In
`@booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java`:
- Around line 118-127: Replace the manual list check in hasAnyKnownIdentifier
with a delegated method on the BookMetadata model: implement a public boolean
hasAnyIdentifier() in BookMetadata that checks all provider ID fields (same list
currently used: getIsbn13, getIsbn10, getAsin, getGoodreadsId, getGoogleId,
getHardcoverId, getHardcoverBookId, getComicvineId, getDoubanId,
getLubimyczytacId, getRanobedbId, getAudibleId) and then update
BookdropMetadataService.hasAnyKnownIdentifier(BookMetadata m) to simply return
m.hasAnyIdentifier(); this centralizes the identifier-maintenance surface and
keeps the comment in sync.

In
`@booklore-api/src/main/java/org/booklore/service/metadata/parser/GoodReadsParser.java`:
- Around line 137-148: Wrap the ISBN-path in GoodReadsParser so the call to
fetchDoc(BASE_ISBN_URL + isbn) and the subsequent
extractGoodreadsIdFromOgUrl/parseBookDetails logic are enclosed in a try-catch
that catches RuntimeException (the same way fetchTopMetadata does); on exception
log the error (including the ISBN and exception) and allow the method to
continue so it falls back to the preview/search flow instead of propagating the
RuntimeException. Ensure you reference the existing helper methods
extractGoodreadsIdFromOgUrl and parseBookDetails when adding the try-catch so
behavior remains identical on success.

In `@CHANGELOG.md`:
- Around line 1-12: Move the top-level heading "# Changelog" to the very first
line of the file so the file begins with that heading, followed by a single
blank line and then the existing release entries (starting with "##
[2.2.2]..."). Ensure the "# Changelog" line is removed from its current location
and placed above the "## [2.2.2]" section so tooling and readability are
consistent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b3956752-f3e5-4e24-ad0c-3ef34a067cb8

📥 Commits

Reviewing files that changed from the base of the PR and between 926088c and d0150e1.

📒 Files selected for processing (3)
  • CHANGELOG.md
  • booklore-api/src/main/java/org/booklore/service/bookdrop/BookdropMetadataService.java
  • booklore-api/src/main/java/org/booklore/service/metadata/parser/GoodReadsParser.java

@imajes imajes force-pushed the develop branch 2 times, most recently from 89113d4 to 37ca101 Compare March 20, 2026 22:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants