fix(archive): infer archive type via Magic Numbers instead of filename#79
fix(archive): infer archive type via Magic Numbers instead of filename#79balazs-szucs wants to merge 8 commits intogrimmory-tools:developfrom
Conversation
Dependabot couldn't find the original pull request head commit, ea510f4. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…immory-tools#2) Dependabot couldn't find the original pull request head commit, faed6bf. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…mory-tools#3) Dependabot couldn't find the original pull request head commit, f110823. Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…cover image retrieval
📝 WalkthroughWalkthroughThis pull request refactors archive-type detection across comic book services from filename-extension-based checks to using Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java (1)
318-324: Consider caching the detected archive type to avoid redundant detection.The archive type is detected again here even though it was already determined during
scanArchiveMetadata(). SinceCachedArchiveMetadatais already passed to this method, consider storing theArchiveTypein the cache to avoid re-reading the file's magic bytes on every page stream.♻️ Optional: Cache archive type in metadata
private static class CachedArchiveMetadata { final List<String> imageEntries; final long lastModified; final Charset successfulEncoding; final boolean useUnicodeExtraFields; + final ArchiveUtils.ArchiveType archiveType; volatile long lastAccessed; - CachedArchiveMetadata(List<String> imageEntries, long lastModified, Charset successfulEncoding, boolean useUnicodeExtraFields) { + CachedArchiveMetadata(List<String> imageEntries, long lastModified, Charset successfulEncoding, boolean useUnicodeExtraFields, ArchiveUtils.ArchiveType archiveType) { this.imageEntries = List.copyOf(imageEntries); this.lastModified = lastModified; this.successfulEncoding = successfulEncoding; this.useUnicodeExtraFields = useUnicodeExtraFields; + this.archiveType = archiveType; this.lastAccessed = System.currentTimeMillis(); } }Then use
metadata.archiveTypeinstreamEntryFromArchive()instead of re-detecting.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java` around lines 318 - 324, The code redundantly re-detects the archive type using ArchiveUtils.detectArchiveType(cbxPath.toFile()) in CbxReaderService (inside the streamEntryFromArchive / switch block); update the caching logic so scanArchiveMetadata() stores the detected ArchiveUtils.ArchiveType in CachedArchiveMetadata (e.g., metadata.archiveType) and change the switch to use metadata.archiveType instead of calling detectArchiveType again; ensure CachedArchiveMetadata is populated when initially scanning and add null/unknown handling in streamEntryFromArchive to fall back to detection only if metadata.archiveType is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 1-11: The changelog entry for 2.2.2 is incorrect/missing the
archive-detection changes introduced in this PR; update CHANGELOG.md to either
(A) add a new release section (e.g., 2.2.3 with date) describing the bugfixes
and improvements for ArchiveUtils.detectArchiveType(), comic book archive
handling (CBZ, CBR, CB7) and MIME/magic-number based detection, or (B) replace
the 2.2.2 entry content with those detailed changes if this PR is intended to be
part of 2.2.2; mention the specific symbols and features changed
(ArchiveUtils.detectArchiveType, CBZ/CBR/CB7 handling, MIME type
detection/magic-number logic) and include concise bullet points summarizing the
fixes and any user-facing behavior changes.
---
Nitpick comments:
In
`@booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java`:
- Around line 318-324: The code redundantly re-detects the archive type using
ArchiveUtils.detectArchiveType(cbxPath.toFile()) in CbxReaderService (inside the
streamEntryFromArchive / switch block); update the caching logic so
scanArchiveMetadata() stores the detected ArchiveUtils.ArchiveType in
CachedArchiveMetadata (e.g., metadata.archiveType) and change the switch to use
metadata.archiveType instead of calling detectArchiveType again; ensure
CachedArchiveMetadata is populated when initially scanning and add null/unknown
handling in streamEntryFromArchive to fall back to detection only if
metadata.archiveType is missing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c14b689f-cc47-4d9c-a755-241245805ab4
📒 Files selected for processing (9)
CHANGELOG.mdbooklore-api/src/main/java/org/booklore/service/fileprocessor/CbxProcessor.javabooklore-api/src/main/java/org/booklore/service/kobo/CbxConversionService.javabooklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.javabooklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.javabooklore-api/src/test/java/org/booklore/service/kobo/CbxConversionServiceTest.javabooklore-api/src/test/java/org/booklore/service/opds/OpdsFeedServiceMimeTypeTest.javabooklore-api/src/test/java/org/booklore/service/opds/OpdsFeedServiceTest.javadev.docker-compose.yml
💤 Files with no reviewable changes (1)
- booklore-api/src/test/java/org/booklore/service/kobo/CbxConversionServiceTest.java
89113d4 to
37ca101
Compare
# Conflicts: # .github/workflows/preview-image.yml
📝 Description
This pull request refactors archive type detection throughout the codebase to consistently use content-based detection (via
ArchiveUtils.detectArchiveType) instead of relying on file extensions. This improves robustness when handling comic book archives (CBZ, CBR, CB7), ensures correct MIME type assignment, and simplifies related code. The PR also updates tests and the development Docker setup for improved reliability and maintainability.Simply put, fixes bugs where Reader or parts of the codebase would fail on Archieve where the underlying archive type vs filename were inconsistent
Linked Issue: Fixes #
🏷️ Type of Change
🔧 Changes
🧪 Testing (MANDATORY)
Manual testing steps you performed:
Regression testing:
Edge cases covered:
Test output:
Backend test output (
./gradlew test)Frontend test output (
ng test)📸 Screen Recording / Screenshots (MANDATORY)
✅ Pre-Submission Checklist
develop(merge conflicts resolved)🤖 AI-Assisted Contributions
TODOs, or unused scaffolding left behind by AI💬 Additional Context (optional)
Summary by CodeRabbit
Bug Fixes
Chores