fix(archive): infer archive type via Magic Numbers instead of filename#79

Open
balazs-szucs wants to merge 8 commits into grimmory-tools:develop from balazs-szucs:archive-type-detection
Conversation

@balazs-szucs
Member

@balazs-szucs balazs-szucs commented Mar 20, 2026

📝 Description

This pull request refactors archive type detection throughout the codebase to consistently use content-based detection (via ArchiveUtils.detectArchiveType) instead of relying on file extensions. This improves robustness when handling comic book archives (CBZ, CBR, CB7), ensures correct MIME type assignment, and simplifies related code. The PR also updates tests and the development Docker setup for improved reliability and maintainability.

Simply put, this fixes bugs where the Reader or other parts of the codebase would fail on archives whose actual archive type did not match the filename extension.
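
For readers unfamiliar with the approach, here is a minimal sketch of content-based detection. It is not the project's `ArchiveUtils.detectArchiveType` implementation; `MagicSniffer` and its `Kind` enum are hypothetical names used only for illustration. The byte signatures are the published leading bytes of ZIP, RAR and 7z files. Empty or spanned ZIP variants, which start with different signatures, are deliberately left out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not the project's ArchiveUtils.
final class MagicSniffer {
    enum Kind { ZIP, RAR, SEVEN_ZIP, UNKNOWN }

    // Leading bytes ("magic numbers") of each container format.
    private static final byte[] ZIP = {0x50, 0x4B, 0x03, 0x04};
    // Shared prefix of the RAR 4.x and RAR 5 signatures.
    private static final byte[] RAR = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07};
    private static final byte[] SEVEN_ZIP =
            {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    // Reads only the first few bytes; the filename is never consulted.
    static Kind sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    static Kind sniff(byte[] head) {
        if (startsWith(head, ZIP)) return Kind.ZIP;
        if (startsWith(head, RAR)) return Kind.RAR;
        if (startsWith(head, SEVEN_ZIP)) return Kind.SEVEN_ZIP;
        return Kind.UNKNOWN;
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

With this kind of check, a RAR archive that was renamed to `.cbz` is still classified as RAR, which is the mismatch case described above.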

Required for develop and main. Your PR title must use Conventional Commit format because maintainers squash-merge with the PR title and stable releases are computed from commit history. Example: fix(reader): prevent blank pages on chapter jump

Linked Issue: Fixes #

Required. Every PR must reference an approved issue. If no issue exists, open one and wait for maintainer approval before submitting a PR. Unsolicited PRs without a linked issue will be closed.

🏷️ Type of Change

  • Bug fix
  • New feature
  • Enhancement to existing feature
  • Refactor (no behavior change)
  • Breaking change (existing functionality affected)
  • Documentation update

🔧 Changes

🧪 Testing (MANDATORY)

PRs without this section filled out will be closed. "Tests pass" or "Tested locally" is not sufficient. You must provide specifics.

Manual testing steps you performed:

Regression testing:

Edge cases covered:

Test output:

Backend test output (./gradlew test)
PASTE OUTPUT HERE
Frontend test output (ng test)
PASTE OUTPUT HERE

📸 Screen Recording / Screenshots (MANDATORY)

Every PR must include a screen recording or screenshots showing the change working end-to-end in a running local instance (both backend and frontend). This means you must have actually built, run, and tested the code yourself. PRs without visual proof will be closed without review.


✅ Pre-Submission Checklist

All boxes must be checked before requesting review. Incomplete PRs will be closed without review. No exceptions.

  • This PR is linked to an approved issue
  • Code follows project backend and frontend conventions
  • Branch is up to date with develop (merge conflicts resolved)
  • I ran the full stack locally (backend + frontend + database) and verified the change works
  • Automated tests added or updated to cover changes (backend and frontend)
  • All tests pass locally and output is pasted above
  • Screen recording or screenshots are attached above proving the change works
  • PR is a single focused change (one bug fix OR one feature, not multiple unrelated changes)
  • PR is reasonably scoped (PRs over 1000+ changed lines will be closed, split into smaller PRs)
  • No unsolicited refactors, cleanups, or "improvements" are bundled in
  • Flyway migration versioning is correct (if schema was modified)
  • Required documentation updates are included in this repo or the current Grimmory docs surface (if user-facing changes)

🤖 AI-Assisted Contributions

If any part of this PR was generated or assisted by AI tools (Copilot, Claude, ChatGPT, etc.), all items below are mandatory. You are fully responsible for every line you submit. "The AI wrote it" is not an excuse, and AI-generated PRs that clearly haven't been reviewed are the #1 reason PRs get closed.

  • I have read and understand every line of this PR and can explain any part of it during review
  • I personally ran the code and verified it works (not just trusted the AI's output)
  • PR is scoped to a single logical change, not a dump of everything the AI suggested
  • Tests validate actual behavior, not just coverage (AI-generated tests often assert nothing meaningful)
  • No dead code, placeholder comments, TODOs, or unused scaffolding left behind by AI
  • I did not submit refactors, style changes, or "improvements" the AI suggested beyond the scope of the issue

💬 Additional Context (optional)

Summary by CodeRabbit

  • Bug Fixes

    • Enhanced comic archive format detection to identify file types based on actual archive contents rather than filename extensions for more reliable support.
  • Chores

    • Updated CI/CD pipeline configurations and documentation.
    • Improved Docker environment with enhanced archive handling support.

dependabot bot and others added 6 commits March 19, 2026 19:46
Dependabot couldn't find the original pull request head commit, ea510f4.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…immory-tools#2)

Dependabot couldn't find the original pull request head commit, faed6bf.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…mory-tools#3)

Dependabot couldn't find the original pull request head commit, f110823.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ools#6)

Dependabot couldn't find the original pull request head commit, 9a8d7a1.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
@coderabbitai
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

This pull request refactors archive-type detection across comic book services from filename-extension-based checks to using ArchiveUtils.detectArchiveType(). Changes include removing the public isSupportedCbxFormat method, updating MIME type detection, enhancing Docker build configuration with unrar support, and aligning tests with the new detection logic.
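
As a rough illustration of switching on the detected type rather than on a filename suffix, the sketch below maps an archive type to a MIME type. The `ArchiveType` enum here is a hypothetical stand-in for the project's `ArchiveUtils.ArchiveType` (only `RAR` and `UNKNOWN` are confirmed by this PR; the other constant names are assumptions), and the 7z and fallback MIME strings are assumptions rather than values taken from `OpdsFeedService`.

```java
// Hypothetical sketch; not the code in OpdsFeedService.
final class ComicMimeTypes {
    // Stand-in for ArchiveUtils.ArchiveType. ZIP and SEVEN_ZIP are assumed names.
    enum ArchiveType { ZIP, RAR, SEVEN_ZIP, UNKNOWN }

    // The MIME type depends only on what the file actually contains.
    static String mimeTypeFor(ArchiveType type) {
        switch (type) {
            case ZIP:
                return "application/vnd.comicbook+zip";
            case RAR:
                return "application/vnd.comicbook-rar";
            case SEVEN_ZIP:
                return "application/x-7z-compressed"; // assumed value
            default:
                return "application/octet-stream";    // assumed fallback
        }
    }
}
```

The point of the design is that a file named `book.cbz` that is really a RAR archive is served with the RAR MIME type, because the extension never enters the decision.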

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Archive Type Detection Refactoring: booklore-api/src/main/java/org/booklore/service/kobo/CbxConversionService.java, booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java, booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java | Replaced filename-extension-based validation with ArchiveUtils.detectArchiveType() for archive detection. Removed public isSupportedCbxFormat(String fileName) method. Updated control flows to switch on detected ArchiveType instead of suffix checks. |
| Test Suite Updates: booklore-api/src/test/java/org/booklore/service/kobo/CbxConversionServiceTest.java, booklore-api/src/test/java/org/booklore/service/opds/OpdsFeedServiceMimeTypeTest.java, booklore-api/src/test/java/org/booklore/service/opds/OpdsFeedServiceTest.java | Removed tests for deleted isSupportedCbxFormat method. Updated MIME type tests to write real temporary files with proper magic bytes (RAR 4.x, 7z). Changed .cbt expected MIME type to application/vnd.comicbook+zip. Updated archive type from UNKNOWN to RAR in test setup. |
| Build & Configuration: dev.docker-compose.yml, CHANGELOG.md | Replaced prebuilt Gradle image with inline multi-stage Docker build that includes unrar support via linuxserver/unrar:7.1.10. Added libstdc++ and libgcc runtime dependencies. Added changelog entry for version 2.2.2 documenting CI permission fixes and release pipeline updates. |
| Log Message Update: booklore-api/src/main/java/org/booklore/service/fileprocessor/CbxProcessor.java | Updated CBX cover-generation warning message to reference archive type generically ('{}' archive) instead of hardcoded type reference ('{}' CBZ file). |
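
The test-suite change (writing real temporary files with proper magic bytes) can be sketched roughly as follows. This is an illustrative fixture helper, not the code in `OpdsFeedServiceMimeTypeTest`; `MagicByteFixtures` and `writeFixture` are hypothetical names. The signatures are the published RAR 4.x and 7z headers, and the zero padding only ensures the file is longer than the signature.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical test helper; names are illustrative only.
final class MagicByteFixtures {
    static final byte[] RAR4_SIGNATURE = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00};
    static final byte[] SEVEN_ZIP_SIGNATURE =
            {0x37, 0x7A, (byte) 0xBC, (byte) 0xAF, 0x27, 0x1C};

    // Writes a file that starts with the given signature, followed by zero padding.
    // The result is not a valid archive; it only satisfies a header check.
    static Path writeFixture(Path dir, String fileName, byte[] signature) throws IOException {
        byte[] content = Arrays.copyOf(signature, signature.length + 16);
        return Files.write(dir.resolve(fileName), content);
    }
}
```

A test could call `writeFixture(tempDir, "book.cbz", MagicByteFixtures.RAR4_SIGNATURE)` to produce a deliberately mislabelled file and then assert that detection reports RAR despite the `.cbz` extension.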

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hopping through archives with types so refined,
No more extension tricks left behind,
From .cbr to .cb7, now magic bytes lead the way,
With unrar's strength in the Docker to play,
Detection springs forth—accurate, clean, and bright! ✨📚

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ⚠️ Warning | The description is substantially incomplete. It lacks a linked issue reference (required), no checkboxes are marked for type of change, the Changes section is empty, and the critical Testing section with manual testing steps, regression testing, edge cases, and test output is entirely unfilled. The pre-submission checklist is not completed. | Complete all required sections: add the linked issue number in 'Fixes #', mark the appropriate change type checkbox, list specific changes made, provide detailed manual testing steps with actual test output, include regression testing results, describe edge cases tested, and check all pre-submission checklist items before requesting review. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically describes the main change: switching from filename-based to magic-number-based archive type detection. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java (1)

318-324: Consider caching the detected archive type to avoid redundant detection.

The archive type is detected again here even though it was already determined during scanArchiveMetadata(). Since CachedArchiveMetadata is already passed to this method, consider storing the ArchiveType in the cache to avoid re-reading the file's magic bytes on every page stream.

♻️ Optional: Cache archive type in metadata
 private static class CachedArchiveMetadata {
     final List<String> imageEntries;
     final long lastModified;
     final Charset successfulEncoding;
     final boolean useUnicodeExtraFields;
+    final ArchiveUtils.ArchiveType archiveType;
     volatile long lastAccessed;

-    CachedArchiveMetadata(List<String> imageEntries, long lastModified, Charset successfulEncoding, boolean useUnicodeExtraFields) {
+    CachedArchiveMetadata(List<String> imageEntries, long lastModified, Charset successfulEncoding, boolean useUnicodeExtraFields, ArchiveUtils.ArchiveType archiveType) {
         this.imageEntries = List.copyOf(imageEntries);
         this.lastModified = lastModified;
         this.successfulEncoding = successfulEncoding;
         this.useUnicodeExtraFields = useUnicodeExtraFields;
+        this.archiveType = archiveType;
         this.lastAccessed = System.currentTimeMillis();
     }
 }

Then use metadata.archiveType in streamEntryFromArchive() instead of re-detecting.
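
A possible shape for the lookup with a fallback is sketched below. It assumes the cached field may be null or unknown for some entries; `ArchiveTypeCacheSketch`, `Metadata`, and `resolve` are hypothetical names, and the enum stands in for `ArchiveUtils.ArchiveType`. The `Supplier` represents whatever call re-reads the file's magic bytes.

```java
import java.util.function.Supplier;

// Hypothetical sketch of "use the cached type, detect only if it is missing".
final class ArchiveTypeCacheSketch {
    // Stand-in for ArchiveUtils.ArchiveType; constant names other than RAR/UNKNOWN are assumed.
    enum ArchiveType { ZIP, RAR, SEVEN_ZIP, UNKNOWN }

    // Reduced stand-in for CachedArchiveMetadata, keeping only the new field.
    static final class Metadata {
        final ArchiveType archiveType; // may be null if it was never recorded

        Metadata(ArchiveType archiveType) {
            this.archiveType = archiveType;
        }
    }

    // Returns the cached type when usable; otherwise invokes the detector once.
    static ArchiveType resolve(Metadata metadata, Supplier<ArchiveType> detector) {
        ArchiveType cached = metadata.archiveType;
        if (cached != null && cached != ArchiveType.UNKNOWN) {
            return cached;
        }
        return detector.get();
    }
}
```

Because the detector is passed lazily, the file is only re-read on the fallback path, which is the saving this nitpick is after.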

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java`
around lines 318 - 324, The code redundantly re-detects the archive type using
ArchiveUtils.detectArchiveType(cbxPath.toFile()) in CbxReaderService (inside the
streamEntryFromArchive / switch block); update the caching logic so
scanArchiveMetadata() stores the detected ArchiveUtils.ArchiveType in
CachedArchiveMetadata (e.g., metadata.archiveType) and change the switch to use
metadata.archiveType instead of calling detectArchiveType again; ensure
CachedArchiveMetadata is populated when initially scanning and add null/unknown
handling in streamEntryFromArchive to fall back to detection only if
metadata.archiveType is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Around line 1-11: The changelog entry for 2.2.2 is incorrect/missing the
archive-detection changes introduced in this PR; update CHANGELOG.md to either
(A) add a new release section (e.g., 2.2.3 with date) describing the bugfixes
and improvements for ArchiveUtils.detectArchiveType(), comic book archive
handling (CBZ, CBR, CB7) and MIME/magic-number based detection, or (B) replace
the 2.2.2 entry content with those detailed changes if this PR is intended to be
part of 2.2.2; mention the specific symbols and features changed
(ArchiveUtils.detectArchiveType, CBZ/CBR/CB7 handling, MIME type
detection/magic-number logic) and include concise bullet points summarizing the
fixes and any user-facing behavior changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c14b689f-cc47-4d9c-a755-241245805ab4

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef4448 and f5986d5.

📒 Files selected for processing (9)
  • CHANGELOG.md
  • booklore-api/src/main/java/org/booklore/service/fileprocessor/CbxProcessor.java
  • booklore-api/src/main/java/org/booklore/service/kobo/CbxConversionService.java
  • booklore-api/src/main/java/org/booklore/service/opds/OpdsFeedService.java
  • booklore-api/src/main/java/org/booklore/service/reader/CbxReaderService.java
  • booklore-api/src/test/java/org/booklore/service/kobo/CbxConversionServiceTest.java
  • booklore-api/src/test/java/org/booklore/service/opds/OpdsFeedServiceMimeTypeTest.java
  • booklore-api/src/test/java/org/booklore/service/opds/OpdsFeedServiceTest.java
  • dev.docker-compose.yml
💤 Files with no reviewable changes (1)
  • booklore-api/src/test/java/org/booklore/service/kobo/CbxConversionServiceTest.java

@imajes force-pushed the develop branch 2 times, most recently from 89113d4 to 37ca101 (March 20, 2026 22:20)
# Conflicts:
#	.github/workflows/preview-image.yml
