fix: should keep namespace structure of toc & opf by Moskize91 · Pull Request #127 · oomol-lab/epub-translator

Moskize91 · 2026-02-05T07:29:59Z

No description provided.

coderabbitai · 2026-02-05T07:30:18Z

Summary by CodeRabbit

Refactor
- Metadata and TOC flows now return and propagate context objects through read/write and translation task flows for safer in-place updates.
Bug Fixes
- Namespace-agnostic XML parsing with resilient handling when elements are missing.
- OPF is no longer treated as a self-closing/void element.
- Serialization fixes avoid incorrect prefixing of standard HTML attributes on link elements.
Chores
- New context types exposed in the package public API; tests updated accordingly.
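
For readers unfamiliar with the term, here is a minimal sketch of namespace-agnostic lookup that tolerates missing elements, using the standard library's `xml.etree.ElementTree`. It is not the PR's implementation (which goes through the project's own XMLLikeNode); the helper name `find_text` and the sample OPF fragment are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Made-up OPF fragment: a default namespace plus the Dublin Core namespace.
OPF_SAMPLE = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Example Book</dc:title>"
    "</metadata></package>"
)


def find_text(root: ET.Element, local_name: str) -> Optional[str]:
    """Hypothetical helper: text of the first element with this local name, in any namespace."""
    # "{*}" matches any namespace or none (supported since Python 3.8).
    node = root.find(f".//{{*}}{local_name}")
    if node is None:
        # Missing elements are reported as None instead of raising.
        return None
    return node.text


root = ET.fromstring(OPF_SAMPLE)
title = find_text(root, "title")      # found regardless of the dc: prefix
missing = find_text(root, "creator")  # absent in the sample
```

The lookup keys on the local name only, so it keeps working whether a file declares the namespace as a default, under a prefix, or not at all.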

Walkthrough

The PR exposes MetadataContext and TocContext from the epub package initializer and adds corresponding dataclasses. read_metadata/read_toc now return (items, context) tuples and write_metadata/write_toc accept those contexts to perform in-place updates using XMLLikeNode. translator.py threads toc and metadata contexts through task generation and conditionally invokes write-back when contexts exist. xml/self_closing.py removes "meta" from the void tag set. xml/xml_like.py applies regex-based fixes to revert epub-prefixed link attributes to standard HTML attribute names. Tests updated to match new signatures and behaviors.
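
The read/write flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: `DemoContext`, `read_titles`, and `write_titles` are hypothetical names, `xml.etree.ElementTree` stands in for XMLLikeNode, and the OPF sample is made up. Only the field names `opf_path` and `xml_node` come from the review comments.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Tuple

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Keep the original prefixes on output: OPF as the default namespace, "dc" for Dublin Core.
ET.register_namespace("", OPF_NS)
ET.register_namespace("dc", DC_NS)


@dataclass
class DemoContext:
    """Hypothetical stand-in for MetadataContext."""
    opf_path: str         # where the document was read from
    xml_node: ET.Element  # the parsed tree, kept so the write step can edit it in place


def read_titles(opf_path: str, xml_text: str) -> Tuple[List[ET.Element], DemoContext]:
    """Return (items, context) instead of the items alone."""
    root = ET.fromstring(xml_text)
    items = root.findall(f".//{{{DC_NS}}}title")
    return items, DemoContext(opf_path=opf_path, xml_node=root)


def write_titles(items: List[ET.Element], translated: List[str], context: DemoContext) -> str:
    """Update the nodes inside context.xml_node and serialize the same tree."""
    for node, text in zip(items, translated):
        node.text = text
    return ET.tostring(context.xml_node, encoding="unicode")


SAMPLE = (
    '<package xmlns="http://www.idpf.org/2007/opf" version="3.0">'
    '<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:title>Hello</dc:title>"
    '<meta property="dcterms:modified">2026-02-05</meta>'
    "</metadata></package>"
)

items, ctx = read_titles("OEBPS/content.opf", SAMPLE)
output = write_titles(items, ["Bonjour"], ctx)
```

Because the write step edits the tree held by the context rather than rebuilding the document, elements the translator never touched (such as the `<meta property="...">` entry) are carried through unchanged.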

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

feat: support metadata & toc translation #57 — Overlaps changes to translator flow and TOC/metadata handling, touching the same modules.
feat: implement new toc & spines logic #43 — Makes related edits to epub TOC/spine APIs and package exports that intersect with this PR.
feat: use XML transcode logic & merge toc, metadatas into XML serials #90 — Earlier work on metadata/TOC read/write surfaces that this context-based refactor builds upon.

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | No description was provided by the author, making it impossible to assess whether the description relates to the changeset. | Add a pull request description explaining the changes, why they were necessary, and how they address the namespace structure preservation for TOC and OPF files. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title follows the required format `<type>(<scope>): <subject>` with type 'fix' and subject describing keeping namespace structure of TOC and OPF files. |

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@epub_translator/epub/metadata.py`:
- Around line 15-18: The comment for the MetadataContext dataclass uses a
fullwidth comma causing RUF003; update the inline comment for the xml_node field
to replace the fullwidth comma with a standard ASCII comma (i.e., change
"XMLLikeNode 对象，保留原始文件信息" to use a normal comma) so the comment for the opf_path
and xml_node fields is unambiguous and Ruff-compliant; locate the
MetadataContext dataclass and edit the comment on the xml_node field
accordingly.

epub_translator/epub/metadata.py

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

epub_translator/xml/self_closing.py (2)
165-168: ⚠️ Potential issue | 🟡 Minor

Docstring example still mentions <meta> despite exclusion.

unclose_void_elements no longer processes <meta> because it’s not in _VOID_TAGS, so the example is now misleading.
📝 Suggested docstring fix
-        <meta charset="utf-8" /> → <meta charset="utf-8">
         <br /> → <br>
         <img src="test.png" /> → <img src="test.png">
6-45: ⚠️ Potential issue | 🟠 Major

Inconsistency: <meta> should be self-closed for XML parsing, but it's excluded from _VOID_TAGS.

The tests expect <meta charset="utf-8"> → <meta charset="utf-8" /> (lines 16, 113, 372-374), but self_close_void_elements won't process <meta> tags because they're missing from _VOID_TAGS. When this function is called before XML parsing in xml_like.py:49, non-self-closed <meta> tags will cause parsing failures.

The comment states "meta is excluded because OPF files have <meta property="...">content</meta>" (a non-void element), but this conflates two different contexts: OPF metadata documents vs. HTML/XHTML content. Consider making <meta> handling context-aware or restoring it to _VOID_TAGS if only XHTML content is processed here.
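
The two contexts in question can be demonstrated with a strict XML parser. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in for the project's parsing step; the sample strings and the helper `parses_as_xml` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# HTML-style void tag that was never self-closed: not well-formed XML.
HTML_META = '<head><meta charset="utf-8"></head>'
# The same tag after self-closing: well-formed XML.
XHTML_META = '<head><meta charset="utf-8" /></head>'
# OPF-style <meta>: a regular element that carries text content (sample value is made up).
OPF_META = '<metadata><meta property="dcterms:modified">2026-02-05T00:00:00Z</meta></metadata>'


def parses_as_xml(text: str) -> bool:
    """Hypothetical helper: True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


html_ok = parses_as_xml(HTML_META)    # unclosed <meta> breaks strict parsing
xhtml_ok = parses_as_xml(XHTML_META)  # self-closed form is accepted
opf_text = ET.fromstring(OPF_META).find("meta").text  # OPF <meta> has a body worth keeping
```

A single tag-name-based void set cannot satisfy both cases, which is why the comment suggests making the `<meta>` handling depend on the kind of document being processed.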

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@epub_translator/xml/xml_like.py`:
- Around line 224-226: The current unconditional loop that applies
_STANDARD_HTML_ATTRS to xml_string (the pattern.sub(...) block) removes
epub:type/epub:rel from all <link> elements; restrict this rewrite so it only
runs for Package Documents/container.xml contexts or when a caller-provided flag
indicates it's safe. Update the code that invokes the xml_string normalization
(or the function containing the loop) to accept a context parameter (e.g.,
"package" vs "content") or a boolean (e.g., allow_epub_prefixes) and only
execute the for pattern, replacement in _STANDARD_HTML_ATTRS: xml_string =
pattern.sub(replacement, xml_string) when the context is package/container or
the flag is true; alternatively, preserve attributes by tracking original
attribute namespaces and skip replacing attributes that originated from EPUB
Content Documents. Ensure you reference the _STANDARD_HTML_ATTRS list and the
xml_string normalization loop when making the change.
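
One possible shape for the flag-based variant is sketched below. It is a guess at the idea, not the repository's code: `_DEMO_LINK_ATTRS` is a hypothetical stand-in for `_STANDARD_HTML_ATTRS` (whose real contents are not shown in this thread), and both `normalize_link_attrs` and its `is_package_document` parameter are names invented for this example.

```python
import re

# Hypothetical stand-in for _STANDARD_HTML_ATTRS: (pattern, replacement) pairs that
# strip an "epub:" prefix from selected attributes, but only inside <link ...> tags.
_DEMO_LINK_ATTRS = [
    (re.compile(r"(<link\b[^>]*?\s)epub:(type|rel|href)="), r"\1\2="),
]


def normalize_link_attrs(xml_string: str, *, is_package_document: bool) -> str:
    """Rewrite prefixed <link> attributes only when the caller says it is safe."""
    if not is_package_document:
        # Content documents may legitimately use epub:type, so leave them untouched.
        return xml_string
    for pattern, replacement in _DEMO_LINK_ATTRS:
        # Each pass fixes at most one attribute per <link>, so repeat until stable.
        previous = None
        while previous != xml_string:
            previous = xml_string
            xml_string = pattern.sub(replacement, xml_string)
    return xml_string


sample = '<link epub:type="text/css" epub:rel="stylesheet" href="a.css"/><nav epub:type="toc"></nav>'
as_package = normalize_link_attrs(sample, is_package_document=True)
as_content = normalize_link_attrs(sample, is_package_document=False)
```

In this sketch the `<nav epub:type="toc">` control element is never rewritten, and with the flag off the input comes back unchanged. A keyword-only boolean keeps the call sites explicit; a string context such as "package" versus "content" would serve equally well.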

🧹 Nitpick comments (1)

epub_translator/xml/xml_like.py (1)

35-52: Add a regression test for the <link> attribute workaround.

Given the regex-based post-processing at lines 35-52, a small test covering <link type="text/css" rel="stylesheet"> (and a control element like <nav epub:type="toc">) would lock in the intended behavior and reduce regressions.

epub_translator/xml/xml_like.py

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tests/test_metadata.py (1)

16-77: ⚠️ Potential issue | 🟠 Major

Fix unused context variables to clear lint failures.
Pylint flags unused-variable at lines 22, 58, and 69. Rename each to _context to keep CI green.

🔧 Proposed fix

-            metadata, context = read_metadata(zip_file)
+            metadata, _context = read_metadata(zip_file)
...
-            metadata, context = read_metadata(zip_file)
+            metadata, _context = read_metadata(zip_file)
...
-            metadata, context = read_metadata(zip_file)
+            metadata, _context = read_metadata(zip_file)

fix: should keep namespace structure of toc & opf

27b3411

coderabbitai bot reviewed Feb 5, 2026

epub_translator/epub/metadata.py (resolved)

fix: will clear body of meta

8179f11

coderabbitai bot reviewed Feb 5, 2026

fix: generate <link etag:type ...>

fa467d6

coderabbitai bot reviewed Feb 5, 2026

epub_translator/xml/xml_like.py (resolved)

Moskize91 added 2 commits February 5, 2026 16:35

test: sync test units

1f025ed

style: fix pylint errors

7192827

coderabbitai bot reviewed Feb 5, 2026

Moskize91 merged commit 94040ec into main Feb 5, 2026
2 checks passed

Moskize91 deleted the html branch February 5, 2026 08:44

Moskize91 merged 5 commits into main from html
