Skip to content

fix: should keep namespace structure of toc & opf#127

Merged
Moskize91 merged 5 commits intomainfrom
html
Feb 5, 2026
Merged

fix: should keep namespace structure of toc & opf#127
Moskize91 merged 5 commits intomainfrom
html

Conversation

@Moskize91
Copy link
Contributor

No description provided.

@coderabbitai
Copy link

coderabbitai bot commented Feb 5, 2026

Summary by CodeRabbit

  • Refactor

    • Metadata and TOC flows now return and propagate context objects through read/write and translation task flows for safer in-place updates.
  • Bug Fixes

    • Namespace-agnostic XML parsing with resilient handling when elements are missing.
    • OPF is no longer treated as a self-closing/void element.
    • Serialization fixes avoid incorrect prefixing of standard HTML attributes on link elements.
  • Chores

    • New context types exposed in the package public API; tests updated accordingly.

Walkthrough

The PR exposes MetadataContext and TocContext from the epub package initializer and adds corresponding dataclasses. read_metadata/read_toc now return (items, context) tuples and write_metadata/write_toc accept those contexts to perform in-place updates using XMLLikeNode. translator.py threads toc and metadata contexts through task generation and conditionally invokes write-back when contexts exist. xml/self_closing.py removes "meta" from the void tag set. xml/xml_like.py applies regex-based fixes to revert epub-prefixed link attributes to standard HTML attribute names. Tests updated to match new signatures and behaviors.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

🚥 Pre-merge checks | ✅ 1 | ❌ 1
❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description check ❓ Inconclusive No description was provided by the author, making it impossible to assess whether the description relates to the changeset. Add a pull request description explaining the changes, why they were necessary, and how they address the namespace structure preservation for TOC and OPF files.
✅ Passed checks (1 passed)
Check name Status Explanation
Title check ✅ Passed The title follows the required format <type>(<scope>): <subject> with type 'fix' and subject describing keeping namespace structure of TOC and OPF files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@epub_translator/epub/metadata.py`:
- Around line 15-18: The comment for the MetadataContext dataclass uses a
fullwidth comma causing RUF003; update the inline comment for the xml_node field
to replace the fullwidth comma with a standard ASCII comma (i.e., change
"XMLLikeNode 对象,保留原始文件信息" to use a normal comma) so the comment for the opf_path
and xml_node fields is unambiguous and Ruff-compliant; locate the
MetadataContext dataclass and edit the comment on the xml_node field
accordingly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
epub_translator/xml/self_closing.py (2)

165-168: ⚠️ Potential issue | 🟡 Minor

Docstring example still mentions <meta> despite exclusion.

unclose_void_elements no longer processes <meta> because it’s not in _VOID_TAGS, so the example is now misleading.

📝 Suggested docstring fix
-        <meta charset="utf-8" /> → <meta charset="utf-8">
         <br /> → <br>
         <img src="test.png" /> → <img src="test.png">

6-45: ⚠️ Potential issue | 🟠 Major

Inconsistency: <meta> should be self-closed for XML parsing, but it's excluded from _VOID_TAGS.

The tests expect <meta charset="utf-8"><meta charset="utf-8" /> (lines 16, 113, 372-374), but self_close_void_elements won't process <meta> tags because they're missing from _VOID_TAGS. When this function is called before XML parsing in xml_like.py:49, non-self-closed <meta> tags will cause parsing failures.

The comment states "meta is excluded because OPF files have <meta property="...">content</meta>" (a non-void element), but this conflates two different contexts: OPF metadata documents vs. HTML/XHTML content. Consider making <meta> handling context-aware or restoring it to _VOID_TAGS if only XHTML content is processed here.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@epub_translator/xml/xml_like.py`:
- Around line 224-226: The current unconditional loop that applies
_STANDARD_HTML_ATTRS to xml_string (the pattern.sub(...) block) removes
epub:type/epub:rel from all <link> elements; restrict this rewrite so it only
runs for Package Documents/container.xml contexts or when a caller-provided flag
indicates it's safe. Update the code that invokes the xml_string normalization
(or the function containing the loop) to accept a context parameter (e.g.,
"package" vs "content") or a boolean (e.g., allow_epub_prefixes) and only
execute the for pattern, replacement in _STANDARD_HTML_ATTRS: xml_string =
pattern.sub(replacement, xml_string) when the context is package/container or
the flag is true; alternatively, preserve attributes by tracking original
attribute namespaces and skip replacing attributes that originated from EPUB
Content Documents. Ensure you reference the _STANDARD_HTML_ATTRS list and the
xml_string normalization loop when making the change.
🧹 Nitpick comments (1)
epub_translator/xml/xml_like.py (1)

35-52: Add a regression test for the <link> attribute workaround.

Given the regex-based post‑processing in Line 35-52, a small test covering <link type="text/css" rel="stylesheet"> (and a control element like <nav epub:type="toc">) would lock in the intended behavior and reduce regressions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/test_metadata.py (1)

16-77: ⚠️ Potential issue | 🟠 Major

Fix unused context variables to clear lint failures.
Pylint flags unused-variable at Line 22, Line 58, and Line 69. Rename to _context to keep CI green.

🔧 Proposed fix
-            metadata, context = read_metadata(zip_file)
+            metadata, _context = read_metadata(zip_file)
...
-            metadata, context = read_metadata(zip_file)
+            metadata, _context = read_metadata(zip_file)
...
-            metadata, context = read_metadata(zip_file)
+            metadata, _context = read_metadata(zip_file)

@Moskize91 Moskize91 merged commit 94040ec into main Feb 5, 2026
2 checks passed
@Moskize91 Moskize91 deleted the html branch February 5, 2026 08:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant