feat: render LaTeX for LLMs by Moskize91 · Pull Request #121 · oomol-lab/epub-translator

Moskize91 · 2026-02-04T10:39:49Z

No description provided.

coderabbitai · 2026-02-04T10:40:36Z

Caution

Review failed

The head commit changed during the review from 7b2c557 to 0d5d77d.

Walkthrough

This pull request removes the XSLT-based MathML→LaTeX conversion assets under epub_translator/data/mmltex (multiple XSL files and a README), deletes the utility function unwrap_parents from epub_translator/segment/utils.py, adds mathml2latex as a dependency in pyproject.toml, and updates epub_translator/translation/xml_interrupter.py to render buffered MathML as LaTeX via a new _render_latex helper that uses mathml2latex and BeautifulSoup.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

oomol-lab/epub-translator PR 54: Also modifies MathML→LaTeX conversion support and touches the same mmltex XSLT assets.
oomol-lab/epub-translator PR 120: Also changes epub_translator/translation/xml_interrupter.py and buffered math/text merging behavior.

🚥 Pre-merge checks | ✅ 1 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	No description was provided by the author, making it impossible to assess whether the description relates to the changeset.	Add a pull request description explaining the motivation for rendering LaTeX, implementation details, and any related context about MathML removal.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title follows the required format `<type>(<scope>): <subject>` with 'feat' as type and 'render LaTeX for LLMs' as subject, and accurately describes the main change (adding LaTeX rendering capability).

coderabbitai

Actionable comments posted: 2

🤖 Fix all issues with AI agents

In `@epub_translator/translation/xml_interrupter.py`:
- Around line 153-154: The MathML string mathml_str is pure XML and is being
parsed with BeautifulSoup using the HTML parser; update the BeautifulSoup call
that creates soup so it uses an XML parser (e.g., "xml" or "lxml-xml") instead
of "html.parser" to preserve namespaces and proper XML handling before passing
soup into process_mathml.
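
A minimal, stdlib-only sketch of why the parser choice can matter. BeautifulSoup's "html.parser" builder wraps Python's `html.parser.HTMLParser`, which lowercases tag and attribute names and has no notion of namespaces, whereas an XML parser keeps both. The sample markup and the helper names (`html_attr_names`, `xml_attr_names`) are hypothetical, and whether `process_mathml` is actually sensitive to these differences is not verified here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical MathML sample with a camelCase attribute (definitionURL).
MATHML = (
    '<math xmlns="http://www.w3.org/1998/Math/MathML">'
    "<semantics><mi>x</mi>"
    '<annotation-xml encoding="MathML-Content" definitionURL="urn:example">'
    "</annotation-xml></semantics></math>"
)


class AttrCollector(HTMLParser):
    """Records attribute names exactly as the HTML parser reports them."""

    def __init__(self) -> None:
        super().__init__()
        self.attr_names: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.attr_names.extend(name for name, _ in attrs)


def html_attr_names(markup: str) -> list[str]:
    # The HTML parser lowercases names and treats xmlns as a plain attribute.
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.attr_names


def xml_attr_names(markup: str) -> list[str]:
    # The XML parser preserves case and resolves xmlns into the tag name.
    root = ET.fromstring(markup)
    return [name for element in root.iter() for name in element.attrib]
```

With the HTML parser the attribute comes back as `definitionurl`; with the XML parser it stays `definitionURL` and the root tag carries the MathML namespace.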

🧹 Nitpick comments (2)

pyproject.toml (1)
36-36: Inconsistent dependency format with parentheses.

The other dependencies in this file use the standard format without parentheses (e.g., "tiktoken>=0.12.0,<1.0.0"). For consistency, consider removing the parentheses and spaces.
♻️ Suggested fix
-    "mathml2latex (>=0.2.12,<0.3.0)",
+    "mathml2latex>=0.2.12,<0.3.0",
epub_translator/translation/xml_interrupter.py (1)
151-156: Consider logging the exception for debugging purposes.

Silently swallowing exceptions with except Exception: pass makes it difficult to diagnose issues when LaTeX conversion fails. While the fallback to raw text is reasonable, logging would help identify problematic MathML inputs.
🔧 Suggested improvement
+import logging
+
+logger = logging.getLogger(__name__)
+
 # ... in _render_latex method:
         try:
             mathml_str = tostring(math_element, encoding="unicode")
             soup = BeautifulSoup(mathml_str, "html.parser")
             latex = process_mathml(soup)
-        except Exception:
-            pass
+        except Exception as e:
+            logger.debug("Failed to convert MathML to LaTeX: %s", e)
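
As a self-contained illustration of the pattern suggested above, here is a sketch that logs the failure and then falls back to the raw text. The function name `convert_with_fallback` and its parameters are hypothetical and are not part of the PR; `convert` stands in for whatever MathML-to-LaTeX call is used.

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def convert_with_fallback(
    convert: Callable[[str], str], mathml: str, fallback: str
) -> str:
    """Return convert(mathml); on any failure, log it and return fallback."""
    try:
        return convert(mathml)
    except Exception:
        # exc_info=True attaches the traceback, so the offending input can be
        # diagnosed when debug logging is enabled.
        logger.debug("Failed to convert MathML to LaTeX", exc_info=True)
        return fallback
```

The fallback behaviour is unchanged from a bare `pass`; the only difference is that the failure becomes visible once the logger is set to DEBUG.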

epub_translator/translation/xml_interrupter.py

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@epub_translator/translation/xml_interrupter.py`:
- Around line 160-167: The inline/block detection currently treats a missing
display attribute as block because it checks only for display == "inline";
update the logic in the function handling math elements (the code that uses
math_element.get("display", None), latex and text_segments) to explicitly treat
"block" as the only block case — i.e., if math_element.get("display") == "block"
use block delimiters ($$...$$), otherwise use inline delimiters ($...$) so that
omitted display defaults to inline per the MathML spec.

🧹 Nitpick comments (1)

epub_translator/translation/xml_interrupter.py (1)
153-158: Avoid silently swallowing exceptions.

The bare except Exception: pass makes debugging difficult when conversion fails. At minimum, log the exception to aid troubleshooting.
🛠️ Suggested improvement
+import logging
+
+_logger = logging.getLogger(__name__)
+
 # ... in _render_latex method:
         try:
             mathml_str = tostring(math_element, encoding="unicode")
             soup = BeautifulSoup(mathml_str, "html.parser")
             latex = process_mathml(soup)
-        except Exception:
-            pass
+        except Exception as e:
+            _logger.debug("MathML to LaTeX conversion failed: %s", e)

coderabbitai · 2026-02-04T10:53:28Z

epub_translator/translation/xml_interrupter.py

+        if latex is None:
+            latex = "".join(t.text for t in text_segments)
+        elif math_element.get("display", None) == "inline":
+            latex = f"${latex}$"
+        else:
+            latex = f"$${latex}$$"
+
+        return f" {latex} "


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

MathML math element display attribute default value block inline

💡 Result:

For the MathML <math> element, the display attribute accepts inline or block, and if the attribute is omitted the default is inline. [1][7]

🏁 Script executed:

cat -n epub_translator/translation/xml_interrupter.py | sed -n '155,175p'

Repository: oomol-lab/epub-translator

Length of output: 898

🏁 Script executed:

# Search for other MathML handling in the codebase to understand the full context
rg -n "display" epub_translator/translation/xml_interrupter.py

Repository: oomol-lab/epub-translator

Length of output: 336

🏁 Script executed:

# Check if there are test cases or other context about how math elements are processed
rg -n "math_element" epub_translator/translation/xml_interrupter.py | head -20

Repository: oomol-lab/epub-translator

Length of output: 559

Correct the inline math detection logic.

The code incorrectly treats missing display attributes as block math. Per the MathML specification, when the display attribute is omitted, it defaults to inline. The current logic checks display == "inline" explicitly; when the attribute is absent, get() returns None, which fails the condition and incorrectly wraps the math in $$...$$ (block delimiters) instead of $...$ (inline delimiters).

Invert the logic to check for "block" explicitly:

Suggested fix

-        elif math_element.get("display", None) == "inline":
-            latex = f"${latex}$"
-        else:
+        elif math_element.get("display", None) == "block":
             latex = f"$${latex}$$"
+        else:
+            latex = f"${latex}$"
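
The delimiter rule described in this comment can be sketched in isolation as follows. The helper name `wrap_latex` is hypothetical; the PR performs this check inline in `_render_latex`, and the sketch assumes the element is an `xml.etree.ElementTree.Element`.

```python
import xml.etree.ElementTree as ET


def wrap_latex(math_element: ET.Element, latex: str) -> str:
    """Wrap LaTeX in block delimiters only when display="block" is explicit."""
    if math_element.get("display") == "block":
        return f"$${latex}$$"
    # A missing display attribute, or display="inline", is treated as inline.
    return f"${latex}$"
```

Checking for "block" rather than "inline" means an omitted attribute falls through to the inline branch, matching the default stated in the web query result above.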

Moskize91 added 2 commits February 4, 2026 18:34

feat: render LaTeX

24522e9

refactor: remove useless files

46578a6

coderabbitai bot reviewed Feb 4, 2026

fix: return empty string if don't find math

0d5d77d

coderabbitai bot reviewed Feb 4, 2026

Moskize91 force-pushed the expr branch from 7b2c557 to 0d5d77d Compare February 4, 2026 10:58

Moskize91 merged commit d58cc96 into main Feb 4, 2026
3 checks passed

Moskize91 deleted the expr branch February 4, 2026 10:59

This was referenced Feb 4, 2026

docs: add chagelog #122

Merged

fix: <math> without display=block should be recognized as a inline tag #123

Merged
