DOCX: harden Typst codegen with centralized syntax-safe escaping #108

@developer0hye

Summary

DOCX -> Typst code generation must be syntax-safe regardless of input text/style content.

Problem

Current code paths can emit Typst markup that becomes syntactically invalid for certain character combinations or run compositions. This causes hard conversion failures instead of graceful degradation.

Proposal

  1. Introduce a single, shared escaping/sanitization layer for Typst text emission.
     • Central helper for plain text and inline content.
     • Explicit handling for reserved characters, escaping boundaries, and control characters.
  2. Remove ad-hoc string interpolation in emitters.
     • Replace direct format! insertion of untrusted text with typed emit helpers.
  3. Add guardrails for token construction.
     • Avoid constructing ambiguous tokens where text and units/symbols can merge into invalid syntax.
  4. Expand tests.
     • Add focused unit tests for escaping edge cases.
     • Add regression tests for previously observed syntax-failure patterns.
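
To make the first three items concrete, here is a minimal sketch of what the shared layer could look like. The names `escape_typst_text` and `emit_strong` are hypothetical and do not exist in the crate today. The reserved-character set is a deliberately conservative assumption, and dropping control characters is an assumed policy, not a decided one.

```rust
/// Hypothetical shared helper: escape untrusted text for Typst markup mode.
/// The reserved set below is a conservative assumption, not an audited list.
pub fn escape_typst_text(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            // Characters that can start or continue Typst markup syntax.
            '\\' | '#' | '$' | '*' | '_' | '`' | '<' | '>' | '@' | '[' | ']'
            | '=' | '-' | '+' | '/' | '~' | '\'' | '"' => {
                out.push('\\');
                out.push(ch);
            }
            // Keep ordinary whitespace as-is.
            '\n' | '\t' => out.push(ch),
            // Drop all other control characters (assumed policy).
            c if c.is_control() => {}
            _ => out.push(ch),
        }
    }
    out
}

/// Hypothetical typed emit helper: the only place that interpolates text.
pub fn emit_strong(out: &mut String, text: &str) {
    out.push_str("#strong[");
    out.push_str(&escape_typst_text(text));
    // The trailing `;` ends the embedded expression, so text that follows
    // (for example a `.` and a word) cannot attach to it as a field access.
    out.push_str("];");
}
```

With helpers of this shape, emitters would call `emit_strong(&mut out, run_text)` instead of building the string with format!, so escaping and token termination happen in one place.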

Code Areas

  • crates/office2pdf/src/render/typst_gen.rs
  • crates/office2pdf/src/parser/docx.rs
  • crates/office2pdf/tests/ (new regression coverage)

Acceptance Criteria

  • Known syntax-failure patterns now compile successfully.
  • New escaping helper is used by all DOCX text-emitting paths.
  • Tests fail if any emitter bypasses the shared helper.
  • No private/internal fixture content is copied into repository tests.
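
One possible way to approach the "tests fail if any emitter bypasses the shared helper" criterion is a canary check, sketched below. This is only an illustration: `CANARY`, `bypasses_escaping`, `raw_emitter`, and `safe_emitter` are hypothetical names, and the tiny escaper inside `safe_emitter` stands in for whatever shared helper is eventually adopted. The canary is synthetic text, so no fixture content is needed.

```rust
/// Hypothetical canary: plain text that would be live markup if emitted raw.
pub const CANARY: &str = "canary#[*_$";

/// Returns true when `emit` copies the canary through unescaped.
pub fn bypasses_escaping(emit: impl Fn(&str) -> String) -> bool {
    emit(CANARY).contains(CANARY)
}

/// Toy emitter that interpolates untrusted text directly (the pattern to remove).
pub fn raw_emitter(text: &str) -> String {
    format!("#strong[{text}];")
}

/// Toy emitter that escapes first; stands in for the shared helper.
pub fn safe_emitter(text: &str) -> String {
    let mut escaped = String::new();
    for ch in text.chars() {
        // Prefix each reserved character with a backslash.
        if "#[]*_$\\".contains(ch) {
            escaped.push('\\');
        }
        escaped.push(ch);
    }
    format!("#strong[{escaped}];")
}
```

A regression test could then run every DOCX text-emitting path through `bypasses_escaping` and assert that it returns false for each one.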

Non-Goals

  • This issue does not target layout fidelity improvements by itself.

Labels: bug
