Skip to content

Feat: export office2typst lib#134

Open
kitty-eu-org wants to merge 1 commit intodeveloper0hye:mainfrom
kitty-eu-org:lib
Open

Feat: export office2typst lib#134
kitty-eu-org wants to merge 1 commit intodeveloper0hye:mainfrom
kitty-eu-org:lib

Conversation

@kitty-eu-org
Copy link
Copy Markdown

No description provided.

Signed-off-by: hezz <hezz@example.com>
Copy link
Copy Markdown
Owner

@developer0hye developer0hye left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review — PR #134

Thanks for the contribution. The idea of exposing OOXML→IR→Typst without forcing PDF compilation is interesting, but this PR has significant architectural and quality issues that need to be addressed.


CRITICAL

1. Massive code duplication from office2pdf::lib.rs

File: crates/office2typst/src/lib.rs

The new crate copy-pastes the following from office2pdf:

  • OLE2_MAGIC constant + is_ole2() function (lib_pipeline.rs)
  • extract_panic_message() function (lib.rs)
  • dedup_warnings() function (lib.rs)
  • parser_for_format() — reimplements the format→parser dispatch (lib_pipeline.rs)
  • parse_bytes() — reimplements the full parse pipeline with panic catch (lib_pipeline.rs)
  • parse_path() — reimplements file-based parsing (lib.rs `convert`/`convert_with_options`)

This is ~100 lines of duplicated logic that will diverge from the original as `office2pdf` evolves. Every bug fix or improvement to the parse pipeline will need to be manually applied to both crates.

The correct approach: Add a public `parse_bytes()` function to `office2pdf` itself (or make the internal pipeline functions public), then have `office2typst` call it. The functionality already exists — it just needs to be exposed, not duplicated.

2. Missing font substitution — silently different output

File: `crates/office2typst/src/lib.rs`, `convert_bytes()` function

`office2pdf`'s pipeline includes a critical step between parse and codegen:

// In office2pdf lib_pipeline.rs:
let font_context = resolve_font_search_context(&options.font_paths);
let fallbacks = detect_missing_font_fallbacks(&doc, &options.font_paths);
// warnings are appended for missing fonts
// font_context is passed to typst_gen

`office2typst` skips this entirely and calls `generate_typst_with_options()` without a font context. This means:

  • No font substitution warnings
  • Typst source may reference fonts that don't exist on the target system
  • The generated Typst code will produce different output than `office2pdf` for the same input

This is not just a missing feature — it's a correctness issue. Users would reasonably expect that `office2typst` and `office2pdf` produce consistent intermediate Typst markup for the same input.


HIGH

3. No discussion or issue before adding a new workspace crate

Adding a new crate to the workspace is a significant architectural decision. It affects:

  • CI build matrix (new crate must be tested)
  • crates.io publishing workflow (new package to version and publish)
  • Dependency graph (circular risk if office2pdf ever needs office2typst)
  • API surface maintenance burden

There's no linked issue discussing whether this crate should exist, what its API contract is, or how it relates to the existing `office2pdf` public API. The `office2pdf` lib already exports `pub mod parser` and `pub mod render` — users can already do:

use office2pdf::parser::docx::DocxParser;
use office2pdf::parser::Parser;
use office2pdf::render::typst_gen;

let parser = DocxParser;
let (doc, warnings) = parser.parse(&bytes, &options)?;
let typst_output = typst_gen::generate_typst(&doc)?;

What does `office2typst` provide that the existing public API doesn't?

4. README has duplicate "### CLI" section

File: `README.md`

The new "OOXML to Typst" section is inserted between the existing library usage and CLI sections, creating a duplicate `### CLI` heading in the document:

### OOXML to Typst (no PDF compile)
...

### CLI          ← first occurrence (existing)

### CLI          ← second occurrence (existing, now duplicated context)

This makes the README structure broken.

5. README uses `path` dependency — unusable for external users

File: `README.md`

[dependencies]
office2typst = { path = "crates/office2typst" }

This only works inside the workspace. External users installing from crates.io or git would need a version number or git URL. The README should show the crates.io usage pattern (`office2typst = "0.1"`), or note that this is a workspace-internal crate.

6. `dev-dependencies` pulls in `docx-rs` directly

File: `crates/office2typst/Cargo.toml`

[dev-dependencies]
docx-rs = "0.4"

The workspace uses a patched fork of `docx-rs` (see root `Cargo.toml` `[patch.crates-io]`). This dev-dependency resolves to the patched version due to the workspace patch, but it's confusing and fragile — if the patch is ever removed, this crate would silently use a different `docx-rs` version than `office2pdf`.


MEDIUM

7. No `#[cfg(test)]` path module pattern — inconsistent with project conventions

The project convention (visible in all other crates) is to use separate `_tests.rs` files:

#[cfg(test)]
#[path = "foo_tests.rs"]
mod foo_tests;

This PR puts tests inline in `lib.rs` with `#[cfg(test)] mod tests { ... }`. While functional, it's inconsistent with the codebase.

8. Commit message is vague

The commit message is just "export office2typst lib" — no explanation of why, what problem it solves, or design rationale. Per project conventions, commit messages should explain the "why", not just the "what".


Summary

# Severity Issue
1 CRITICAL ~100 lines of duplicated pipeline logic from office2pdf
2 CRITICAL Missing font substitution — silently different Typst output
3 HIGH New workspace crate without prior design discussion
4 HIGH README has duplicate "### CLI" section
5 HIGH README shows `path` dep — unusable for external users
6 HIGH `docx-rs` dev-dep bypasses workspace patch awareness
7 MEDIUM Test placement inconsistent with project conventions
8 MEDIUM Vague commit message

Recommendation

Before proceeding with a new crate, I'd suggest:

  1. Open an issue first to discuss whether `office2typst` should be a separate crate or if exposing `parse_bytes()` / `generate_typst()` as public functions on `office2pdf` is sufficient.
  2. If a separate crate is justified, it should be a thin wrapper that calls `office2pdf` public APIs — zero duplicated logic.
  3. Font substitution must be included in the pipeline, or the API must clearly document that it's absent and why.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants