Feat: export office2typst lib#134
Conversation
Signed-off-by: hezz <hezz@example.com>
developer0hye
left a comment
There was a problem hiding this comment.
Code Review — PR #134
Thanks for the contribution. The idea of exposing OOXML→IR→Typst without forcing PDF compilation is interesting, but this PR has significant architectural and quality issues that need to be addressed.
CRITICAL
1. Massive code duplication from office2pdf::lib.rs
File: crates/office2typst/src/lib.rs
The new crate copy-pastes the following from office2pdf:
OLE2_MAGICconstant +is_ole2()function (lib_pipeline.rs)extract_panic_message()function (lib.rs)dedup_warnings()function (lib.rs)parser_for_format()— reimplements the format→parser dispatch (lib_pipeline.rs)parse_bytes()— reimplements the full parse pipeline with panic catch (lib_pipeline.rs)parse_path()— reimplements file-based parsing (lib.rs `convert`/`convert_with_options`)
This is ~100 lines of duplicated logic that will diverge from the original as `office2pdf` evolves. Every bug fix or improvement to the parse pipeline will need to be manually applied to both crates.
The correct approach: Add a public `parse_bytes()` function to `office2pdf` itself (or make the internal pipeline functions public), then have `office2typst` call it. The functionality already exists — it just needs to be exposed, not duplicated.
2. Missing font substitution — silently different output
File: `crates/office2typst/src/lib.rs`, `convert_bytes()` function
`office2pdf`'s pipeline includes a critical step between parse and codegen:
// In office2pdf lib_pipeline.rs:
let font_context = resolve_font_search_context(&options.font_paths);
let fallbacks = detect_missing_font_fallbacks(&doc, &options.font_paths);
// warnings are appended for missing fonts
// font_context is passed to typst_gen`office2typst` skips this entirely and calls `generate_typst_with_options()` without a font context. This means:
- No font substitution warnings
- Typst source may reference fonts that don't exist on the target system
- The generated Typst code will produce different output than `office2pdf` for the same input
This is not just a missing feature — it's a correctness issue. Users would reasonably expect that `office2typst` and `office2pdf` produce consistent intermediate Typst markup for the same input.
HIGH
3. No discussion or issue before adding a new workspace crate
Adding a new crate to the workspace is a significant architectural decision. It affects:
- CI build matrix (new crate must be tested)
- crates.io publishing workflow (new package to version and publish)
- Dependency graph (circular risk if office2pdf ever needs office2typst)
- API surface maintenance burden
There's no linked issue discussing whether this crate should exist, what its API contract is, or how it relates to the existing `office2pdf` public API. The `office2pdf` lib already exports `pub mod parser` and `pub mod render` — users can already do:
use office2pdf::parser::docx::DocxParser;
use office2pdf::parser::Parser;
use office2pdf::render::typst_gen;
let parser = DocxParser;
let (doc, warnings) = parser.parse(&bytes, &options)?;
let typst_output = typst_gen::generate_typst(&doc)?;What does `office2typst` provide that the existing public API doesn't?
4. README has duplicate "### CLI" section
File: `README.md`
The new "OOXML to Typst" section is inserted between the existing library usage and CLI sections, creating a duplicate `### CLI` heading in the document:
### OOXML to Typst (no PDF compile)
...
### CLI ← first occurrence (existing)
### CLI ← second occurrence (existing, now duplicated context)This makes the README structure broken.
5. README uses `path` dependency — unusable for external users
File: `README.md`
[dependencies]
office2typst = { path = "crates/office2typst" }This only works inside the workspace. External users installing from crates.io or git would need a version number or git URL. The README should show the crates.io usage pattern (`office2typst = "0.1"`), or note that this is a workspace-internal crate.
6. `dev-dependencies` pulls in `docx-rs` directly
File: `crates/office2typst/Cargo.toml`
[dev-dependencies]
docx-rs = "0.4"The workspace uses a patched fork of `docx-rs` (see root `Cargo.toml` `[patch.crates-io]`). This dev-dependency resolves to the patched version due to the workspace patch, but it's confusing and fragile — if the patch is ever removed, this crate would silently use a different `docx-rs` version than `office2pdf`.
MEDIUM
7. No `#[cfg(test)]` path module pattern — inconsistent with project conventions
The project convention (visible in all other crates) is to use separate `_tests.rs` files:
#[cfg(test)]
#[path = "foo_tests.rs"]
mod foo_tests;This PR puts tests inline in `lib.rs` with `#[cfg(test)] mod tests { ... }`. While functional, it's inconsistent with the codebase.
8. Commit message is vague
The commit message is just "export office2typst lib" — no explanation of why, what problem it solves, or design rationale. Per project conventions, commit messages should explain the "why", not just the "what".
Summary
| # | Severity | Issue |
|---|---|---|
| 1 | CRITICAL | ~100 lines of duplicated pipeline logic from office2pdf |
| 2 | CRITICAL | Missing font substitution — silently different Typst output |
| 3 | HIGH | New workspace crate without prior design discussion |
| 4 | HIGH | README has duplicate "### CLI" section |
| 5 | HIGH | README shows `path` dep — unusable for external users |
| 6 | HIGH | `docx-rs` dev-dep bypasses workspace patch awareness |
| 7 | MEDIUM | Test placement inconsistent with project conventions |
| 8 | MEDIUM | Vague commit message |
Recommendation
Before proceeding with a new crate, I'd suggest:
- Open an issue first to discuss whether `office2typst` should be a separate crate or if exposing `parse_bytes()` / `generate_typst()` as public functions on `office2pdf` is sufficient.
- If a separate crate is justified, it should be a thin wrapper that calls `office2pdf` public APIs — zero duplicated logic.
- Font substitution must be included in the pipeline, or the API must clearly document that it's absent and why.
No description provided.