Skip to content

refactor: extensibility-focused refactoring roadmap #138

@developer0hye

Description

@developer0hye

Overview

A thorough analysis of the entire codebase (~49,000 lines across 72+ source files) identified files and patterns that hinder extensibility and maintainability. This issue serves as the consolidated refactoring roadmap.


Critical — Must Refactor

1. render/typst_gen.rs (1,307 lines) — Monolithic god module

  • Image preprocessing, page layout, shape rendering, gradient handling, header/footer generation, tab stop processing — all in one file.
  • generate_fixed_page() has 3 near-identical #set page(...) branches.
  • GenCtx passed as &mut through ~15 function signatures.
  • Action: Split into typst_gen_images.rs, typst_gen_shapes.rs, typst_gen_headers.rs.

2. parser/omml.rs (1,209 lines) — Giant recursive parser

  • parse_omml_children() is 380 lines with a match on 20+ element types.
  • unicode_to_typst() has 100+ match arms.
  • Action: Split by math element category (fractions, radicals, matrices, accents, etc.).

3. parser/pptx_text.rs (873 lines) — Oversized list-style parser

  • parse_pptx_list_style() is 315 lines with 6 inline helper functions and 5 state variables.
  • 3 near-identical merge functions (merge_paragraph_style, merge_text_style, merge_pptx_bullet_definition) totaling 69 lines of boilerplate.
  • Action: Extract state struct, per-element parsers, and Mergeable trait/macro.

4. parser/pptx_slides.rs (996 lines) — Complex inheritance chain

  • parse_single_slide() ~140 lines resolving master→layout→slide with 3 repetitive resolve_effective_color_map() calls.
  • Action: Simplify inheritance resolution; extract helper functions.

5. render/typst_gen_lists.rs (773 lines) — Duplicated code paths

  • Two parallel rendering paths (flex-width vs fixed-width) duplicate the same item loop logic.
  • 5 near-identical paragraph attribute comparison helpers.
  • Action: Extract ListItemRenderer struct; deduplicate paths.

6. render/typst_gen_text.rs (670 lines) — Deep nesting

  • write_run_segment() is 89 lines with 5+ nesting levels and 6 boolean flags for bracket order.
  • Tab stop processing at 4 nesting levels.
  • Action: Extract TextWrapper struct; break tab logic into smaller functions.

High — Should Refactor

7. parser/docx.rs (569 lines)

  • parse() is 183 lines; convert_paragraph_blocks() is 165 lines with 4 nesting levels.
  • Text extraction logic duplicated 3 times (lines ~479-527).
  • Action: Factor convert_paragraph_blocks() into smaller functions; deduplicate text extraction.

8. parser/cond_fmt.rs (411 lines)

  • build_cond_fmt_overrides() is 244 lines with 3+ nesting levels.
  • No polymorphism for rule types (CellIs, ColorScale, DataBar, IconSet all inline).
  • Action: Extract ConditionalRule trait with evaluate() / apply_style().

9. lib_pipeline.rs (381 lines)

  • Excessive #[cfg] duplication between native and WASM targets (4 distinct code paths for nearly identical logic).
  • Unsafe .unwrap() at line 328 on all_pdfs.into_iter().next().
  • dedup_warnings() called manually in 4 places.
  • Action: Extract platform trait; replace .unwrap() with .expect() or Result; centralize warning dedup.

10. render/font_subst.rs (410 lines)

  • Thread-local RefCell<Option<FontSearchContext>> — mutable global state.
  • 4 duplicate recursive tree-walking functions: collect_paragraph_fonts(), collect_table_fonts(), collect_header_footer_fonts(), collect_block_fonts().
  • Action: Replace thread-local with parameter passing; introduce visitor pattern for tree walking.

11. render/pdf.rs (376 lines)

  • compile_to_pdf() and compile_to_pdf_wasm() are nearly identical.
  • Duplicate image map construction in MinimalWorld::new() variants.
  • Action: Unify into single function with #[cfg] branch at world creation only.

12. render/typst_gen_tables.rs (318 lines)

  • generate_table_rows() 56 lines with 3-level nested rowspan tracking.
  • Cell content generation duplicates generate_block() match logic.
  • Action: Extract RowspanTracker struct; reuse generate_block().

13. pdf_ops.rs (250 lines)

  • Unsafe .unwrap() on as_dict_mut() (line 145).
  • merge() and split() share duplicate page validation.
  • Action: Fix panics; extract common PDF document handling.

Medium — Worth Considering

File Lines Issue
parser/pptx_theme.rs 717 Color parsing could share code with docx_text.rs
parser/docx_sections.rs 456 build_flow_page_from_section() exceeds 50 lines
parser/docx_tables.rs 460 Silent truncation at MAX_TABLE_DEPTH = 64 (no warning logged)
parser/chart.rs 447 8 chart types matched with near-identical parse calls; should loop
parser/xlsx.rs 262 parse() and parse_streaming() duplicate shared structure
cli/server.rs 306 Manual multipart boundary parsing; fragile string slicing
cli/metrics.rs 285 .lock().unwrap() appears 12+ times

Cross-Cutting Duplication Patterns

Pattern Locations Proposed Fix
Style merging boilerplate docx_text.rs, docx_styles.rs, pptx_text.rs (3 near-identical merge functions) Mergeable trait or macro
Unit conversions 12700 (EMU→pt) in 3 files, 20.0 (twips→pt) in 4 files Shared constants EMU_PER_PT, TWIPS_PER_PT
Color parsing docx_text.rs, docx_tables.rs, cond_fmt.rs, pptx_theme.rs Consolidate into xml_util.rs
Font tree walking font_subst.rs — 4 functions with identical recursive pattern Visitor pattern or generic walker
#[cfg] branching lib_pipeline.rs, pdf.rs Platform trait abstraction

Files That Need No Refactoring

  • config.rs, error.rs — clean and well-structured
  • ir/ module — clean intermediate representation
  • parser/mod.rs, parser/xml_util.rs, parser/metadata.rs — simple and focused
  • parser/docx_contexts.rs + 7 context submodules — good pattern-reduction
  • All *_tests.rs files — comprehensive and well-named

Recommended Priority Order

  1. Split typst_gen.rs into sub-modules (highest impact)
  2. Extract shared unit conversion constants (quick win)
  3. Create Mergeable trait/macro for style merging (~100 lines saved)
  4. Split parse_omml_children() by math element category
  5. Extract parse_pptx_list_style() into state struct + per-element parsers
  6. Consolidate font tree walkers with visitor pattern
  7. Unify #[cfg] paths in lib_pipeline.rs and pdf.rs
  8. Fix unsafe .unwrap() calls in lib_pipeline.rs and pdf_ops.rs

Metadata

Metadata

Assignees

No one assigned

    Labels

    refactoringCode refactoring and cleanup

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions