A document and chart format conversion tool supporting Word/Markdown/Excel bidirectional conversion. Runs completely locally, ensuring data security and reliability.
This software was originally designed for the daily work of the printing office to solve the following problems:
- Document formats sent by various departments are chaotic and need to be organized into standardized formats.
- There are many types of documents, each with different fixed format requirements.
- Needs to run offline, adapting to intranet environments and legacy equipment.
Design Philosophy: This software is positioned as a lightweight, fool-proof tool. While it cannot compare with professional tools like LaTeX or Pandoc in terms of professionalism and functional completeness, it excels in zero learning cost and out-of-the-box usability, making it suitable for daily office scenarios where format requirements are not extremely strict.
- 📄 Document Format Conversion - Bidirectional Word ↔ Markdown conversion. Supports mathematical formula conversion, and bidirectional separator conversion (Markdown's three types of separators vs. Word's page breaks, section breaks, and horizontal lines). Supports formats like DOCX/DOC/WPS/RTF/ODT.
- 📊 Spreadsheet Format Conversion - Bidirectional Excel ↔ Markdown conversion. Supports XLSX/XLS/ET/ODS/CSV formats. Includes table summary tools.
- 📑 PDF and Layout Files - PDF/XPS/OFD to Markdown or DOCX conversion. Supports PDF merging, splitting, and other operations.
- 🖼️ Image Processing - Supports bidirectional conversion and compression of JPEG/PNG/GIF/BMP/TIFF/WebP/HEIC formats.
- 🔍 OCR Text Recognition - Integrated RapidOCR to extract text from images and PDFs.
- ✏️ Text Proofreading - Checks for typos, punctuation, symbols, and sensitive words based on custom dictionaries. Rules can be edited in the settings interface.
- 📝 Template System - Flexible template mechanism supporting custom document and report formats.
- 💻 Dual Mode Operation - Graphical User Interface (GUI) + Command Line Interface (CLI).
- 🔒 Completely Local Operation - Runs offline, ensuring data security with built-in network isolation mechanisms.
- 🔗 Single Instance Operation - Automatically manages program instances and supports integration with the accompanying Obsidian plugin.
- Full internationalization support (GUI and CLI support 11 languages).
- Replaced PaddleOCR with RapidOCR for better compatibility.
- Added multilingual Word/Excel templates.
- Automatic template style detection and injection.
- Other optimizations and fixes.
- Added bidirectional math formula conversion (Word OMML ↔ Markdown LaTeX).
- Added bidirectional footnote/endnote conversion.
- Added character and paragraph styles for code, quotes, etc.
- Enhanced list processing (multi-level nesting, automatic numbering).
- Enhanced table functions (style detection/injection, three-line tables, etc.).
- Optimized cleaning and adding of subheading numbers.
- Improved interface interaction and settings linkage.
- Refactored CLI to improve user experience.
- Added support for more document types.
- Implemented more configurable options.
Double-click DocWen.exe to start the graphical interface.
- Prepare a Markdown File:

      ---
      title: Test Document
      ---
      ## Test Title
      This is the test body content.

- Drag and Drop Conversion:
  - Launch the program.
  - Drag the `.md` file into the window.
  - Select a template.
  - Click "Convert to DOCX".
- Get Results:
  - A standardized Word document will be generated in the same directory.
Tip: You can use the sample files in the samples/ directory to quickly try out the software's features.
To make it easier for colleagues without background knowledge to remember, the Markdown headings in this software correspond one-to-one with Word headings:
- Document title and subtitle are placed in YAML metadata.
- Markdown `# Heading 1` corresponds to Word "Heading 1".
- Markdown `## Heading 2` corresponds to Word "Heading 2".
- And so on, supporting up to 9 levels of headings.
Tip: If you prefer using Markdown's first-level heading (#) as the document title, starting from second-level headings (##) for body headings, you can style "Heading 1" in the Word template to look like a document title (e.g., centered, bold, larger font size), and select a numbering scheme that skips first-level heading numbering in the settings. This way, your first-level headings will appear as document titles.
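The one-to-one heading mapping can be sketched in a few lines of Python. This is only an illustration, not the converter's actual code: the function name `word_style_for` and the regular expression are assumptions, and the style names follow the Word standard English names (`heading 1` to `heading 9`).

```python
import re

# Assumption: headings are lines that start with 1 to 9 '#' characters
# followed by whitespace. Standard Markdown stops at 6 levels; the README
# states that this tool supports up to 9.
HEADING_RE = re.compile(r"^(#{1,9})\s+(.+?)\s*$")


def word_style_for(line):
    """Return (word_style_name, heading_text) for a heading line, else None."""
    match = HEADING_RE.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    return f"heading {level}", match.group(2)


print(word_style_for("## Test Title"))
```

A line with ten or more `#` characters, or with no space after the markers, is treated as ordinary text in this sketch.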
Basic Rule: Every non-empty line is treated as a separate paragraph by default.
Mixed Paragraphs: When a subheading needs to be mixed with the body text in the same paragraph, the following conditions must be met:
- The subheading ends with a terminating punctuation mark (supports multilingual punctuation, including periods, question marks, exclamation marks, and other common terminating punctuation).
- The body text is located on the immediate next line of the subheading.
- The body text line cannot be a special Markdown element (such as headings, code blocks, tables, lists, quotes, formula blocks, separators, etc.).
Example:

    ## I. Work Requirements.
    This meeting requires all units to earnestly implement...

The above two lines will be merged into the same paragraph, where "I. Work Requirements." keeps the subheading format, and "This meeting..." keeps the body text format.
Note:
- There cannot be an empty line between the subheading and the body text; otherwise, they will be recognized as separate paragraphs.
- If the subheading does not end with a punctuation mark and has no empty line before the body text, the body text will be merged into the heading line with adjusted formatting.
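The rules above amount to a three-way decision for each heading line and the line that follows it. The sketch below is a rough model under stated assumptions: the function name `classify`, the set of terminating punctuation, and the pattern used to recognise special Markdown elements are all illustrative and may differ from what the program actually checks.

```python
import re

# Assumption: a small sample of terminating punctuation (ASCII and CJK).
TERMINATORS = ".?!。?!"

# Assumption: a rough test for "special" lines (headings, fenced code,
# tables, quotes, formula blocks, lists, separators).
SPECIAL_RE = re.compile(
    r"^\s*(#{1,9}\s|```|~~~|\||>|\$\$|[-*+]\s|\d+[.)]\s|(-{3,}|\*{3,}|_{3,})\s*$)"
)
HEADING_RE = re.compile(r"^#{1,9}\s+(.*\S)\s*$")


def classify(heading_line, next_line):
    """Return 'mixed', 'merged-into-heading', or 'separate'."""
    match = HEADING_RE.match(heading_line)
    if match is None or not next_line.strip() or SPECIAL_RE.match(next_line):
        return "separate"
    if match.group(1)[-1] in TERMINATORS:
        # Subheading keeps its format, body keeps body format, same paragraph.
        return "mixed"
    # No terminating punctuation: body is absorbed into the heading line.
    return "merged-into-heading"


print(classify("## I. Work Requirements.", "This meeting requires all units..."))
```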
Supports bidirectional conversion between Markdown separators and Word page breaks/section breaks/horizontal lines:
- DOCX → MD: Word page breaks, section breaks, and horizontal lines are automatically converted to Markdown separators.
- MD → DOCX: Markdown `---`, `***`, and `___` are automatically converted to the corresponding Word elements.
- Configurable: Specific mapping relationships can be customized in the settings interface.
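As a rough picture of the MD → DOCX direction, the sketch below recognises the three separator styles and looks them up in a mapping. The default mapping shown here is purely hypothetical (the README does not state the program's defaults, only that they are configurable), and the names `DEFAULT_MAPPING` and `word_element_for` are invented for illustration.

```python
import re

# Three or more of the same marker (-, *, or _), optionally spaced.
SEPARATOR_RE = re.compile(r"^\s{0,3}([-*_])(?:\s*\1){2,}\s*$")

# Hypothetical defaults; the real mapping is set in the settings interface.
DEFAULT_MAPPING = {
    "-": "page_break",
    "*": "section_break",
    "_": "horizontal_line",
}


def word_element_for(line, mapping=DEFAULT_MAPPING):
    """Return the mapped Word element name for a separator line, else None."""
    match = SEPARATOR_RE.match(line)
    if match is None:
        return None
    return mapping[match.group(1)]


print(word_element_for("***"))
```

Note that a YAML header delimiter also looks like `---`, so a real converter would need to strip the header before applying a check like this.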
- Drag the `.docx` file into the program window.
- The program automatically analyzes the document structure.
- Generates a `.md` file containing YAML metadata.
Supported Formats:
- `.docx` - Standard Word document.
- `.doc` - Automatically converted to DOCX for processing.
- `.wps` - WPS document, automatically converted.
Export Options:
| Option | Description |
|---|---|
| Extract Images | If checked, images in the document are extracted to the output folder, and image links are inserted into the MD file. |
| Image OCR | If checked, performs OCR on images and creates an image .md file (containing recognized text). |
| Clean Subheading Numbers | If checked, removes numbers before subheadings (e.g., "一、", "(一)", "1.", etc.) and converts them to pure title text. |
| Add Subheading Numbers | If checked, automatically adds numbers based on heading levels (numbering scheme can be configured in settings). |
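To make the "Clean Subheading Numbers" option concrete, here is a minimal sketch that strips the three prefix forms given as examples in the table. The pattern and the function name `clean_heading_number` are assumptions; the program's own numbering schemes are configurable and likely cover more forms.

```python
import re

CN_NUM = "一二三四五六七八九十百"

# Assumption: only the three example forms are handled:
#   "一、"   Chinese numeral followed by an enumeration comma
#   "(一)"  Chinese numeral in ASCII or full-width parentheses
#   "1."    Arabic numeral plus delimiter (not followed by a digit,
#           so text such as "1.2 Scope" or "3.14" is left alone)
NUMBER_PREFIX_RE = re.compile(
    rf"^\s*(?:[{CN_NUM}]+、|[((][{CN_NUM}]+[))]|\d+[.、](?!\d))\s*"
)


def clean_heading_number(text):
    """Remove one leading heading number and return the bare title text."""
    return NUMBER_PREFIX_RE.sub("", text, count=1)


for sample in ["一、工作要求", "(一)Overview", "1. Background", "1.2 Scope"]:
    print(repr(clean_heading_number(sample)))
```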
- Prepare a `.md` file with a YAML header.
- Drag it into the program window and select the corresponding Word template.
- The program automatically fills the template and generates the document.
Conversion Options:
| Option | Description |
|---|---|
| Clean Subheading Numbers | If checked, removes numbers before subheadings. |
| Add Subheading Numbers | If checked, automatically adds numbers based on heading levels. |
Note: If there are paragraphs where subheadings and body text are mixed, strict line breaks must be maintained in the MD file (see "Line Breaks and Paragraphs" above).
The converter automatically detects and processes template styles during Markdown → DOCX conversion:
Paragraph Style: Applied to the entire paragraph.
| Style | Detection Behavior | Injection when Missing | Source |
|---|---|---|---|
| Heading (1~9) | Detects paragraph style | Template heading styles | Word Built-in |
| Code Block | Detects paragraph style | Consolas font + Gray background | Defined by Software |
| Quote (1~9) | Detects paragraph style | Gray background + Left border | Defined by Software |
| Formula Block | Detects paragraph style | Formula specific style | Defined by Software |
| Separator (1~3) | Detects paragraph style | Bottom border paragraph style | Defined by Software |
Character Style: Applied to selected text.
| Style | Detection Behavior | Injection when Missing | Source |
|---|---|---|---|
| Inline Code | Detects character style | Consolas font + Gray shading | Defined by Software |
| Inline Formula | Detects character style | Formula specific style | Defined by Software |
Table Style: Applied to the entire table.
| Style | Detection Behavior | Injection when Missing | Source |
|---|---|---|---|
| Three-Line Table | User config priority | Three-line table style definition | Defined by Software |
| Grid Table | User config priority | Grid table style definition | Defined by Software |
Numbering Definition: Used for list formats.
| Type | Detection Behavior | Handling when Missing |
|---|---|---|
| List Numbering | Scans existing ordered/unordered list definitions in template | Uses decimal/bullet preset |
- Word Built-in Styles (heading 1~9):
  - Style names use Word standard English names (e.g., `heading 1`).
  - Word automatically displays localized names based on system language (e.g., "标题 1" on Chinese systems).
- Software Defined Styles (Code Block, Quote, Formula, Separator, Table, etc.):
  - Injects corresponding language style names based on the software's interface language setting.
  - Chinese Interface: Injects "代码块", "引用 1", "三线表", etc.
  - English Interface: Injects "Code Block", "Quote 1", "Three Line Table", etc.
Suggestion: After customizing styles in the template, the converter will automatically use your styles; if not present in the template, it will use built-in preset styles.
- Excel/CSV to Markdown: Drag `.xlsx` or `.csv` files to automatically convert to Markdown tables.
- Markdown to Excel: Prepare an MD file and select an Excel template for conversion.
Supported Formats:
- `.xlsx` - Standard Excel document.
- `.xls` - Automatically converted to XLSX for processing.
- `.et` - WPS spreadsheet, automatically converted.
- `.csv` - CSV text table.
The program provides four customizable proofreading rules:
- Punctuation Pairing Check - Detects if paired punctuation like parentheses and quotes match.
- Symbol Proofreading - Detects mixed use of Chinese and English punctuation.
- Typo Check - Checks for common typos based on a custom dictionary.
- Sensitive Word Detection - Detects sensitive words based on a custom dictionary.
Custom Dictionaries: Visually edit typo and sensitive word dictionaries in the "Settings" interface.
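The punctuation pairing check is the kind of rule usually implemented with a stack. The following is a generic sketch of that technique, not the program's own implementation; the `PAIRS` table and the function name `unmatched_punctuation` are assumptions, and straight ASCII quotes are not handled because the opening and closing characters are identical.

```python
# Closing mark -> expected opening mark (a small illustrative set).
PAIRS = {")": "(", "]": "[", "}": "{", ")": "(", "》": "《", "”": "“", "」": "「"}
OPENERS = set(PAIRS.values())


def unmatched_punctuation(text):
    """Return sorted (index, char) tuples for marks that have no partner."""
    stack = []      # opening marks still waiting for their closer
    problems = []   # closers that did not match the most recent opener
    for index, char in enumerate(text):
        if char in OPENERS:
            stack.append((index, char))
        elif char in PAIRS:
            if stack and stack[-1][1] == PAIRS[char]:
                stack.pop()
            else:
                problems.append((index, char))
    # Anything left on the stack was opened but never closed.
    problems.extend(stack)
    return sorted(problems)


print(unmatched_punctuation("See the appendix (Table 2"))
```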
Usage:
- Drag the Word document to be proofread into the program.
- Check the required proofreading rules.
- Click the "Text Proofreading" button.
- Proofreading results are displayed as comments in the document.
The program comes with various templates, including multilingual versions. You can select and use them as needed. Template files are located in the templates/ directory.
- Create a template file using Word or WPS.
- Refer to existing templates and insert placeholders like `{{Title}}`, `{{DocumentNumber}}`, etc., where filling is needed.
- In the template, built-in Heading 1 ~ Heading 5 styles need to be manually modified.
- Save the template to the `templates/` directory.
- Restart the program, and the new template will be automatically loaded.
You can also copy an existing template, modify it, and rename it.
YAML Field Placeholders: Use {{Field Name}} format in the template, which will be replaced by the corresponding value in the Markdown file's YAML header during conversion.
| Placeholder | Description |
|---|---|
| `{{Title}}` | Document title (see the retrieval rules below) |
| `{{Body}}` | Markdown body content insertion position |
| Others | Supports any custom field |
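Placeholder filling can be pictured as a plain text substitution. The sketch below works on a string rather than a real Word file and is an assumption-laden illustration: `fill_placeholders` is a made-up name, and leaving unknown placeholders untouched is a choice made here, not documented behavior.

```python
import re

# Matches {{Field Name}}, allowing spaces just inside the braces.
PLACEHOLDER_RE = re.compile(r"\{\{\s*([^{}]+?)\s*\}\}")


def fill_placeholders(template_text, metadata):
    """Replace each {{Field}} with metadata[Field]; keep unknown ones as-is."""

    def replace(match):
        key = match.group(1)
        if key in metadata:
            return str(metadata[key])
        return match.group(0)

    return PLACEHOLDER_RE.sub(replace, template_text)


meta = {"Title": "Test Document", "DocumentNumber": "2024-001"}
print(fill_placeholders("{{Title}} (No. {{DocumentNumber}})", meta))
```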
Title Retrieval Priority:
| Priority | Source | Description |
|---|---|---|
| 1 | YAML `Title` field | Highest priority |
| 2 | YAML `aliases` field | Takes the first element of the list, or string value |
| 3 | Filename | Filename without .md extension |
Multilingual Support: The title and body placeholders support multiple languages, e.g., title can be {{title}}, {{标题}}, {{Titel}}, etc., body can be {{body}}, {{正文}}, {{Inhalt}}, etc.
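The title priority table translates directly into a short fallback chain. This sketch assumes the YAML header has already been parsed into a dictionary and that the title key is the lowercase `title` used in the sample file; the function name `resolve_title` is hypothetical, and the multilingual key variants are not modeled.

```python
from pathlib import Path


def resolve_title(metadata, md_path):
    """Pick a title: YAML title, then aliases, then the file name."""
    # 1. YAML title field has the highest priority.
    title = metadata.get("title")
    if title:
        return str(title)
    # 2. aliases: first list element, or the string itself.
    aliases = metadata.get("aliases")
    if isinstance(aliases, list) and aliases:
        return str(aliases[0])
    if isinstance(aliases, str) and aliases:
        return aliases
    # 3. File name without the .md extension.
    return Path(md_path).stem


print(resolve_title({"aliases": ["Weekly Report", "WR"]}, "notes/draft.md"))
```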
Excel templates support three types of placeholders:
1. YAML Field Placeholder {{Field Name}}
Used to fill a single value from the Markdown file's YAML header:

    ---
    ReportName: 2024 Annual Sales Statistics
    Unit: Sales Dept
    ---

{{ReportName}}, {{Unit}} in the template will be replaced with corresponding values. The title field also follows the priority rules.
2. Column Fill Placeholder {{↓Field Name}}
Extracts data from the Markdown table and fills downwards row by row starting from the placeholder position:
| ProductName | Quantity |
|:--- |:--- |
| Product A | 100 |
| Product B | 200 |

{{↓ProductName}} in the Excel template will be replaced by "Product A", and the next row will be filled with "Product B".
3. Row Fill Placeholder {{→Field Name}}
Extracts data from the Markdown table and fills rightwards column by column starting from the placeholder position:
| Month |
|:--- |
| Jan |
| Feb |
| Mar |

{{→Month}} in the Excel template will be filled with "Jan", "Feb", "Mar" sequentially to the right.
Merged Cell Handling: The program automatically skips non-first cells of merged cells to ensure correct data filling.
Multi-table Data Merge: If there are multiple tables in Markdown using the same header name, data will be merged in order and filled sequentially.
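The two fill directions and the multi-table merge can be modeled on a plain dictionary keyed by `(row, column)`. This is a simplified sketch, not the program's spreadsheet code: the helper names are invented, tables are assumed to be already parsed into lists of row dictionaries, and merged-cell skipping is deliberately left out.

```python
def column_values(tables, header):
    """Collect one column across several parsed tables, in document order."""
    merged = []
    for table in tables:
        merged.extend(row[header] for row in table if header in row)
    return merged


def fill_down(cells, start, values):
    """Write values downward, beginning at the placeholder cell itself."""
    row, col = start
    for offset, value in enumerate(values):
        cells[(row + offset, col)] = value


def fill_right(cells, start, values):
    """Write values rightward, beginning at the placeholder cell itself."""
    row, col = start
    for offset, value in enumerate(values):
        cells[(row, col + offset)] = value


tables = [
    [{"ProductName": "Product A", "Quantity": "100"}],
    [{"ProductName": "Product B", "Quantity": "200"}],
]
cells = {(2, 1): "{{↓ProductName}}", (5, 1): "{{→Month}}"}
fill_down(cells, (2, 1), column_values(tables, "ProductName"))
fill_right(cells, (5, 1), ["Jan", "Feb", "Mar"])
print(cells)
```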
Most users use this software through the graphical interface. Here is the detailed operation guide.
The program uses an adaptive three-column layout:
| Area | Description | Display Timing |
|---|---|---|
| Center Column (Main Area) | File drag-and-drop area, operation panel, status bar | Always shown |
| Right Column | Template selector / Format conversion panel | Automatically expands after selecting a file |
| Left Column | Batch file list (grouped by type) | Shown when switching to batch mode |
- Launch Program: Double-click `DocWen.exe`.
- Import File:
  - Method 1: Drag and drop files directly into the window.
  - Method 2: Click the "Add" button in the drag-and-drop area to select files.
- Select Template (if conversion is needed): The right template panel expands automatically; select a suitable template.
- Configure Options: Check the required conversion/export options in the operation panel.
- Execute Operation: Click the corresponding function button (e.g., "Export MD", "Convert to DOCX", etc.).
- View Result: The status bar shows progress and results; click the 📍 icon to locate the output file.
The program supports two processing modes, switchable via the toggle button in the file drag-and-drop area:
Single File Mode (Default):
- Process one file at a time.
- Simple interface, suitable for daily use.
Batch Mode:
- Import multiple files simultaneously.
- Left column shows categorized file list (grouped by document/spreadsheet/image, etc.).
- Supports batch adding, removing, and sorting.
- Clicking a file in the list switches the current operation target.
The operation panel automatically adjusts available options based on file type:
| File Type | Available Operations |
|---|---|
| Word Document | Export MD, Convert PDF, Text Proofreading, OCR |
| Markdown | Convert DOCX, Convert PDF |
| Excel Spreadsheet | Export MD, Convert PDF, Table Summary |
| PDF File | Export MD, Merge, Split, OCR |
| Image File | Format Conversion, Compression, OCR |
Click the ⚙️ button in the bottom right corner of the window to open settings:
- General: Interface theme, language, window opacity.
- Conversion: Default values for various conversion options.
- Output: Default output directory, file naming rules.
- Proofread: Edit typo and sensitive word dictionaries.
- Style: Code block, quote, table style configurations.
- Drag External File: Drag directly into the window to import.
- Double-click Status Bar Result: Quickly open the output file directory.
- Right-click Template Item: Open template file location.
In addition to the GUI, the program provides a Command Line Interface (CLI), suitable for automation scripts and batch processing scenarios.
- Interactive Mode: Displays a menu guide after passing in a file, similar to GUI operation.
- Headless Mode: Execute directly by adding the `--action` parameter, suitable for script invocation.

    # Interactive Mode
    DocWen.exe document.docx
    # Export Word to Markdown (Extract Images + OCR)
    DocWen.exe report.docx --action export_md --extract-img --ocr
    # Markdown to Word (Specify Template)
    DocWen.exe document.md --action convert --target docx --template "Template Name"
    # Batch Conversion (Skip confirmation, continue on error)
    DocWen.exe *.docx --action export_md --batch --yes --continue-on-error
    # Document Proofreading
    DocWen.exe document.docx --action validate --check-typo --check-punct
    # PDF Merge/Split
    DocWen.exe *.pdf --action merge_pdfs
    DocWen.exe report.pdf --action split_pdf --pages "1-3,5,7-10"

| Argument | Description |
|---|---|
| `--action` | Operation type: export_md, convert, validate, merge_pdfs, split_pdf |
| `--target` | Target format: pdf, docx, xlsx, md |
| `--template` | Template name (e.g., Template Name) |
| `--extract-img` | Extract images during export |
| `--ocr` | Enable OCR recognition |
| `--batch` | Batch processing mode |
| `--yes` / `-y` | Skip confirmation prompts |
| `--continue-on-error` | Continue processing next item on error |
| `--json` | Output result in JSON format |
| `--quiet` / `-q` | Quiet mode, reduce output |
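For script invocation, headless mode can be driven from Python with the standard `subprocess` module. The sketch below only uses flags listed in the table above; the helper names are hypothetical, and the structure of the `--json` output and the meaning of the exit code are not documented here, so both are returned unparsed.

```python
import subprocess


def build_export_command(source, extract_img=False, ocr=False, exe="DocWen.exe"):
    """Build the argument list for a headless Word-to-Markdown export."""
    command = [exe, source, "--action", "export_md"]
    if extract_img:
        command.append("--extract-img")
    if ocr:
        command.append("--ocr")
    command.append("--json")
    return command


def run_export(source, **options):
    """Run the export and return (exit_code, raw_stdout) without parsing."""
    completed = subprocess.run(
        build_export_command(source, **options),
        capture_output=True,
        text=True,
        check=False,
    )
    return completed.returncode, completed.stdout


print(build_export_command("report.docx", extract_img=True, ocr=True))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.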
The project includes a matching Obsidian plugin to work in tandem with the converter:
- 🚀 One-Click Launch - Sidebar icon to quickly launch the converter.
- 📂 Automatic Handover - Automatically passes the currently open file path.
- 🔄 Single Instance Management - Automatically sends file if the program is already running, no need to restart.
- 💪 Crash Recovery - Detects process status and cleans up residual files automatically.
The plugin interacts with the converter via file system-based IPC:
- First Click → Launch converter and pass current file.
- Click Again (With File) → Replace with new file (Single File Mode).
- Click Again (No File) → Activate converter window.
The plugin has been released to a separate repository. Please visit docwen-obsidian for installation instructions and the latest version.
- Check if the file is occupied by another program.
- Confirm the file format is correct.
- Check error logs in the `logs/` directory.
- Confirm template files are in the `templates/` directory.
- Check if the template file is corrupted.
- Restart the program to reload templates.
- Confirm the document is in .docx format.
- Check if the document contains editable text.
- Confirm proofreading rules are enabled in settings.
- The program generates documents based on template styles. To adjust output format, modify the style definitions in the template file directly.
- Template files are located in the `templates/` directory.
- After modifying template styles, all documents converted using that template will apply the new styles.
Formula cells that come out empty after conversion are expected behavior: the program reads the cached values of cells rather than the formulas themselves.
Technical Reason:
- In Excel files, formula cells store both the formula and the last calculated result (cached value).
- The program uses `data_only=True` mode, which only retrieves cached values.
- If the file has never been opened in Excel (e.g., generated by a program), or was edited but not re-saved, the cached value will be empty.
Solution:
- Open the file in Excel.
- Wait for formula calculation to complete.
- Save the file.
- Convert again.
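The cached-value behavior is visible in the file format itself. In an `.xlsx` worksheet, a formula cell stores the formula in an `<f>` element and the last computed result in a `<v>` element; when no application has calculated the sheet, `<v>` is simply absent. The sketch below reads a hand-written worksheet fragment with the standard library only. The fragment and the function name `cached_values` are illustrative; `data_only=True` matches the `load_workbook` parameter of the same name in openpyxl, though which library the program uses is an assumption here.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}

# Illustrative worksheet fragment: B1 has a cached result, C1 does not.
SHEET_XML = """<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
  <sheetData>
    <row r="1">
      <c r="A1"><v>1</v></c>
      <c r="B1"><f>A1*2</f><v>2</v></c>
      <c r="C1"><f>A1*3</f></c>
    </row>
  </sheetData>
</worksheet>"""


def cached_values(sheet_xml):
    """Map each cell reference to its stored value text, or None if absent."""
    root = ET.fromstring(sheet_xml)
    result = {}
    for cell in root.iterfind(".//m:c", NS):
        value = cell.find("m:v", NS)
        result[cell.get("r")] = None if value is None else value.text
    return result


print(cached_values(SHEET_XML))
```

Opening and saving the file in Excel writes the missing `<v>` elements, which is why the steps above resolve the problem.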
- Completely Local Operation: All processing is done locally, no network dependency.
- Network Isolation: Built-in network isolation mechanism prevents data leakage.
- No Data Upload: User files are never uploaded to any server.
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0).
- This project uses PyMuPDF (licensed under AGPL-3.0), so the entire project is also licensed under AGPL-3.0.
- You are free to use, modify, and distribute this software.
- If you modify this software and provide services over a network, you must provide the modified source code to users.
- For detailed license information, please see the LICENSE file.
- GitHub: https://github.com/ZHYX91/docwen
- Contact Author: zhengyx91@hotmail.com
Author: ZhengYX