A macOS command-line tool that converts Word documents (.docx) to Pages documents (.pages) using a template's native styles—without style pollution.
ℹ️ macOS 26+ Users: This tool automatically uses an HTML-based pipeline on macOS 26+ to work around broken JXA APIs. Styled output works, but template styles are not applied—semantic HTML styling is used instead. See macOS 26+ Compatibility for details.
When you open a .docx file directly in Pages, it imports the Word styles, creating duplicates and "imported" variants that pollute your style list. This tool avoids that by:
- Parsing the DOCX file to extract pure structure (headings, paragraphs, lists, tables)
- Copying your template at the filesystem level (template is never opened)
- Applying only the template's existing styles to the content
The result: a clean Pages document that looks like it was created natively in Pages.
- All heading levels (1-9) with style saturation for templates with fewer levels
- Title and Subtitle styles
- Bulleted lists (native styles when available)
- Numbered lists with nesting (native styles when available)
- Tables with cell content preservation (supports >26 columns)
- Tabs and soft line breaks preserved in text
- Document order preservation
- Custom styles derived from headings (follows style inheritance chain)
- No Word style pollution in output
- Template safety (filesystem copy, never opened by Pages)
- Strict mode for CI/CD validation
- Concurrency lock prevents parallel runs
- Machine-readable JSON summary output
Not preserved (each is dropped silently):
- Inline formatting (bold, italic, underline)
- Footnotes and endnotes
- Text boxes and shapes
- Comments and tracked changes
- Images and media
- Section-level layout reconstruction
- macOS 12.0 or later (macOS 26+ has known JXA issues)
- Pages.app installed
- Python 3.x with the `defusedxml` package (for secure DOCX parsing)
- Swift 5.9+ toolchain (for building from source only)
Install the required Python packages:
```bash
pip3 install -r requirements.txt
# Or manually: pip3 install defusedxml
```

The `defusedxml` package provides protection against XML External Entity (XXE) attacks when parsing DOCX files.
Download the latest release from GitHub Releases:
```bash
# Download and unzip
curl -L https://github.com/rogu3bear/docx2pages/releases/latest/download/docx2pages-X.Y.Z-macos.zip -o docx2pages.zip
unzip docx2pages.zip

# Run directly (no Swift toolchain needed)
./docx2pages-X.Y.Z-macos/docx2pages -i doc.docx -o out.pages -t template.pages
```

The release package includes the binary and the required scripts, so no Swift toolchain is needed.
Or install with Homebrew:

```bash
# Add the tap and install
brew tap rogu3bear/docx2pages https://github.com/rogu3bear/docx2pages
brew install docx2pages

# Install required Python dependency
pip3 install defusedxml
```

Or build from source:

```bash
# Clone the repository
git clone https://github.com/rogu3bear/docx2pages.git
cd docx2pages

# Build the CLI
swift build -c release

# The binary is at .build/release/docx2pages
# Optionally copy to your PATH:
cp .build/release/docx2pages /usr/local/bin/
```

On first run, macOS will prompt you to grant automation permissions for Pages. You can also pre-authorize via:
System Settings → Privacy & Security → Automation → Terminal (or your terminal app) → Pages
```bash
docx2pages --input <file.docx> --output <file.pages> --template <template.pages>
```

| Option | Short | Description |
|---|---|---|
| `--input` | `-i` | Input DOCX file path (required) |
| `--output` | `-o` | Output .pages file path (required) |
| `--template` | `-t` | Pages template file path (required) |
| `--strict` | | Fail on style pollution or fallback behavior |
| `--overwrite` | | Overwrite output file if it exists |
| `--preserve-breaks` | | Convert page/section breaks to blank paragraphs |
| `--prefix-deep-headings` | | Prefix headings beyond template max with "HN:" |
| `--table-style` | | Name of table style to apply from template |
| `--no-lock` | | Disable concurrency lock (advanced) |
| `--no-wait` | | Fail immediately if lock is held (don't wait) |
| `--batch-size` | | Paragraph batch size for Pages writer (default: 50) |
| `--json-summary` | | Write JSON summary to file (use `-` for stdout) |
| `--scripts-dir` | | Directory containing scripts (for dist package) |
| `--force-text` | | Skip style API (macOS 26+ compatibility, plain text output) |
| `--timeout` | | Process timeout in seconds (default: 120, range: 1-3600) |
| `--verbose` | `-v` | Enable verbose logging |
| `--help` | `-h` | Show help |
| `--version` | | Show version |
```bash
# Basic conversion
docx2pages -i report.docx -o report.pages -t ~/Templates/Corporate.pages

# With verbose output
docx2pages -i report.docx -o report.pages -t ~/Templates/Corporate.pages -v

# Strict mode (fails on any style pollution)
docx2pages -i report.docx -o report.pages -t ~/Templates/Corporate.pages --strict

# Preserve page breaks as blank paragraphs
docx2pages -i report.docx -o report.pages -t ~/Templates/Corporate.pages --preserve-breaks

# Prefix deep headings (H7, H8, H9) when template only has Heading 1-6
docx2pages -i thesis.docx -o thesis.pages -t ~/Templates/Academic.pages --prefix-deep-headings

# Output to nested directory (created automatically)
docx2pages -i doc.docx -o /tmp/nested/output/doc.pages -t template.pages

# Generate JSON summary alongside conversion
docx2pages -i doc.docx -o doc.pages -t template.pages --json-summary summary.json

# JSON to stdout (human output to stderr)
docx2pages -i doc.docx -o doc.pages -t template.pages --json-summary - > result.json
```

Your Pages template should include these paragraph styles for best results:
| Style Name | Purpose |
|---|---|
| Body | Body text (fallback: "Body Text", "Normal") |
| Title | Document title |
| Subtitle | Document subtitle |
| Heading or Heading 1 | Level 1 headings |
| Heading 2 ... Heading 9 | Additional heading levels |
For native list rendering, include these paragraph styles in your template:
| Style Name | Purpose |
|---|---|
| Bullet | Bulleted list items (alternatives: "Bulleted", "Bulleted List", "Bullets") |
| Numbered | Numbered list items (alternatives: "Numbered List", "Numbers") |
Without list styles: Lists will be rendered as formatted text with • or 1. prefixes and indentation. Visually correct but not "true" Pages list objects.
With list styles: Lists use the native paragraph styles from your template, giving you full control over list appearance.
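The text fallback can be pictured with a short Python sketch. It turns a parser `list` block (`ordered` plus `items` with `text` and `level`) into prefixed, indented lines. The name `list_as_text` is hypothetical. The four-space indent per level and the rule that numbering restarts after returning to a shallower level are also assumptions. The real writer is JXA and may format differently.

```python
from typing import Any, Dict, List


def list_as_text(block: Dict[str, Any], indent: str = "    ") -> List[str]:
    """Render a parser `list` block as plain lines with bullet or number prefixes.

    Hypothetical helper: indentation width and numbering rules are assumptions.
    """
    lines: List[str] = []
    counters: Dict[int, int] = {}  # last number used at each nesting level
    for item in block["items"]:
        level = item.get("level", 0)
        # Returning to a shallower level restarts numbering of deeper levels.
        for deeper in [lvl for lvl in counters if lvl > level]:
            del counters[deeper]
        if block.get("ordered"):
            counters[level] = counters.get(level, 0) + 1
            marker = f"{counters[level]}."
        else:
            marker = "•"
        lines.append(f"{indent * level}{marker} {item['text']}")
    return lines
```

For example, an ordered list with items at levels 0, 1, 1, 0 renders as `1.`, indented `1.`, indented `2.`, and then `2.`.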
- Open Pages and create a new blank document
- Go to Format → Paragraph Styles
- Create or modify styles: Body, Title, Heading, Heading 2, etc.
- For lists: Create "Bullet" and "Numbered" paragraph styles
- Save the document as a .pages file; this is the file you pass to `--template`
Standard Pipeline (macOS 12-15):
```
┌─────────────┐      ┌───────────────┐      ┌─────────────────┐
│  DOCX File  │─────▶│ Python Parser │─────▶│   JSON Blocks   │
└─────────────┘      │ (parse_docx)  │      │ (intermediate)  │
                     └───────────────┘      └────────┬────────┘
                                                     │
┌─────────────┐      ┌──────────────┐                │
│  Template   │─────▶│  Filesystem  │                │
│  (.pages)   │      │     Copy     │                │
└─────────────┘      └──────┬───────┘                │
                            │                        │
                     ┌──────▼───────┐                │
                     │ Pages Writer │◀───────────────┘
                     │    (JXA)     │
                     └──────┬───────┘
                            │
                     ┌──────▼───────┐
                     │ Output File  │
                     │   (.pages)   │
                     └──────────────┘
```
HTML Pipeline (macOS 26+ automatic fallback):
```
┌─────────────┐      ┌───────────────┐      ┌─────────────────┐
│  DOCX File  │─────▶│ Python Parser │─────▶│   JSON Blocks   │
└─────────────┘      │ (parse_docx)  │      │ (intermediate)  │
                     └───────────────┘      └────────┬────────┘
                                                     │
                                            ┌────────▼────────┐
                                            │   HTML Writer   │
                                            │  (html_writer)  │
                                            └────────┬────────┘
                                                     │
                                            ┌────────▼────────┐
                                            │    textutil     │
                                            │  (HTML → RTF)   │
                                            └────────┬────────┘
                                                     │
                                            ┌────────▼────────┐
                                            │ Pages Opens RTF │
                                            │& Exports .pages │
                                            └────────┬────────┘
                                                     │
                                            ┌────────▼────────┐
                                            │   Output File   │
                                            │    (.pages)     │
                                            └─────────────────┘
```
Note: The RTF step is required because `pages.open()` returns null on macOS 26+, but RTF files opened from the shell with `open -a Pages` are correctly tracked in `documents`.
Key safety feature: The template is copied at the filesystem level before any Pages automation. Pages only opens the copy, never the original template.
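A rough Python equivalent of that filesystem-level copy is sketched below. The tool itself is written in Swift, so this illustrates the idea and is not its code; `copy_template` is a hypothetical name. The sketch handles both a single-file `.pages` document and one stored as a package directory. It also mirrors the documented `--overwrite` behavior and the automatic creation of nested output directories.

```python
import shutil
from pathlib import Path


def copy_template(template: Path, output: Path, overwrite: bool = False) -> Path:
    """Copy a .pages template to the output path without ever opening it."""
    if not template.exists():
        raise FileNotFoundError(template)
    if output.exists():
        if not overwrite:
            raise FileExistsError(f"{output} exists (pass overwrite=True)")
        # Remove the old output, whether it is a package directory or a file.
        if output.is_dir():
            shutil.rmtree(output)
        else:
            output.unlink()
    # Nested output directories are created automatically.
    output.parent.mkdir(parents=True, exist_ok=True)
    if template.is_dir():
        shutil.copytree(template, output)  # .pages stored as a package directory
    else:
        shutil.copy2(template, output)  # single-file .pages document
    return output
```

Only the copy is handed to Pages afterwards, so the original template bytes are never touched.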
The parser emits these block types:

```json
{"type": "title", "text": "Document Title"}
{"type": "subtitle", "text": "Subtitle Text"}
{"type": "heading", "level": 1, "text": "Chapter One"}
{"type": "paragraph", "text": "Body text..."}
{"type": "list", "ordered": false, "items": [{"text": "Item", "level": 0}]}
{"type": "table", "rows": [["Cell 1", "Cell 2"], ["Cell 3", "Cell 4"]]}
{"type": "break"}
```

Optional metadata fields:

Paragraphs with hyperlinks include a `links` array:

```json
{
  "type": "paragraph",
  "text": "Visit Example [https://example.com] for info",
  "links": [{"text": "Example", "url": "https://example.com", "start": 6, "end": 13}]
}
```

Tables with merged cells include a `merges` array:

```json
{
  "type": "table",
  "rows": [["Header", null], [null, "Data"]],
  "merges": [{"row": 0, "col": 0, "rowspan": 2, "colspan": 1}]
}
```

Null values in table rows indicate merged cell continuations.
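The Python sketch below shows how a paragraph's `text` and `links` array could be assembled from runs, which makes the `links` offsets concrete. The `(text, url)` run representation and the name `paragraph_block` are assumptions. Offsets are 0-based with an exclusive `end`, counted in Python string characters. This matches the hyperlink example above, where `"Example"` spans 6 to 13.

```python
from typing import Any, Dict, List, Optional, Tuple

Run = Tuple[str, Optional[str]]  # (visible text, hyperlink URL or None)


def paragraph_block(runs: List[Run]) -> Dict[str, Any]:
    """Join runs into one paragraph, appending ' [URL]' after each linked run."""
    parts: List[str] = []
    links: List[Dict[str, Any]] = []
    pos = 0  # current length of the joined text
    for text, url in runs:
        piece = text
        if url:
            # Record where the link text sits before adding the suffix.
            links.append({"text": text, "url": url, "start": pos, "end": pos + len(text)})
            piece = f"{text} [{url}]"
        parts.append(piece)
        pos += len(piece)
    block: Dict[str, Any] = {"type": "paragraph", "text": "".join(parts)}
    if links:  # only paragraphs with hyperlinks carry a links array
        block["links"] = links
    return block
```

Calling `paragraph_block([("Visit ", None), ("Example", "https://example.com"), (" for info", None)])` reproduces the example paragraph. In that result, `text[start:end]` recovers the link text.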
The tool maps Word styles to Pages template styles:
| Word Style | Pages Style (preference order) |
|---|---|
| Normal / Body | Body → Body Text → Normal |
| Title | Title → Heading 1 |
| Subtitle | Subtitle → Body |
| Heading 1 | Heading → Heading 1 |
| Heading N (2-9) | Heading N (saturates at template max) |
| Bulleted list | Bullet → Bulleted → Bulleted List → Bullets (fallback: text) |
| Numbered list | Numbered → Numbered List → Numbers (fallback: text) |
Saturation: If your template only has Heading 1-3 and the document has Heading 5, it maps to Heading 3.
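The preference lists and saturation rule can be expressed as a small Python sketch. The preference orders come from the table above. The function names are hypothetical. Saturation is modeled as "walk down to the nearest heading level the template defines", which is one plausible reading of "saturates at template max".

```python
from typing import Iterable, Optional, Set

# Preference orders from the mapping table; the dict itself is illustrative.
PREFERENCES = {
    "body": ["Body", "Body Text", "Normal"],
    "title": ["Title", "Heading 1"],
    "subtitle": ["Subtitle", "Body"],
    "bullet": ["Bullet", "Bulleted", "Bulleted List", "Bullets"],
    "numbered": ["Numbered", "Numbered List", "Numbers"],
}


def first_available(candidates: Iterable[str], template: Set[str]) -> Optional[str]:
    """Return the first candidate style the template defines, else None."""
    for name in candidates:
        if name in template:
            return name
    return None


def heading_style(level: int, template: Set[str]) -> Optional[str]:
    """Map a heading level to a template style, saturating at the deepest available."""
    for lvl in range(level, 0, -1):
        candidates = ["Heading", "Heading 1"] if lvl == 1 else [f"Heading {lvl}"]
        style = first_available(candidates, template)
        if style is not None:
            return style
    return None
```

A `None` result for a list style corresponds to the text fallback in the table.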
Use --strict to enforce clean conversions:
```bash
docx2pages -i doc.docx -o out.pages -t template.pages --strict
```

In strict mode, the tool will fail with a non-zero exit code if:
- Any new paragraph styles appear in the output that weren't in the template
- Any table falls back to text rendering
- Any list falls back to text rendering (when template lacks list styles)
This is useful for CI/CD pipelines to catch unexpected style pollution.
Without --strict, fallbacks are reported as warnings but the conversion succeeds.
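The strict-mode decision can be sketched in Python as a set difference plus fallback counts. This is a hypothetical illustration, not the tool's Swift code. The names `strict_problems` and `exit_code` and the specific exit status `1` are assumptions; the text above only promises a non-zero code.

```python
import sys
from typing import Iterable, List


def strict_problems(template_styles: Iterable[str], output_styles: Iterable[str],
                    table_fallbacks: int = 0, list_fallbacks: int = 0) -> List[str]:
    """List the conditions that make a strict-mode run fail."""
    problems: List[str] = []
    new_styles = sorted(set(output_styles) - set(template_styles))
    if new_styles:
        problems.append("new paragraph styles: " + ", ".join(new_styles))
    if table_fallbacks:
        problems.append(f"{table_fallbacks} table(s) fell back to text")
    if list_fallbacks:
        problems.append(f"{list_fallbacks} list(s) fell back to text")
    return problems


def exit_code(problems: List[str], strict: bool) -> int:
    """Strict mode turns any problem into a failure; otherwise they are warnings."""
    for problem in problems:
        print(f"warning: {problem}", file=sys.stderr)
    return 1 if strict and problems else 0
```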
By default, docx2pages acquires an exclusive lock on /tmp/docx2pages.lock to prevent multiple simultaneous conversions. This avoids conflicts when Pages is processing documents.
Use --no-lock to disable this behavior if you need to run multiple conversions in controlled environments.
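The locking behavior can be approximated in Python with `fcntl.flock` on the same lock path. This is an illustration under assumptions, not the Swift implementation. The `conversion_lock` and `LockHeld` names are hypothetical, and the real tool may use a different locking primitive. Passing `wait=False` corresponds to `--no-wait`, while `--no-lock` would skip this step entirely.

```python
import fcntl
import os
from contextlib import contextmanager
from typing import Iterator


class LockHeld(Exception):
    """Raised when another process holds the lock and we chose not to wait."""


@contextmanager
def conversion_lock(path: str = "/tmp/docx2pages.lock", wait: bool = True) -> Iterator[None]:
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_EX blocks until the lock is free; LOCK_NB makes it fail fast instead.
        flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
        try:
            fcntl.flock(fd, flags)
        except BlockingIOError:
            raise LockHeld(f"another conversion holds {path}") from None
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Wrapping a conversion in `with conversion_lock(wait=False):` fails immediately if another run is active.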
Use --json-summary <path> to output a machine-readable summary:
```bash
docx2pages -i doc.docx -o out.pages -t template.pages --json-summary result.json
```

Or use `-` to write JSON to stdout (human output goes to stderr):

```bash
docx2pages -i doc.docx -o out.pages -t template.pages --json-summary - 2>/dev/null
```

The JSON includes:
- `toolVersion`: Version string
- `input`, `output`, `template`: File paths
- `strict`: Whether strict mode was enabled
- `parseStats`: Headings, paragraphs, lists, tables counts
- `writeResult`: Styles used, pollution detected, warnings
- `elapsedSeconds`: Conversion time
- `success`: Boolean
- `error`: Error message if failed
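A consumer of the summary only needs the top-level fields listed above. The Python sketch below turns a summary into a one-line status. `describe_summary` is a hypothetical name. The sketch deliberately makes no assumptions about the nested structure of `parseStats` or `writeResult`.

```python
import json
from typing import Any, Dict


def describe_summary(summary_json: str) -> str:
    """Turn a --json-summary document into a one-line status message."""
    summary: Dict[str, Any] = json.loads(summary_json)
    if summary.get("success"):
        seconds = float(summary.get("elapsedSeconds", 0))
        return f"ok: {summary.get('output')} in {seconds:.1f}s"
    return f"failed: {summary.get('error') or 'unknown error'}"
```

With `--json-summary - 2>/dev/null`, the stdout of the run can be piped into a script that calls `describe_summary(sys.stdin.read())`.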
The fixtures/ directory contains test DOCX files:
| File | Description |
|---|---|
| `all_headings.docx` | Title, Subtitle, Heading 1-9, custom derived heading |
| `mixed_lists.docx` | Bulleted and numbered lists with nesting |
| `tables.docx` | Multiple tables (3x3 and 5x4) |
| `comprehensive.docx` | All element types combined |
| `large.docx` | 300+ paragraphs for performance testing |
| `wide_table.docx` | Table with 35 columns (tests >26 column addressing) |
| `whitespace.docx` | Tabs and soft line breaks |
| `empty.docx` | Empty document (edge case testing) |
| `minimal.docx` | Single paragraph document (edge case testing) |
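Spreadsheet-style column letters run out after Z, which is why `wide_table.docx` exists. Assume the writer addresses Pages table cells by names such as `A1`; this is an assumption about the JXA writer's internals. Under that assumption, a column index has to be converted to bijective base-26 letters. The helpers `column_name` and `cell_name` below are hypothetical.

```python
def column_name(index: int) -> str:
    """Convert a 0-based column index to spreadsheet letters (0 -> A, 26 -> AA)."""
    if index < 0:
        raise ValueError("column index must be non-negative")
    name = ""
    n = index + 1  # work in 1-based bijective base-26
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        name = chr(ord("A") + remainder) + name
    return name


def cell_name(row: int, col: int) -> str:
    """0-based (row, col) to an A1-style reference, e.g. (0, 34) -> 'AI1'."""
    return f"{column_name(col)}{row + 1}"
```

A naive `chr(ord("A") + col)` breaks at column 27. The last of the 35 columns in the fixture becomes `AI`.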
To regenerate the fixtures:

```bash
python3 scripts/create_fixtures.py
```

Run the smoke test to verify all fixtures convert successfully:

```bash
# Requires a Pages template file
TEMPLATE=/path/to/template.pages scripts/smoke_test.sh
```

The smoke test:
- Builds the release binary
- Regenerates all fixtures
- Converts each fixture with `--strict --overwrite`
- Reports pass/fail for each
The repository includes two CI workflows:
ci.yml - Runs on every push and PR:
- Parser tests on Ubuntu and macOS (no Pages required)
- Golden test comparison for parser output stability
- Swift build verification on macOS
- CLI surface contract checks
pages-integration.yml - Manual trigger for full testing:
- Requires a self-hosted macOS runner with Pages installed
- Runs the full smoke test in `--strict` mode
- Uploads test outputs as artifacts
```bash
# Parser golden tests (no Pages required, runs on any platform)
python3 scripts/test_parser_golden.py

# Update golden files after intentional parser changes
python3 scripts/test_parser_golden.py --update

# Build and CLI checks
swift build -c release
.build/release/docx2pages --version
.build/release/docx2pages --help
```

Full integration tests require:
- macOS with Pages.app installed
- Automation permission granted
- A Pages template file
```bash
# Run smoke test (requires template)
TEMPLATE=/path/to/template.pages scripts/smoke_test.sh

# Quick mode (skips large fixtures)
TEMPLATE=/path/to/template.pages scripts/smoke_test.sh --quick

# Individual conversion with strict mode
.build/release/docx2pages \
  -i fixtures/comprehensive.docx \
  -o /tmp/test.pages \
  -t /path/to/template.pages \
  --strict
echo $?  # 0 = success
```

Due to fundamental limitations in Apple's JXA (JavaScript for Automation) bridge for Pages:
- Hyperlinks: Extracted from DOCX and preserved in JSON metadata (`links` array), but appear as plain text with a `[URL]` suffix in Pages output. The Pages JXA API does not support creating clickable hyperlinks programmatically. The JSON intermediate format contains full link information for other tooling.
- Merged Cells: Detected and preserved in JSON metadata (`merges` array with `rowspan`/`colspan`), but Pages tables are rendered as regular grids. The Pages JXA API does not support cell merging programmatically. Merged continuation cells appear empty in Pages output.
These are Apple platform limitations, not bugs in docx2pages. The JSON intermediate format keeps the full link and merge information for use with other tools.
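Other tooling that reads the JSON can recover the merge layout from `merges`. It can also reproduce the regular-grid rendering that Pages shows. A minimal Python sketch follows; the function names are hypothetical. It assumes each merge's `row`/`col` is its top-left anchor and that `rowspan`/`colspan` count cells.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def continuation_cells(merges: List[Dict[str, int]]) -> Set[Cell]:
    """Return (row, col) positions covered by a merge other than its anchor cell."""
    covered: Set[Cell] = set()
    for m in merges:
        for r in range(m["row"], m["row"] + m["rowspan"]):
            for c in range(m["col"], m["col"] + m["colspan"]):
                if (r, c) != (m["row"], m["col"]):
                    covered.add((r, c))
    return covered


def as_plain_grid(rows: List[List[Optional[str]]]) -> List[List[str]]:
    """Regular-grid rendering: null continuation cells become empty strings."""
    return [["" if cell is None else cell for cell in row] for row in rows]
```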
- Images: Not supported. Images in DOCX are silently dropped.
- Headers/Footers: Not extracted or transferred.
- Page Breaks: By default, page/section breaks are dropped. Use `--preserve-breaks` to convert them to blank paragraphs.
- Inline Formatting: Bold, italic, underline, etc. are not preserved. Only paragraph-level styles are applied.
- Nested List Levels: Nesting is represented by indentation only. True multi-level list numbering (1.1, 1.2, etc.) depends on template list style configuration.
- Footnotes/Endnotes: Not supported; silently dropped.
- Text Boxes/Shapes: Not supported; silently dropped.
- Comments/Track Changes: Not supported; silently dropped.
| Document Size | Typical Time |
|---|---|
| Small (1-50 paragraphs) | 2-5 seconds |
| Medium (50-200 paragraphs) | 5-15 seconds |
| Large (200-500 paragraphs) | 15-45 seconds |
| Very large (500+ paragraphs) | 45+ seconds |
Times are dominated by Pages automation overhead, not parsing.
- Batch size tuning: Use `--batch-size N` to adjust paragraph flushing (default: 50). Larger batches may improve throughput for very large documents, but batches that are too large may cause memory issues.
- Quick smoke testing: Use `scripts/smoke_test.sh --quick` to skip large fixtures during development.
- Non-blocking lock: Use `--no-wait` to fail immediately if another conversion is running, instead of waiting.
- Parallel conversions: Not recommended due to Pages automation constraints. The default concurrency lock prevents this.
- The tool uses O(n) paragraph writing with buffered flushes
- Style lookups are cached to avoid repeated AppleScript calls
- Tables are processed in chunks (50 rows at a time)
- Memory usage is proportional to document size
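Buffered, fixed-size flushing is the pattern behind `--batch-size` and the 50-row table chunks. A generic Python sketch follows. The `batched` name is ours, and the writer's actual flush logic lives in JXA. Python 3.12+ also ships `itertools.batched`, which yields tuples instead of lists.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int = 50) -> Iterator[List[T]]:
    """Yield lists of up to `size` items; the last batch may be shorter."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch  # flush a full batch
            batch = []
    if batch:
        yield batch  # flush the remainder
```

With the default size, 120 paragraphs become three flushes of 50, 50, and 20. This is where the trade-off in the `--batch-size` tip comes from: fewer, larger automation calls against more memory per call.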
Install Pages from the Mac App Store. The tool checks:
- `/Applications/Pages.app`
- `/System/Applications/Pages.app`
- `~/Applications/Pages.app`
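The lookup over the three locations above amounts to "first existing app bundle wins". In this Python sketch, `find_pages` is a hypothetical helper, and checking them in the listed order is an assumption. App bundles are directories, so the check uses `is_dir()`.

```python
from pathlib import Path
from typing import Optional, Sequence

PAGES_CANDIDATES = (
    Path("/Applications/Pages.app"),
    Path("/System/Applications/Pages.app"),
    Path.home() / "Applications" / "Pages.app",
)


def find_pages(candidates: Sequence[Path] = PAGES_CANDIDATES) -> Optional[Path]:
    """Return the first Pages.app bundle that exists, or None."""
    for path in candidates:
        if path.is_dir():  # .app bundles are directories
            return path
    return None
```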
Grant Terminal (or your terminal app) permission to control Pages:
- System Settings → Privacy & Security → Automation → [Your Terminal] → Pages ✓
If the prompt doesn't appear:
- Open System Settings → Privacy & Security → Automation
- Click the + button
- Navigate to your terminal app
- Check "Pages" in the list
On macOS 26.1 and later, Apple's JXA (JavaScript for Automation) APIs for Pages have known issues. This tool automatically detects macOS 26+ and uses an HTML-based pipeline as a workaround.
What happens on macOS 26+:
- The tool uses: DOCX → JSON → HTML → RTF → Pages (open & export)
- Semantic HTML tags (`<h1>`, `<ul>`, `<table>`) provide styled output
- Template styles are NOT applied (Pages interprets HTML semantics instead)
- The `--template` argument is still required but not used in HTML mode
Output differences on macOS 26+:
- Headings, lists, and tables are styled semantically (not from your template)
- Visual appearance is correct but may not match your template exactly
- No style pollution occurs (HTML is converted fresh by Pages)
To force plain text mode (like v1.3.0 behavior):

```bash
docx2pages -i doc.docx -o out.pages -t template.pages --force-text
```

Root cause: Apple broke several JXA APIs in macOS 26+:
- `pages.open()` returns null instead of a document reference
- `doc.paragraphStyles()` throws "Can't convert types"
- `doc.bodyText.characters` throws unexpectedly
Note: This is an Apple platform limitation. We monitor macOS releases and will restore full template style support when these APIs are fixed.
The tool searches for scripts in this order:
- `--scripts-dir <path>` if specified
- `<executable>/../scripts/` (dist package layout)
- `./scripts/` (current working directory)
- `~/.docx2pages/scripts/`
- `/usr/local/share/docx2pages/scripts/`
If using the dist package, the wrapper script sets --scripts-dir automatically.
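The search order above can be sketched in Python as follows. The function names are hypothetical, and the tool's actual lookup is in Swift. The sketch assumes `<executable>/..` means the folder containing the binary, as in the release layout where the binary and `scripts/` sit side by side.

```python
from pathlib import Path
from typing import List, Optional


def script_search_paths(executable: Path, scripts_dir: Optional[Path] = None,
                        cwd: Optional[Path] = None, home: Optional[Path] = None) -> List[Path]:
    """Candidate script directories, highest priority first."""
    cwd = cwd if cwd is not None else Path.cwd()
    home = home if home is not None else Path.home()
    paths: List[Path] = [scripts_dir] if scripts_dir is not None else []
    paths += [
        executable.parent / "scripts",  # assumption: <executable>/.. is the binary's folder
        cwd / "scripts",
        home / ".docx2pages" / "scripts",
        Path("/usr/local/share/docx2pages/scripts"),
    ]
    return paths


def find_script(name: str, executable: Path, **kwargs) -> Optional[Path]:
    """Return the first existing copy of `name` along the search path, or None."""
    for directory in script_search_paths(executable, **kwargs):
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None
```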
- Verify your template has the expected style names (Body, Heading 1, etc.)
- Run with `-v` to see which styles are detected and mapped
- Check that the template file is a valid .pages file (not .pages.zip)
- Add paragraph styles named "Bullet" and "Numbered" to your template
- Alternatively: "Bulleted List" / "Numbered List" or similar
- Run with `-v` to see which list styles were detected
Another conversion is in progress. Options:
- Wait: By default, the tool waits for the lock to be released
- Fail fast: Use `--no-wait` to fail immediately instead of waiting
- Skip lock: Use `--no-lock` to disable locking entirely (not recommended for parallel use)
```
docx2pages/
├── Package.swift # Swift package manifest
├── README.md # This file
├── CHANGELOG.md # Version history
├── CONTRIBUTING.md # Contribution guidelines
├── SECURITY.md # Security policy
├── LICENSE # MIT License
├── .github/
│ ├── workflows/
│ │ ├── ci.yml # Main CI (parser + build)
│ │ └── pages-integration.yml # Pages integration tests
│ ├── ISSUE_TEMPLATE/
│ │ ├── bug_report.md
│ │ └── feature_request.md
│ └── PULL_REQUEST_TEMPLATE.md
├── Sources/
│ └── docx2pages/
│ └── main.swift # CLI entry point and orchestration
├── scripts/
│ ├── parse_docx.py # DOCX parsing (Python)
│ ├── pages_writer.js # Pages automation (JXA)
│ ├── html_writer.py # JSON to HTML converter (macOS 26+ pipeline)
│ ├── html_to_pages.js # HTML to Pages exporter (macOS 26+ pipeline)
│ ├── create_fixtures.py # Test fixture generator
│ ├── test_parser_golden.py # Golden test runner
│ ├── smoke_test.sh # Integration smoke test
│ ├── package_dist.sh # Distribution packaging
│ ├── bump_version.sh # Version bump helper
│ └── release_checklist.md # Release process guide
└── fixtures/
├── *.docx # Test DOCX files
└── golden/ # Parser golden outputs
└── *.json
```
James KC Auchterlonie AuchShop LLC | MLNavigator Inc.
MIT License. See LICENSE file.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request