feat: Playwright PDF engine (allow the PDF engine to be configured in the mkdocs.yml) #58

Open
hkato wants to merge 29 commits into develop from impl-playwright-pdf-engine

hkato (Collaborator) commented Mar 22, 2025

ref. #51

Experimental feature

  • This change has no side effects on the WeasyPrint PDF engine.
  • The headless_chrome_path option defaults to None.
    • None: Playwright uses its own bundled Chromium binary.
    • To make Playwright use an external Chromium, specify the absolute path to the binary.
plugins:
  - to-pdf:
      pdf_engine: chromium                                 # default: weasyprint
      headless_chrome_path: /usr/bin/chromium-browser      # default: None
      render_js: true                                      # default: None
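
As a rough illustration of how these options could map onto Playwright's Python API, here is a minimal sketch. The helper names `build_launch_kwargs` and `render_pdf` are hypothetical and are not taken from this branch; only `sync_playwright`, `chromium.launch(executable_path=..., headless=...)`, `page.goto` and `page.pdf` are real Playwright calls. The absolute-path check uses the host operating system's path rules.

```python
from pathlib import Path
from typing import Optional


def build_launch_kwargs(headless_chrome_path: Optional[str] = None) -> dict:
    """Hypothetical helper: turn the plugin option into chromium.launch() kwargs."""
    kwargs = {"headless": True}
    if headless_chrome_path is None:
        # No path configured: Playwright falls back to its bundled Chromium.
        return kwargs
    path = Path(headless_chrome_path)
    if not path.is_absolute():
        # An external Chromium must be given as an absolute path.
        raise ValueError(
            f"headless_chrome_path must be absolute, got: {headless_chrome_path!r}"
        )
    kwargs["executable_path"] = str(path)
    return kwargs


def render_pdf(html_path: str, pdf_path: str,
               headless_chrome_path: Optional[str] = None) -> None:
    """Hypothetical helper: print a local HTML file to PDF with Chromium."""
    # Imported lazily so build_launch_kwargs() works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**build_launch_kwargs(headless_chrome_path))
        try:
            page = browser.new_page()
            # Load the file through a file:// URL so relative assets resolve.
            page.goto(Path(html_path).resolve().as_uri())
            page.pdf(path=pdf_path, format="A4", print_background=True)
        finally:
            browser.close()
```

Calling `render_pdf("site/print.html", "out.pdf")` would use the bundled Chromium, while passing `headless_chrome_path="/usr/bin/chromium-browser"` would launch the system binary instead. The file names here are placeholders.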

Install

python -m venv .venv
source .venv/bin/activate
pip install mkdocs-material git+https://github.com/domWalters/mkdocs-to-pdf.git@impl-playwright-pdf-engine

By default, Playwright requires its own Chromium to be installed:

root@1719ac0fad24:/docs# playwright install chromium --only-shell
Downloading Chromium Headless Shell 134.0.6998.35 (playwright build v1161) from https://cdn.playwright.dev/dbazure/download/playwright/builds/chromium/1161/chromium-headless-shell-linux.zip
100.9 MiB [====================] 100% 0.0s
Chromium Headless Shell 134.0.6998.35 (playwright build v1161) downloaded to /root/.cache/ms-playwright/chromium_headless_shell-1161
Downloading FFMPEG playwright build v1011 from https://cdn.playwright.dev/dbazure/download/playwright/builds/ffmpeg/1011/ffmpeg-linux.zip
2.3 MiB [====================] 100% 0.0s
FFMPEG playwright build v1011 downloaded to /root/.cache/ms-playwright/ffmpeg-1011

Otherwise, please specify the absolute path to the Chromium binary.

plugins:
  - to-pdf:
      pdf_engine: chromium
      # Chromium on Ubuntu/Alma/Alpine
      headless_chrome_path: /usr/bin/chromium-browser
      # Chromium on Debian/Arch
      #headless_chrome_path: /usr/bin/chromium
      # Google Chrome on Linux(deb/rpm)
      #headless_chrome_path: /usr/bin/google-chrome
      # Google Chrome on macOS
      #headless_chrome_path: '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
      # Google Chrome on Windows
      #headless_chrome_path: 'C:/Program Files/Google/Chrome/Application/chrome.exe'
      # Microsoft Edge on Windows
      #headless_chrome_path: 'C:/Program Files (x86)/Microsoft/Edge/Application/msedge.exe'

Code changed

  • Added: Chromium-based headless browser PDF engine implementation.
  • Changed: render_js implementation switched from subprocess to Playwright.
  • Changed: WeasyPrint dependencies removed from the main routine.
    • WeasyPrint's URL/IRI utilities replaced with the plugin's own urllib-based utilities.
      • weasyprint.urls.url_is_absolute -> preprocessor.link.util.is_absolute_url
      • weasyprint.urls.iri_to_uri -> preprocessor.link.util.iri_to_uri
      • Added unit tests for the utilities.
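
For readers unfamiliar with the two helpers, the following is a minimal sketch of what urllib-based replacements might look like. It is an assumption for illustration only, not the code in this branch: the function names match the list above, but the bodies and the exact set of "safe" characters are guesses.

```python
from urllib.parse import quote, urlsplit


def is_absolute_url(url: str) -> bool:
    """Return True if the URL carries a scheme (e.g. https:, mailto:)."""
    # Relative paths, fragments and protocol-relative URLs have no scheme.
    return bool(urlsplit(url).scheme)


def iri_to_uri(iri: str) -> str:
    """Percent-encode non-ASCII and unsafe characters of an IRI."""
    if iri.startswith("data:"):
        # Data URLs are passed through untouched.
        return iri
    # Keep URL delimiters and existing percent-escapes as they are;
    # everything else outside the unreserved set is UTF-8 percent-encoded.
    return quote(iri, safe="/:?#[]@!$&'()*+,;=~%")
```

Note that this sketch percent-encodes non-ASCII host names instead of converting them with IDNA, and that a Windows path such as `C:/docs/index.html` would be reported as absolute because `C` parses as a scheme.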

Test

pytest

$ uv run pytest . -vv
============================================================= test session starts =============================================================
platform darwin -- Python 3.9.21, pytest-8.3.5, pluggy-1.5.0 -- /Users/hideyuki/Workspaces/github.com/domWalters/mkdocs-to-pdf/.venv/bin/python3
cachedir: .pytest_cache
rootdir: /Users/hideyuki/Workspaces/github.com/domWalters/mkdocs-to-pdf
configfile: pyproject.toml
collected 3 items                                                                                                                             

tests/mkdocs_to_pdf/preprocessor/links/test_util.py::test_is_absolute_url PASSED                                                        [ 33%]
tests/mkdocs_to_pdf/preprocessor/links/test_util.py::test_iri_to_uri PASSED                                                             [ 66%]
tests/test_links.py::TransformHrefTestCase::test_transform_href PASSED                                                                  [100%]

============================================================== 3 passed in 0.24s ==============================================================

PDF check (brief)

  • macOS 15.3.2 24D81

  • Windows 10 22H2

  • Docker container:

    • Debian GNU/Linux 12 (bookworm)
    • Ubuntu 24.04.1 LTS
    • AlmaLinux release 9.5 (Teal Serval) (not officially supported by Playwright)
    • Arch Linux 20250316.0.322463 (not officially supported by Playwright)
    • Alpine Linux v3.21 (not officially supported by Playwright; tested with the Node.js binary replaced)

Note

@hkato hkato requested a review from domWalters March 22, 2025 11:20
@hkato hkato self-assigned this Mar 22, 2025
hkato added 3 commits March 22, 2025 23:51
Operating Systems that install chromium using snap such as Ubuntu cannot open files from /tmp
- Playwright bundles its own Chromium binary : None
- When Playwright uses an external Chromium, it requires an absolute path
@domWalters domWalters added this to the v0.11.0 milestone Mar 22, 2025
@domWalters domWalters marked this pull request as ready for review March 22, 2025 18:49
hkato (Collaborator, Author) commented Mar 23, 2025

@domWalters

Please test whether the generated PDFs are fine or not.
Could you review and merge this into the 'develop' branch as an experimental feature?

@hkato hkato added the enhancement New feature or request label Mar 26, 2025
@domWalters domWalters linked an issue Apr 29, 2025 that may be closed by this pull request
domWalters (Owner) commented

This is gonna be the next thing I look at.

Hopefully, I'll have Sunday afternoon to do this.

hkato (Collaborator, Author) commented May 1, 2025

@domWalters

I've noticed a critical issue: page numbers are not appearing in the TOC, and neither are the author and copyright information. This seems to be a limitation of Chrome.

Looking at other implementations (Vivliostyle), it appears they perform the PDF conversion using Chrome/Playwright and implement these features as a post-processing step.

Given that this is an insufficient implementation, should we consider withdrawing it for now? Or should we proceed with it as an experimental implementation with the aim of implementing post-processing in the future?

I'd like to integrate the insufficient Chrome-based PDF conversion as a hidden option, and then implement post-processing (adding page numbers to the TOC) as the next step.

@domWalters domWalters linked an issue Jul 22, 2025 that may be closed by this pull request
@domWalters domWalters linked an issue Jul 22, 2025 that may be closed by this pull request
luminoso commented Dec 5, 2025

Thank you. I've been testing this branch and, other than a few glitches, it is working very well, a lot better than the default engine. Any expectation of a merge?

jernejfrank commented

Thanks, +1 on getting this merged. Have also been using this branch instead of the pip package for a while and can confirm it behaves nicer than the default engine.
