Monter PDF Converter is a fast, developer-friendly API for turning an extensive range of document formats into high-quality PDFs. It's built to process multiple jobs concurrently, maintain high availability through health checks and automatic restarts, and simplify integration with a single conversion endpoint that automatically chooses the correct converter for each file.
It is used in production by Monter Leefstijl.
Monter PDF Converter uses Chromium, LibreOffice and Pandoc to convert documents to PDF while faithfully reproducing the formatting and layout of the originals.
- Chromium: For converting HTML files, Chromium is well-suited because it accurately renders HTML, CSS, and JavaScript, and supports modern web standards, ensuring that styles, fonts and other elements are preserved.
- LibreOffice: For conversion from Word, Excel, PowerPoint and other document formats, LibreOffice is well-suited because of its ability to produce clean, high-quality PDFs with proper formatting and layout.
- Pandoc: For conversion from other formats (such as Markdown, LaTeX and reStructuredText), Pandoc is well-suited because of the wide range of input formats it can read.
The application automatically selects the correct converter based on the inferred MIME-type, the extension and the given type.
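As a rough illustration of this selection step, the sketch below resolves a converter from an explicitly given type first and otherwise falls back to the file extension. The function name `pickConverter`, the two lookup tables, and the precedence order are assumptions made for illustration, not the service's actual implementation; MIME-type inference is left out entirely.

```typescript
// Hypothetical sketch: names, tables and precedence are illustrative assumptions.
type Converter = "chromium" | "libreoffice" | "pandoc";

// A small excerpt of the supported file types and their converters.
const converterByType: Record<string, Converter> = {
  html: "chromium",
  docx: "libreoffice",
  xlsx: "libreoffice",
  pptx: "libreoffice",
  markdown: "pandoc",
  latex: "pandoc",
};

// Unambiguous extensions mapped to a file type.
const typeByExtension: Record<string, string> = {
  ".html": "html",
  ".htm": "html",
  ".docx": "docx",
  ".xlsx": "xlsx",
  ".pptx": "pptx",
  ".md": "markdown",
  ".tex": "latex",
};

function pickConverter(filename: string, givenType?: string): Converter | undefined {
  // An explicitly given type wins over the extension (assumed precedence).
  if (givenType !== undefined) {
    return converterByType[givenType];
  }
  const dot = filename.lastIndexOf(".");
  const extension = dot >= 0 ? filename.slice(dot).toLowerCase() : "";
  const inferredType = typeByExtension[extension];
  return inferredType !== undefined ? converterByType[inferredType] : undefined;
}
```

Under these assumptions, `pickConverter("notes.txt")` yields `undefined` because the extension is ambiguous, while `pickConverter("notes.txt", "markdown")` yields `"pandoc"`.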
- High-quality PDFs: Uses Chromium (for HTML), LibreOffice (for documents) and Pandoc (for other files) to produce high-quality PDFs.
- Multi-format support: Converts HTML, DOCX, PPTX, XLSX, and many other formats to PDF.
- Fast and concurrent: Handles multiple conversion jobs concurrently and queues additional jobs.
- Health checks: Provides a health check endpoint to monitor the service status.
- High availability: Stays available by automatically restarting child services (i.e. Chromium and LibreOffice) in case of failure.
- Highly configurable: Allows customization of various settings through environment variables.
- Developer-friendly API: Provides a single endpoint for conversion, and automatically selects the correct converter.
- Uncomplicated: The entire source code is a single TypeScript file of ~1500 LOC.
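The concurrency and queueing behaviour listed above can be pictured as a bounded job queue. The following is a minimal sketch, not the project's code: the class `BoundedJobQueue` and its members are hypothetical, and its two limits only mirror the roles of the `MAX_CONCURRENT_JOBS` and `MAX_QUEUED_JOBS` settings.

```typescript
// Hypothetical sketch of a bounded job queue; not the service's actual code.
type Job = () => Promise<void>;

class BoundedJobQueue {
  private running = 0;
  private readonly waiting: Job[] = [];
  private readonly maxConcurrent: number;
  private readonly maxQueued: number;

  constructor(maxConcurrent: number, maxQueued: number) {
    this.maxConcurrent = maxConcurrent;
    this.maxQueued = maxQueued;
  }

  // Returns false when both the worker slots and the queue are full
  // (a server could answer 503 Service Unavailable in that case).
  trySubmit(job: Job): boolean {
    if (this.running < this.maxConcurrent) {
      this.start(job);
      return true;
    }
    if (this.waiting.length < this.maxQueued) {
      this.waiting.push(job);
      return true;
    }
    return false;
  }

  get runningCount(): number {
    return this.running;
  }

  get queuedCount(): number {
    return this.waiting.length;
  }

  private start(job: Job): void {
    this.running += 1;
    // When the job settles (success or failure), free the slot
    // and start the next waiting job, if any.
    const release = (): void => {
      this.running -= 1;
      const next = this.waiting.shift();
      if (next !== undefined) this.start(next);
    };
    job().then(release, release);
  }
}
```

A queue created with `new BoundedJobQueue(6, 128)` would correspond to the documented defaults: up to 6 jobs run at once, up to 128 wait, and further submissions are refused.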
The following endpoints are available:
- POST /: Convert a document to PDF.
- GET /healthcheck: Check the health of the service.
curl --location 'http://localhost:8080' \
--form 'input=@"index.html"' \
--form 'resources=@"dog.jpg"' \
--form 'resources=@"cat.jpg"'

The endpoint accepts a multipart/form-data request with the following fields:
- input (required): The document to convert.
- resources (optional): Additional resources for conversion.
- type (optional): The type of the given document.
The resources field can contain additional resources for conversion from .html and .xhtml that cannot be embedded in
the file itself. For example, if the file contains an image tag referencing dog.jpg, it may be included as a resource
to have it be displayed in the PDF. The type field can be used to specify the type of the input document whenever
this is ambiguous, such as for .txt files.
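For callers that prefer code over curl, a request with these fields could be assembled as follows. This is a sketch under assumptions: it relies on the `FormData`, `Blob` and `fetch` globals found in modern browsers and in Node.js 18 or later, and the helper names `buildConversionForm` and `convertToPdf` are illustrative only. Only the field names `input`, `resources` and `type` come from the description above.

```typescript
// Hypothetical client helpers; only the field names come from the endpoint description.
interface NamedFile {
  name: string;
  content: Blob;
}

function buildConversionForm(
  input: NamedFile,
  resources: NamedFile[] = [],
  type?: string,
): FormData {
  const form = new FormData();
  form.append("input", input.content, input.name);
  // Each resource is sent as a repeated "resources" field, as in the curl example.
  for (const resource of resources) {
    form.append("resources", resource.content, resource.name);
  }
  // The type is only needed when the input is ambiguous, such as a .txt file.
  if (type !== undefined) {
    form.append("type", type);
  }
  return form;
}

async function convertToPdf(baseUrl: string, form: FormData): Promise<ArrayBuffer> {
  const response = await fetch(baseUrl, { method: "POST", body: form });
  if (!response.ok) {
    throw new Error(`Conversion failed with status ${response.status}`);
  }
  // On success the body is the PDF itself.
  return response.arrayBuffer();
}
```

A call such as `convertToPdf("http://localhost:8080", form)` would then mirror the curl request, assuming the service listens on its default port.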
| File type | Common extensions | Converter |
|---|---|---|
| biblatex (BibLaTeX) | .biblatex | Pandoc |
| bibtex (BibTeX) | .bib, .bibtex | Pandoc |
| bits (BITS XML) | .xml | Pandoc |
| commonmark_x (CommonMark with extensions) | .txt | Pandoc |
| commonmark (CommonMark) | .commonmark | Pandoc |
| creole (Creole 1.0) | .creole | Pandoc |
| csljson (CSL JSON) | .csljson, .json | Pandoc |
| csv (CSV) | .csv | Pandoc |
| djot (Djot) | .dj | Pandoc |
| docbook (DocBook) | .xml | Pandoc |
| docx (Microsoft Word) | .docx | LibreOffice |
| dokuwiki (DokuWiki markup) | .dokuwiki | Pandoc |
| endnotexml (EndNote XML) | .xml | Pandoc |
| epub (EPUB) | .epub | Pandoc |
| fb2 (FictionBook2) | .fb2 | Pandoc |
| haddock (Haddock markup) | .hs | Pandoc |
| html (HTML) | .html, .htm | Chromium |
| ipynb (Jupyter Notebook) | .ipynb | Pandoc |
| jats (JATS XML) | .xml | Pandoc |
| json (JSON version of native AST) | .json | Pandoc |
| latex (LaTeX) | .tex | Pandoc |
| man (roff man) | .man | Pandoc |
| markdown_mmd (MultiMarkdown) | .mmd | Pandoc |
| markdown_strict (Original Markdown) | .txt | Pandoc |
| markdown (Pandoc’s Markdown) | .md, .markdown, .mkd, .mdown, .mkdn, .mdwn, .mdtxt, .mdtext, .text | Pandoc |
| mdoc (mdoc) | .mdoc | Pandoc |
| mediawiki (MediaWiki markup) | .wiki | Pandoc |
| muse (Emacs Muse) | .muse | Pandoc |
| odt (OpenDocument Text) | .odt | LibreOffice |
| opendocument (OpenDocument) | .od* | LibreOffice |
| opml (OPML) | .opml | Pandoc |
| org (Emacs Org mode) | .org | Pandoc |
| pod (Perl POD) | .pod | Pandoc |
| pptx (Microsoft PowerPoint) | .pptx | LibreOffice |
| ris (RIS) | .ris | Pandoc |
| rst (reStructuredText) | .rst | Pandoc |
| rtf (Rich Text Format) | .rtf | LibreOffice |
| t2t (txt2tags) | .t2t | Pandoc |
| textile (Textile) | .textile | Pandoc |
| tikiwiki (TikiWiki markup) | .tiki | Pandoc |
| tsv (TSV) | .tsv | Pandoc |
| twiki (TWiki markup) | .twiki | Pandoc |
| typst (Typst) | .typ | Pandoc |
| vimwiki (Vimwiki markup) | .vimwiki | Pandoc |
| xlsx (Microsoft Excel) | .xlsx | LibreOffice |
The endpoint automatically detects the encoding of the input and converts it if necessary.
- 200 OK: The document is successfully converted to a PDF. The response will contain the PDF file with content-type application/pdf.
- 400 Bad Request: The request is invalid (e.g. missing required input field).
- 413 Payload Too Large: The document is too large to be processed.
- 415 Unsupported Media Type: The document type is not supported, or is not correctly specified.
- 502 Bad Gateway: The conversion failed in Chromium, LibreOffice or Pandoc.
- 503 Service Unavailable: The request is denied because the job queue is full.
- 504 Gateway Timeout: The conversion took too long to complete.
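A client might translate these status codes into a decision about whether to try again. The sketch below shows one possible policy; marking only 503 (queue full) as worth retrying is an assumption about sensible client behaviour, not something the service prescribes, and `describeStatus` is a hypothetical helper.

```typescript
// Hypothetical client-side policy; the retry choices are assumptions.
interface StatusInfo {
  meaning: string;
  retryable: boolean;
}

const statusTable: Record<number, StatusInfo> = {
  200: { meaning: "Converted successfully", retryable: false },
  400: { meaning: "Invalid request", retryable: false },
  413: { meaning: "Document too large", retryable: false },
  415: { meaning: "Unsupported or incorrectly specified type", retryable: false },
  502: { meaning: "Conversion failed in Chromium, LibreOffice or Pandoc", retryable: false },
  // The job queue may have room again a little later.
  503: { meaning: "Job queue is full", retryable: true },
  504: { meaning: "Conversion took too long", retryable: false },
};

function describeStatus(status: number): StatusInfo {
  // Fall back to a generic entry for codes the documentation does not list.
  return statusTable[status] ?? { meaning: "Unexpected status", retryable: false };
}
```

A caller could check `describeStatus(response.status).retryable` before scheduling another attempt, ideally with a delay so the queue has time to drain.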
curl --location 'http://localhost:8080/healthcheck'

The endpoint accepts a GET request with no parameters.
- 200 OK: The service is healthy.
- 503 Service Unavailable: The service is unhealthy.
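The JSON body returned by this endpoint (see the sample response that follows) reports one status per component. As an illustration, the sketch below collects the components that are not reported as `healthy`. The `HealthReport` type is inferred from that sample alone and `unhealthyComponents` is a hypothetical helper; status values other than `healthy` are not documented here, so the code only compares against that one string.

```typescript
// Shape inferred from the sample response; treat it as an assumption.
interface HealthReport {
  health: {
    browser: string;
    pandoc: string;
    // Keyed by what appear to be unoserver port numbers in the sample.
    unoservers: Record<string, string>;
    webserver: string;
    jobQueue: string;
  };
}

function unhealthyComponents(report: HealthReport): string[] {
  const { unoservers, ...topLevel } = report.health;
  const failing: string[] = [];
  // Top-level components: browser, pandoc, webserver, jobQueue.
  for (const [name, status] of Object.entries(topLevel)) {
    if (status !== "healthy") failing.push(name);
  }
  // Individual unoserver instances.
  for (const [key, status] of Object.entries(unoservers)) {
    if (status !== "healthy") failing.push(`unoservers.${key}`);
  }
  return failing;
}
```

With a parsed response, for example `unhealthyComponents(JSON.parse(body) as HealthReport)`, an empty array means every component reported `healthy`.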
{
  "health": {
    "browser": "healthy",
    "pandoc": "healthy",
    "unoservers": {
      "2003": "healthy",
      "2004": "healthy",
      "2005": "healthy",
      "2006": "healthy",
      "2007": "healthy",
      "2008": "healthy",
      "2009": "healthy",
      "2010": "healthy",
      "2011": "healthy",
      "2012": "healthy",
      "2013": "healthy",
      "2014": "healthy"
    },
    "webserver": "healthy",
    "jobQueue": "healthy"
  }
}

Below is a table of environment variables that can be used to configure the service.
| Name | Description | Default value |
|---|---|---|
| WEBSERVER_PORT | The port the web server listens on. | 8080 |
| CHROME_EXECUTABLE_PATH | The path to the Chrome executable. | /usr/bin/chromium-browser |
| LIBREOFFICE_EXECUTABLE_PATH | The path to the LibreOffice executable. | /usr/bin/libreoffice |
| UNOSERVER_EXECUTABLE_PATH | The path to the Unoserver executable. | /usr/bin/unoserver |
| UNOCONVERT_EXECUTABLE_PATH | The path to the Unoconvert executable. | /usr/bin/unoconvert |
| UNOSERVER_LAUNCH_TIMEOUT | The maximum time to wait in milliseconds for unoserver to launch. | 30000 (30 seconds) |
| PANDOC_EXECUTABLE_PATH | The path to the Pandoc executable. | /usr/bin/pandoc |
| CHROME_LAUNCH_TIMEOUT | The maximum time to wait in milliseconds for Chrome to launch. | 30000 (30 seconds) |
| CHROME_RESTART_INTERVAL | The interval in milliseconds to restart the browser. | 86400000 (1 day) |
| PDF_RENDER_TIMEOUT | The maximum time in milliseconds to spend rendering a single PDF. | 150000 (2.5 minutes) |
| MAX_FILE_SIZE | The maximum size in bytes for each uploaded file. | 134217728 (128 MB) |
| MAX_CONCURRENT_JOBS | The maximum number of concurrent jobs. Setting this to a high value may cause unexpected behaviour. | 6 |
| MAX_QUEUED_JOBS | The maximum number of jobs in the queue. | 128 |
| MAX_RESOURCE_COUNT | The maximum number of resources (e.g. images) that can be uploaded. | 16 |
| MAX_RESTARTS | The maximum number of times processes can be restarted within 60 seconds before giving up. | 3 |
| RESTART_DELAY | The interval in milliseconds to wait before restarting subprocesses. | 5000 (5 seconds) |
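To show how settings like these are typically consumed, here is a minimal sketch that reads a few of the variables with their documented defaults. The `loadConfig` function, the `Config` shape and the choice to fall back to the default on an unparsable value are all assumptions; the service's own parsing may differ.

```typescript
// Hypothetical configuration loader; only the variable names and
// default values are taken from the table of environment variables.
interface Config {
  webserverPort: number;
  maxConcurrentJobs: number;
  maxQueuedJobs: number;
  maxFileSize: number;
  pandocExecutablePath: string;
}

type Env = Record<string, string | undefined>;

function readInteger(env: Env, name: string, fallback: number): number {
  const raw = env[name];
  if (raw === undefined) return fallback;
  const parsed = parseInt(raw, 10);
  // Assumed behaviour: an unparsable value falls back to the default.
  return Number.isNaN(parsed) ? fallback : parsed;
}

function loadConfig(env: Env): Config {
  return {
    webserverPort: readInteger(env, "WEBSERVER_PORT", 8080),
    maxConcurrentJobs: readInteger(env, "MAX_CONCURRENT_JOBS", 6),
    maxQueuedJobs: readInteger(env, "MAX_QUEUED_JOBS", 128),
    maxFileSize: readInteger(env, "MAX_FILE_SIZE", 134217728),
    pandocExecutablePath: env["PANDOC_EXECUTABLE_PATH"] ?? "/usr/bin/pandoc",
  };
}
```

In Node.js this could be called as `loadConfig(process.env)`. Passing the environment in as a parameter keeps the function easy to test with plain objects.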
Important
It is not recommended to expose Monter PDF Converter to the world directly, unless you take proper precautions (such as sandboxing or using a separate server).
- Follow this guide to install Docker on your machine.
- Follow this guide to install Docker Compose on your machine.
- Copy the docker-compose.sample.yaml in this repository and rename it to docker-compose.yaml.
- Configure the port mapping as necessary in the docker-compose.yaml file.
- Start the service by running the following command:
docker compose up -d
- Done! Navigate to http://localhost:1337/healthcheck to check if everything works. It may take some time for everything to become healthy.
Contributions are welcome! Please submit a pull request or open an issue to discuss any changes.
This project is licensed under the MIT License. See the LICENSE file for details.
For any questions or support, please open an issue on GitHub.
For comparison, consider exploring these alternatives:
- Gotenberg: More comprehensive API, but with fewer supported file types and no native support for concurrent jobs.
- wkhtmltopdf: Command-line tool to convert HTML to PDF using WebKit.
- puppeteer-html-to-pdf-converter: Simple API for converting from HTML to PDF.