This project is a collection of utility scripts for working with public domain content, mostly from Project Gutenberg.
Requires pandoc to be installed.
Converts an HTML book into a series of blog posts in Markdown, with YAML front matter for Hugo or a similar static site generator (SSG). You must pass --series, which should be the title of the book. You can optionally pass --author and --date, which are added to the front matter. If --date is not supplied, today's date is used.
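As an illustration of the front matter described above, here is a minimal sketch. The helper name build_front_matter and the exact field names (title, series, author, date) are assumptions for this example; the script's real keys and layout may differ.

```python
from datetime import date
from typing import Optional


def build_front_matter(title: str, series: str,
                       author: Optional[str] = None,
                       post_date: Optional[str] = None) -> str:
    """Return a YAML front matter block for one post (hypothetical helper)."""

    def quote(value: str) -> str:
        # Double-quote values so colons and quotes in titles stay literal YAML strings.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = ["---", f"title: {quote(title)}", f"series: {quote(series)}"]
    if author:
        # The author field is optional and only written when supplied.
        lines.append(f"author: {quote(author)}")
    # Fall back to today's date when no date is given, as the README describes.
    lines.append(f"date: {post_date or date.today().isoformat()}")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_front_matter("Chapter 1", "Moby Dick", author="Herman Melville"))
```

Quoting every string value is a conservative choice here: book and chapter titles often contain colons, which would otherwise change how YAML parses the line.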
The script defaults to splitting at heading level 3 (H3). To split on a different heading level, e.g. H2, pass --split-level=2.
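To make the idea of a split level concrete, the sketch below splits an HTML string at every heading of a given level. This is only an illustration: the script itself appears to rely on pandoc for chunking, and the function name split_at_level and the regular-expression approach are assumptions made for this example, not the script's actual method.

```python
import re
from typing import List


def split_at_level(html: str, level: int = 3) -> List[str]:
    """Split an HTML string into chunks that each start at an <hN> tag (hypothetical helper)."""
    # A zero-width lookahead splits just before each heading,
    # so the heading tag stays at the start of its chunk.
    pattern = re.compile(rf"(?=<h{level}[\s>])", re.IGNORECASE)
    chunks = pattern.split(html)
    # Drop empty or whitespace-only pieces, e.g. when the input starts with a heading.
    return [chunk for chunk in chunks if chunk.strip()]


if __name__ == "__main__":
    sample = "<p>intro</p><h3>One</h3><p>a</p><h3>Two</h3><p>b</p>"
    for part in split_at_level(sample, level=3):
        print(part)
```

A regular expression is adequate for a quick illustration but is fragile on real-world HTML; a proper parser or pandoc's own chunking is the safer choice.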
By default, the script creates an output directory in the current directory, named after the series. If you want the output to go elsewhere, pass --output_dir <PATH>.
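The options described so far could be wired up roughly as follows. This is a sketch under stated assumptions: the flag names come from this README, but the positional input argument, the function name parse_args, and the use of the raw series title (rather than a slugified form) as the default directory name are guesses for illustration.

```python
import argparse
from datetime import date
from pathlib import Path


def parse_args(argv=None):
    """Parse the command-line options described in this README (hypothetical layout)."""
    parser = argparse.ArgumentParser(description="Split an HTML book into Markdown posts.")
    # Assumption: the HTML book is given as a positional argument.
    parser.add_argument("input", help="path to the HTML book")
    parser.add_argument("--series", required=True, help="title of the book")
    parser.add_argument("--author", default=None, help="author for the front matter")
    parser.add_argument("--date", default=date.today().isoformat(),
                        help="date for the front matter (defaults to today)")
    parser.add_argument("--split-level", type=int, default=3,
                        help="heading level to split on")
    parser.add_argument("--output_dir", type=Path, default=None,
                        help="where to write the posts")
    args = parser.parse_args(argv)
    if args.output_dir is None:
        # Default: a directory named after the series, under the current directory.
        args.output_dir = Path.cwd() / args.series
    return args


if __name__ == "__main__":
    options = parse_args()
    print(options)
```

Note that argparse exposes --split-level as args.split_level, while --output_dir keeps its underscore as args.output_dir.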
The file tpl.html is the pandoc template used for the chunked HTML output. It is supplied to omit the navigation menu that pandoc's default template includes.
Requires python-slugify to be installed. Running the script with uv run is recommended.