Utility scripts for working with public domain texts

This project is a collection of utility scripts for working with public domain content, mostly from Project Gutenberg.

`convert_to_markdown.py`

Requires pandoc to be installed.

Converts an HTML book into a series of blog posts in Markdown, each with YAML front matter for Hugo or a similar static site generator (SSG). The `--series` option is required and should be the title of the book. The optional `--author` and `--date` values are added to the front matter. If `--date` is not supplied, today's date is used.
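As a rough illustration of the front matter described above, here is a minimal sketch. The function name `build_front_matter`, the `title` field, and the exact key names and quoting are assumptions made for this example; the script's real output may differ.

```python
from datetime import date


def build_front_matter(title, series, author=None, post_date=None):
    """Return a YAML front matter block as a string (hypothetical helper)."""
    # Fall back to today's date when no date is supplied.
    post_date = post_date or date.today().isoformat()

    def quote(value):
        # Double-quote the value and escape backslashes and quotes
        # so the resulting YAML scalar stays valid.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    # Key names here are assumed, not taken from the script.
    lines = ["---", f"title: {quote(title)}", f"series: {quote(series)}"]
    if author:
        lines.append(f"author: {quote(author)}")
    lines.append(f"date: {post_date}")
    lines.append("---")
    return "\n".join(lines) + "\n"


print(build_front_matter("Chapter 1", "Moby Dick", author="Herman Melville"))
```

The author line is left out entirely when no author is given, which keeps the front matter free of empty keys.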

The script splits at heading level 3 (H3) by default. To split at a different heading level, such as H2, pass `--split-level=2`.

By default, the script creates an output directory in the current directory, named after the series. To write the output elsewhere, pass `--output_dir <PATH>`.
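The options described so far could be wired together with `argparse` roughly as follows. This is a sketch under assumptions, not the script's actual parser: the positional `input_html` argument and the `parse_args` function are hypothetical, and using the raw series name (rather than a slugified form) for the default directory is a guess.

```python
import argparse
from datetime import date
from pathlib import Path


def parse_args(argv=None):
    """Parse the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(
        description="Convert an HTML book into Markdown blog posts."
    )
    # Assumption: the input file is passed as a positional argument.
    parser.add_argument("input_html", help="path to the HTML book")
    parser.add_argument("--series", required=True, help="title of the book")
    parser.add_argument("--author", help="author to add to the front matter")
    parser.add_argument(
        "--date",
        default=date.today().isoformat(),
        help="date for the front matter (defaults to today)",
    )
    parser.add_argument(
        "--split-level", type=int, default=3, help="heading level to split on"
    )
    parser.add_argument("--output_dir", type=Path, help="where to write the posts")
    args = parser.parse_args(argv)

    # Default output directory: named after the series, in the current directory.
    if args.output_dir is None:
        args.output_dir = Path.cwd() / args.series
    return args


args = parse_args(["book.html", "--series", "Moby Dick", "--split-level=2"])
print(args.split_level, args.output_dir)
```

Note that `argparse` stores `--split-level` as `args.split_level`, while `--output_dir` keeps its underscore.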

The file `tpl.html` is the pandoc template for the chunked HTML output. It is supplied to remove the navigation menu that pandoc's default template includes.
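To make the role of the template and the split level concrete, the sketch below builds a pandoc command line for chunked HTML output. It assumes the script uses pandoc's `chunkedhtml` writer with the `--split-level` and `--template` options; the helper name `pandoc_chunk_command`, its parameters, and the level range check are hypothetical and not taken from the script.

```python
def pandoc_chunk_command(input_html, output_dir, split_level=3, template="tpl.html"):
    """Build an argument list for a chunked HTML pandoc run (hypothetical helper)."""
    # Guard added for this sketch: HTML defines heading levels 1 through 6.
    if not 1 <= split_level <= 6:
        raise ValueError("split_level must be between 1 and 6")
    return [
        "pandoc",
        input_html,
        "--from", "html",
        "--to", "chunkedhtml",
        f"--split-level={split_level}",
        "--template", template,
        # An output path without an extension makes pandoc write a directory.
        "--output", output_dir,
    ]


print(" ".join(pandoc_chunk_command("book.html", "chunks", split_level=2)))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where pandoc is installed.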

Requires python-slugify to be installed. Running the script with `uv run` is recommended.
