Utility scripts for working with Project Gutenberg texts

mindvessel/gutenberg

Utility scripts for working with public domain texts

This project is a collection of utility scripts for working with public domain content, mostly from Project Gutenberg.

convert_to_markdown.py

Requires pandoc.

Converts an HTML book into a series of blog posts in Markdown, with YAML front matter for Hugo or a similar static site generator (SSG). You must pass --series, which should be the title of the book. You can optionally pass --author and --date to add them to the front matter. If --date is not supplied, today's date is used.
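
The README does not show the exact front matter the script writes. As a minimal sketch, assuming hypothetical keys named title, series, author and date (taking the title from each chunk's heading is also an assumption), a post header could be built like this. json.dumps is used only to produce safely quoted strings, which YAML also accepts as double-quoted scalars.

```python
import datetime
import json
from typing import Optional


def front_matter(title: str, series: str, author: Optional[str] = None,
                 date: Optional[str] = None) -> str:
    """Build a YAML front matter block for one post (hypothetical key layout)."""
    # Fall back to today's date, mirroring the documented --date default.
    if date is None:
        date = datetime.date.today().isoformat()
    # json.dumps yields double-quoted, escaped strings that are valid YAML.
    lines = ["---", f"title: {json.dumps(title)}", f"series: {json.dumps(series)}"]
    # --author is optional, so only emit the key when a value was given.
    if author:
        lines.append(f"author: {json.dumps(author)}")
    lines.append(f"date: {date}")
    lines.append("---")
    return "\n".join(lines) + "\n"


print(front_matter("Chapter I", "Moby Dick", author="Herman Melville", date="2024-05-01"))
```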

The script defaults to splitting at heading level 3 (H3). To split on a different heading level, e.g. H2, pass --split-level=2.
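
To illustrate what "splitting at heading level N" means, here is a small sketch that uses Markdown ATX headings (### for H3) as a stand-in. This is not how the script itself splits (it relies on pandoc's chunked HTML output); the function name split_at_level is hypothetical. Only headings of exactly the chosen level start a new chunk, and deeper headings stay inside the current chunk's body.

```python
import re
from typing import List, Tuple


def split_at_level(markdown: str, level: int = 3) -> List[Tuple[str, str]]:
    """Split Markdown into (heading, body) chunks at ATX headings of one level."""
    # With level == 3 this is ^#{3}\s+..., so "### Title" matches but "#### Title" does not.
    pattern = re.compile(rf"^#{{{level}}}\s+(.+?)\s*$")
    chunks: List[Tuple[str, str]] = []
    title, body = "", []
    for line in markdown.splitlines():
        match = pattern.match(line)
        if match:
            # Close the previous chunk (skipping an empty preamble).
            if title or any(ln.strip() for ln in body):
                chunks.append((title, "\n".join(body).strip()))
            title, body = match.group(1), []
        else:
            body.append(line)
    # Flush the final chunk.
    if title or any(ln.strip() for ln in body):
        chunks.append((title, "\n".join(body).strip()))
    return chunks
```

Text before the first matching heading ends up in a chunk with an empty title.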

By default, the script creates an output directory named after the series in the current directory. To write the output elsewhere, pass --output_dir <PATH>.
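
The options described above map naturally onto argparse. This is a minimal sketch, not the script's actual parser: the positional input argument, the help texts and the function name parse_args are assumptions. It also assumes the default output directory uses the series name unchanged.

```python
import argparse
from pathlib import Path
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Split an HTML book into Markdown blog posts (sketch).")
    parser.add_argument("input", help="HTML book to convert (assumed positional)")
    parser.add_argument("--series", required=True, help="title of the book")
    parser.add_argument("--author", help="optional author for the front matter")
    parser.add_argument("--date", help="optional post date; today is used later if omitted")
    # argparse stores this as args.split_level.
    parser.add_argument("--split-level", type=int, default=3,
                        help="heading level to split at (default: 3)")
    parser.add_argument("--output_dir", type=Path, default=None,
                        help="output directory (default: ./<series>)")
    args = parser.parse_args(argv)
    # Default output directory: named after the series, in the current directory.
    if args.output_dir is None:
        args.output_dir = Path.cwd() / args.series
    return args
```

For example, parse_args(["book.html", "--series", "Moby Dick", "--split-level=2"]) yields split_level == 2 and an output directory ./Moby Dick.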

The file tpl.html is the pandoc template used for the chunked HTML output. It is supplied to remove the navigation menu that pandoc's default template includes.
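
The README does not show how the script invokes pandoc. As a sketch, assuming pandoc 3.x (which provides the chunkedhtml output format and the --split-level option), a command that splits the book at a heading level and applies tpl.html might be assembled like this. The helper name pandoc_command is hypothetical; the command is only built and printed here, not run.

```python
import shlex
from pathlib import Path
from typing import List


def pandoc_command(html_book: Path, out_dir: Path, split_level: int = 3,
                   template: Path = Path("tpl.html")) -> List[str]:
    """Build a pandoc invocation that writes one HTML file per section."""
    return [
        "pandoc", str(html_book),
        "--from=html",
        "--to=chunkedhtml",               # chunked HTML writer (pandoc 3.0 and later)
        f"--split-level={split_level}",   # heading level that starts a new chunk
        f"--template={template}",         # custom template without the navigation menu
        f"--output={out_dir}",            # for chunkedhtml this names a directory (or .zip)
    ]


cmd = pandoc_command(Path("book.html"), Path("chunks"))
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would run it, provided pandoc is on PATH.
```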

Requires python-slugify to be installed. Running the script with uv run is recommended.
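
The README does not say whether convert_to_markdown.py declares its dependencies inline. As an assumption, one way to let uv run install python-slugify automatically is inline script metadata (PEP 723) at the top of the script; alternatively, uv run --with python-slugify convert_to_markdown.py ... supplies the package for a single run. pandoc is a separate program and is not covered by this block.

```python
# /// script
# dependencies = [
#     "python-slugify",
# ]
# ///
```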
