ChapterForge

License: MIT

ChapterForge is a library and CLI to mux chapters (text and optional images) into AAC/M4A files while preserving metadata and handling Apple-compatible chapter tracks.

Motivation and Backstory

MPEG-4 and its MP4 container standard do not explicitly describe chapter marks. Apple, however, did so implicitly.

Back in the day, under the umbrella of QuickTime.framework, Apple released authoring tools for audiobooks. With those tools you could add jump marks to an M4A file. Those jump marks could carry a description text, an image, and possibly more. That way a user could conveniently jump to specific sections of the audio, which is useful for marking chapters, for example. Many players understand those marks, but not all do. Apple, which pushed this "extension" of the standard, has traditionally supported it in its own players: Music.app, iTunes.app, QuickTime.app, and even Books.app all support chapter marks today. Windows has existing support as well. So this is not a standard, but it is at least a functional solution for encoding such information into the audio file.

Apple did provide authoring tools, but that was way back in PowerPC times. QTKit is long gone. There appears to be not a single open source tool that supports a recent OS. Tools like ffmpeg do support chapter marks for MP4, but not images for them. The only tool supporting images in chapter marks in 2025 appears to be Auphonic, which is commercial.

None of the existing libraries offered to application developers, not even big players like Bento4, support chapter marks with images. AVFoundation, the framework Apple offers these days for media playback and authoring, fully supports reading MP4 chapter marks, including images, so creating a player that supports the feature is trivial. The kicker is that Apple offers no way of authoring such files, none at all.

No one had a strong enough interest to change this, until today.

My tinker project needs support for storing a track-list / set-list in the file itself. That way I can attribute those beautiful DJ sets and have neat track-mark thumbnails and descriptions on the player, persisted in the M4A file. A few thousand lines of code later, we have a new library, not based on any existing work, that does the job for me and maybe also for you.

Features

ChapterForge uses the audio track from the input. It then combines that with a text track for the description, an optional text track for per-chapter URLs, and a video track for optional chapter images. All of that information gets bundled in the resulting output M4A file. With that M4A file you can now see chapter marks in your player.

Platforms

Supported players (just a selection of known goods):

Player          text  image  url  url-text
QuickTime.app   X     X      o    o
Music.app       X     X      o    o
Books.app       X     X      o    o
VLC             X     o      o    o

(X = full support and display, o = not displayed, all other functions remain)

Text commonly is displayed as the chapter title. Image commonly is displayed as a thumbnail. In QuickTime we additionally get a "movie" presented - a very nice experience. URL commonly is displayed nowhere. URL text commonly is displayed nowhere.

Note that AVFoundation supports all of those attributes for parsing and extraction, so on macOS and iOS it is trivial to support them.

Example output to validate players

We ship two reference files you can open in your favorite player to sanity-check chapter handling:

  • Input JSON: testdata/chapters_10s_2ch_normalimg_meta.json
    • 2 chapters at 0 and 5 seconds (fits the 10s input)
    • Cover: images/cover_normal.jpg
    • Per-chapter images: images/normal*.jpg
    • URL track with per-chapter HREFs
  • Built example: ChapterForge Example M4A File

What to expect:

  • Chapter list shows the 2 entries with titles and thumbnails.
  • Jumps land at the correct offsets (0 s and 5 s).
  • QuickTime shows the video track for chapter images; Music.app shows thumbnails.
  • URLs are present in the dedicated URL track (AVFoundation surfaces them via extraAttributes[HREF]), but players generally do not display them.

Bonus: ChapterForge Bonus Track M4A File — 50 chapters, small images, and per-chapter URLs to stress-test players.

QuickTime.app on macOS playing our example file:

[Screenshot: QuickTime displays chapters]

Music.app on macOS playing our example file:

[Screenshot: Music.app displays chapters]

Do you have more players showing our example file? Would be great to see them.

Installing

macOS (Homebrew tap)

brew tap tillt/chapterforge https://github.com/tillt/ChapterForge.git
brew install --HEAD tillt/chapterforge/chapterforge

This builds the CLI and static library (and the universal macOS framework when enabled). Head-only for now; tags will become bottled when we ship stable releases.

Windows

  • Prebuilt release zip: grab the latest draft/release artifact from GitHub. Inside you’ll find chapterforge_cli.exe, chapterforge.lib, and include/.
  • Build from source (MSVC):
    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    cmake --build build --config Release
    Outputs land in build/Release/.

Linux

  • Prebuilt tarball: download the release tarball (chapterforge-<ver>-ubuntu-latest.tar.gz) and use chapterforge_cli and libchapterforge.a together with the headers under include/.
  • Build from source:
    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    cmake --build build
    Ensure dependencies: CMake, a C++20 compiler, and the test tools if you run ctest (mp4info/mp4dump from Bento4, AtomicParsley, xxd, gpac/mp4box optional).

vcpkg (overlay port)

Add this repo as an overlay (or copy ports/chapterforge into your overlays), then:

vcpkg install chapterforge

The overlay tracks the current commit. Update REF/SHA512 in ports/chapterforge/portfile.cmake when you bump versions.

CLI Usage

./chapterforge_cli <input.m4a|.mp4|.aac> <chapters.json> <output.m4a>
./chapterforge_cli <input.m4a> [--export-jpegs DIR]                     # read/extract
./chapterforge_cli --version
  • Write mode: mux chapters/images/URLs into an output M4A. If the input already has metadata (ilst), it is reused by default. Fast-start is ON by default (moov before mdat); use --no-faststart if you need the legacy layout.
  • Read mode: extract metadata, chapter titles/URLs/URL-texts, and images from an M4A. The JSON emitted matches the writer input format and is always printed to stdout. Use --export-jpegs DIR to dump cover + chapter images alongside the JSON and reference them in the output.
  • Logging: defaults to version + warnings/errors. Set verbosity when embedding via chapterforge::set_log_verbosity(LogVerbosity::Warn|Info|Debug) or pass --log-level warn|info|debug to the CLI. Debug-only logs stay hidden unless you raise the level.
  • Options:
    • --faststart (write) Explicitly enable fast-start (default).
    • --no-faststart (write) Disable fast-start; keep mdat before moov.
    • --log-level LEVEL One of warn|info|debug.
    • --export-jpegs DIR (read) Export cover/chapter JPEGs to DIR and reference them in the JSON.

Chapters JSON format

ChapterForge consumes a simple JSON document:

{
  "title": "Sample Podcast Episode",     // optional top-level metadata
  "artist": "John Doe",
  "album": "My Podcast",
  "genre": "Podcast",
  "year": "2024",
  "comment": "Created with ChapterForge",
  "cover": "cover.jpg",                  // optional; path is relative to the JSON file

  "chapters": [
    {
      "title": "Introduction",           // required
      "start_ms": 0,                     // required: chapter start time in milliseconds (first snaps to 0)
      "image": "chapter1.jpg",           // optional; path relative to the JSON file
      "url": "https://example.com",      // optional; creates a URL text track with HREF
      "url_text": "Intro link label"     // optional; text payload for the URL track (defaults empty)
    },
    {
      "title": "Main Discussion",
      "start_ms": 10000,
      "image": "chapter2.jpg",
      "url": ""
    }
  ]
}

Notes:

  • Chapters are positioned by absolute start times (start_ms). Apple family players (QuickTime, Music.app, AVFoundation) and VLC snap the first chapter to 0 even if you author a non-zero start_ms. We warn on non-zero first starts; if you truly need a gap, add an explicit leading “blank” chapter covering 0–gap_ms.
  • Chapter images are optional; omit image to create a text-only chapter.
  • URL track text: url_text is optional and defaults to empty (Apple-authored behavior). If set, it travels in the URL tx3g samples; some players may surface it as visible text.
  • Chapter URLs are optional; omit url to skip the URL track entirely.
  • If top-level metadata fields are omitted and the input file already contains metadata (ilst), that metadata is preserved automatically.
  • Paths for cover and per-chapter image are resolved relative to the JSON file location.
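
The relative-path rule above can be sketched with std::filesystem. This is a minimal illustration, not ChapterForge's actual code: the helper name resolve_against_json is hypothetical, and leaving absolute paths unchanged is an assumption.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper (not part of the ChapterForge API): resolve a "cover" or
// "image" value from the chapters JSON against the directory holding the JSON.
std::string resolve_against_json(const std::string& json_path,
                                 const std::string& image_path) {
    namespace fs = std::filesystem;
    const fs::path base = fs::path(json_path).parent_path();
    // operator/ returns image_path unchanged when it is already absolute
    // (assumed behavior for this sketch).
    return (base / image_path).lexically_normal().generic_string();
}
```

With a JSON file at /data/show/chapters.json, an "image" value of images/c1.jpg would resolve to /data/show/images/c1.jpg under this sketch.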

First chapter behavior (Apple/VLC)

The chapter tracks are duration-based (stts), but most players force the first sample to start at t=0. A non-zero first start_ms will be snapped to 0 in QuickTime, Music.app, AVFoundation, and VLC. If you need silence/blank time before your “real” first chapter, add a leading placeholder chapter that covers 0..gap_ms and then start your first “real” chapter after that.
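
If you build chapter lists in code, the placeholder workaround can be applied before muxing. The sketch below is an assumption-laden illustration: the Chapter struct is a stand-in that mirrors the JSON keys, and with_leading_placeholder is a hypothetical helper, not a ChapterForge function.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-in for a chapter entry; field names follow the JSON keys.
struct Chapter {
    std::string title;
    std::uint32_t start_ms = 0;
};

// Hypothetical helper: if the first chapter starts after 0, prepend a
// placeholder chapter at 0 so it covers 0..gap_ms and the first "real"
// chapter keeps its authored start time.
std::vector<Chapter> with_leading_placeholder(std::vector<Chapter> chapters,
                                              const std::string& placeholder_title = "") {
    if (!chapters.empty() && chapters.front().start_ms > 0) {
        chapters.insert(chapters.begin(), Chapter{placeholder_title, 0});
    }
    return chapters;
}
```

A list whose first chapter starts at 30000 ms comes back with two entries: the placeholder at 0 and the original chapter at 30000 ms. Lists that already start at 0 are returned unchanged.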

Output

Atom flow (input → output)

ChapterForge preserves the source audio track and metadata (ilst) and adds up to three new tracks for chapters:

Input (AAC in M4A/MP4)
├─ ftyp
├─ free (optional)
├─ moov
│  ├─ mvhd
│  ├─ trak (audio)
│  │  ├─ tkhd
│  │  └─ mdia → minf → stbl (reused, including stsd/stts/stsc/stsz/stco)
│  └─ udta/meta/ilst (reused if present)
└─ mdat

Output (ChapterForge)
├─ ftyp
├─ free (optional or moved for faststart)
├─ moov
│  ├─ mvhd
│  ├─ trak (audio, reused stbl when input is MP4/M4A)
│  ├─ trak (chapter titles, tx3g)
│  │  └─ stbl with stsd(tx3g) + stts/stsc/stsz/stco
│  ├─ trak (chapter URLs, tx3g with href) [only if any chapter has `url` or `url_text`]
│  │  └─ same structure as titles; text may be empty, href carries the URL
│  ├─ trak (chapter images, jpeg)
│  │  └─ stbl with stsd(jpeg) + stts/stsc/stsz/stco/stss
│  └─ udta/meta/ilst (reused if present, otherwise from JSON)
└─ mdat (audio + chapter samples)

Fast-start repacks moov ahead of mdat when requested.
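
To check whether a file has the fast-start layout, it is enough to walk the top-level boxes and see which of moov and mdat comes first. The following is a minimal sketch over an in-memory buffer; the function name moov_precedes_mdat is hypothetical, and the box header handling follows the usual ISO base media conventions (32-bit size, four-character type, size 1 meaning a 64-bit size follows, size 0 meaning "to end of file").

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Reads a big-endian unsigned integer of n bytes starting at p.
static std::uint64_t read_be(const std::uint8_t* p, int n) {
    std::uint64_t v = 0;
    for (int i = 0; i < n; ++i) v = (v << 8) | p[i];
    return v;
}

// Returns true if a top-level 'moov' box appears before the first 'mdat'.
// Only top-level boxes are visited; truncated or malformed input yields false.
bool moov_precedes_mdat(const std::vector<std::uint8_t>& file) {
    std::size_t pos = 0;
    while (pos + 8 <= file.size()) {
        std::uint64_t size = read_be(&file[pos], 4);
        const std::uint8_t* type = &file[pos + 4];
        std::size_t header = 8;
        if (size == 1) {                    // 64-bit size follows the type
            if (pos + 16 > file.size()) return false;
            size = read_be(&file[pos + 8], 8);
            header = 16;
        } else if (size == 0) {             // box extends to end of file
            size = file.size() - pos;
        }
        if (std::memcmp(type, "moov", 4) == 0) return true;
        if (std::memcmp(type, "mdat", 4) == 0) return false;
        if (size < header || size > file.size() - pos) return false;
        pos += static_cast<std::size_t>(size);
    }
    return false;
}
```

A file written with the default settings should report true here, and one written with --no-faststart should report false.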

These settings mirror Apple-authored “golden” files so that QuickTime, Music.app, and AVFoundation surface titles, URLs, and thumbnails reliably.

Technical Breakdown

  • Track references (tref/chap): audio track points only to the title text track and the image track (when present); the URL track is deliberately not referenced. This matches Apple-authored files and keeps QuickTime showing titles while Music.app shows thumbnails.
  • HREF propagation: every chapter URL is mirrored into the title track payload as well, which makes AVFoundation expose it in extraAttributes[HREF].
  • Timescales: text/url/image tracks use 1000 Hz; audio timescale is preserved from the source. Track IDs may differ; structure/flags/handlers remain.
  • Chapter images: the video track dimensions come from the first JPEG; keep all chapter images the same size (and yuvj420p) so every frame displays in QuickTime/Music. Dimension mismatches emit a mux-time warning and may hide later frames.
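
As an illustration of the track reference described above, a tref box with a single chap entry can be serialized in a few lines. This is a sketch, not ChapterForge's writer: build_tref_chap is a hypothetical name, and placing both track IDs in one chap box is an assumption.

```cpp
#include <cstdint>
#include <vector>

// Appends a 32-bit value in big-endian byte order.
static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 24; shift >= 0; shift -= 8)
        out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFF));
}

// Appends a four-character box type such as "tref".
static void put_type(std::vector<std::uint8_t>& out, const char (&type)[5]) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(type[i]));
}

// Builds a 'tref' box containing one 'chap' box that lists the referenced
// chapter track IDs (assumed layout: size, type, then 32-bit IDs).
std::vector<std::uint8_t> build_tref_chap(const std::vector<std::uint32_t>& track_ids) {
    const std::uint32_t chap_size =
        8 + 4 * static_cast<std::uint32_t>(track_ids.size());
    std::vector<std::uint8_t> out;
    put_u32(out, 8 + chap_size);   // tref size = own header + chap box
    put_type(out, "tref");
    put_u32(out, chap_size);
    put_type(out, "chap");
    for (std::uint32_t id : track_ids) put_u32(out, id);
    return out;
}
```

Calling build_tref_chap({2, 4}) would reference a title track with id 2 and an image track with id 4 while leaving out a URL track with id 3, yielding a 24-byte box.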

Chapter track reference (titles, URLs, images)

trak (titles)
  tkhd flags=1, alt_group=1, id=2
  hdlr type='text', name='Chapter Titles'
  mdia
    mdhd timescale=1000
    hdlr text
    minf/nmhd
      stbl
        stsd -> tx3g sample entry (see "title and url as tx3g sample")
        stts: one entry per sample, sample_count = chapter_count
        stsc: 3 entries, 1 sample per chunk
        stsz: per-sample sizes (chapter_count)
        stco: chunk offsets (chapter_count)

trak (URLs, only if any chapter has `url`)
  tkhd flags=1, alt_group=1, id=3
  hdlr type='text', name='Chapter URLs'
  mdia/mdhd timescale=1000
  stbl mirrors titles; samples carry `href` box:
    sample = [len][utf8 text (often empty)][href box]
    href box: size=0x1a, type='href', start=0, end=0x000a, url_len, url bytes, pad

trak (images)
  tkhd flags=7, id=4, width/height set from first JPEG
  hdlr type='vide', name='Chapter Images'
  mdia/mdhd timescale=1000
  stsd jpeg sample entry
  stts/stsc/stsz/stco sized to number of images; stss marks every sample as sync
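
Because the chapter tracks use a 1000 Hz timescale, one tick equals one millisecond, so stts durations follow directly from the absolute start times. The sketch below shows that conversion; chapter_durations is a hypothetical helper, and it assumes sorted starts with the first chapter at 0.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Converts absolute chapter starts (ms) into per-sample durations for a
// 1000 Hz track. The last chapter runs until total_ms, the audio duration.
std::vector<std::uint32_t> chapter_durations(const std::vector<std::uint32_t>& starts_ms,
                                             std::uint32_t total_ms) {
    std::vector<std::uint32_t> durations;
    durations.reserve(starts_ms.size());
    for (std::size_t i = 0; i < starts_ms.size(); ++i) {
        const std::uint32_t end =
            (i + 1 < starts_ms.size()) ? starts_ms[i + 1] : total_ms;
        // Guard against unsorted input or a start past the end of the audio.
        durations.push_back(end > starts_ms[i] ? end - starts_ms[i] : 0);
    }
    return durations;
}
```

For the shipped example (chapters at 0 and 5000 ms in a 10000 ms file), this gives two durations of 5000 ticks each.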

title and url as tx3g sample entry

  • stsd tx3g sample entry matches Apple/golden layout:
    • displayFlags: 0x00000000
    • justification: 0x01FF (horizontal: center, vertical: bottom)
    • bg color: 0x1f1f1f00 (RGBA: dark gray, fully transparent)
    • default style: start=0, end=0, fontID=1, face=1, size=0x12, color=000000FF (RGBA: black, opaque)
    • font table: single entry “Sans-Serif”
  • Text samples: [len][utf8 text][href box?] where href box is size=0x1a type=href start=0 end=0x000a len url pad.
  • URL track (tx3g with href): same sample entry; samples may have empty text, href drives AVFoundation’s extraAttributes[HREF].
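
The sample layout above can be sketched as a small serializer. Several details are assumptions rather than facts from ChapterForge's source: the text length prefix is taken to be a 16-bit big-endian value, the URL length a single byte, and the trailing pad a single zero byte. The function name build_text_sample is hypothetical.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Builds one text sample: [u16 length][utf8 text][optional href box].
// Assumed href layout: u32 size, 'href', u16 start, u16 end, u8 url_len,
// url bytes, one pad byte.
std::vector<std::uint8_t> build_text_sample(const std::string& text,
                                            const std::string& url) {
    std::vector<std::uint8_t> out;
    auto put_u16 = [&out](std::uint16_t v) {
        out.push_back(static_cast<std::uint8_t>(v >> 8));
        out.push_back(static_cast<std::uint8_t>(v & 0xFF));
    };
    put_u16(static_cast<std::uint16_t>(text.size()));  // assumes text < 64 KiB
    for (char c : text) out.push_back(static_cast<std::uint8_t>(c));
    if (!url.empty()) {
        const std::uint32_t box_size =
            static_cast<std::uint32_t>(4 + 4 + 2 + 2 + 1 + url.size() + 1);
        put_u16(static_cast<std::uint16_t>(box_size >> 16));
        put_u16(static_cast<std::uint16_t>(box_size & 0xFFFF));
        for (char c : std::string("href")) out.push_back(static_cast<std::uint8_t>(c));
        put_u16(0);        // start
        put_u16(0x000a);   // end, value as listed above
        out.push_back(static_cast<std::uint8_t>(url.size()));  // assumes url <= 255 bytes
        for (char c : url) out.push_back(static_cast<std::uint8_t>(c));
        out.push_back(0);  // pad
    }
    return out;
}
```

Under this assumed layout, a 12-byte URL produces a box size of 0x1a, which is consistent with the size quoted in the list above.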

image as MJPEG sample entry

  • Every sample is sync-marked (stss), timescale 1000. Use baseline JPEG yuvj420p; 4:4:4 art can blank thumbnails in QuickTime/Music. If you supply art, re-encode with:
    ffmpeg -y -i your_art.jpg -pix_fmt yuvj420p your_art_420.jpg
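
Before muxing, it can help to verify that every chapter image has the same dimensions and 4:2:0 chroma subsampling. The sketch below reads both from the JPEG start-of-frame segment. It is a simplified probe under stated assumptions: probe_jpeg and JpegInfo are hypothetical names, and 4:2:0 is detected as sampling factors 2x2 for luma and 1x1 for both chroma components.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct JpegInfo {
    std::uint16_t width = 0;
    std::uint16_t height = 0;
    bool is_420 = false;  // luma sampled 2x2, both chroma components 1x1
};

// Scans JPEG segments for the first start-of-frame (SOF) marker and reads
// the frame size and sampling factors. Returns nullopt on malformed input.
std::optional<JpegInfo> probe_jpeg(const std::vector<std::uint8_t>& d) {
    if (d.size() < 4 || d[0] != 0xFF || d[1] != 0xD8) return std::nullopt;
    std::size_t pos = 2;
    while (pos + 4 <= d.size()) {
        if (d[pos] != 0xFF) return std::nullopt;
        const std::uint8_t marker = d[pos + 1];
        if (marker == 0xFF) { ++pos; continue; }            // fill byte
        if ((marker >= 0xD0 && marker <= 0xD9) || marker == 0x01) {
            pos += 2;                                        // marker without length
            continue;
        }
        if (marker == 0xDA) return std::nullopt;             // scan data before any SOF
        const std::size_t len = (static_cast<std::size_t>(d[pos + 2]) << 8) | d[pos + 3];
        if (len < 2 || pos + 2 + len > d.size()) return std::nullopt;
        const bool is_sof = marker >= 0xC0 && marker <= 0xCF &&
                            marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
        if (is_sof) {
            if (len < 8) return std::nullopt;
            const std::uint8_t* p = &d[pos + 4];             // precision, height, width, count
            const unsigned ncomp = p[5];
            if (len < 8 + 3 * ncomp) return std::nullopt;
            JpegInfo info;
            info.height = static_cast<std::uint16_t>((p[1] << 8) | p[2]);
            info.width = static_cast<std::uint16_t>((p[3] << 8) | p[4]);
            info.is_420 = ncomp == 3 && p[7] == 0x22 && p[10] == 0x11 && p[13] == 0x11;
            return info;
        }
        pos += 2 + len;
    }
    return std::nullopt;
}
```

A caller could probe the first image, then compare width, height, and is_420 of each later image against it and re-encode any mismatches with the ffmpeg command above.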

Building

cmake -S . -B build
cmake --build build

Targets:

  • chapterforge — static library
  • chapterforge_cli — command-line tool

Notes:

  • Requires a compiler with C++20 support.
  • Fast-start is on by default.
  • Tests and docs are optional targets (see below).
  • To make AVFoundation surface extraAttributes[HREF] consistently, ChapterForge mirrors each URL into both the URL track and the title track sample text; players still show the title normally, while AVFoundation exposes the HREF.

Embedding API (C++)

Public header: chapterforge.hpp

struct ChapterTextSample {
    std::string text;       // UTF-8 text
    std::string href;       // optional hyperlink URL (tx3g modifier)
    uint32_t start_ms = 0;  // absolute start time in ms
};

struct ChapterImageSample {
    std::vector<uint8_t> data;  // JPEG bytes for this chapter frame
    uint32_t start_ms = 0;      // absolute start time in milliseconds
};

// Status + message on failure.
struct Status { bool ok; std::string message; };

// Writes top level metadata (optional), chapters with titles, optional URLs and chapter images.
Status mux_file_to_m4a(const std::string& input_audio_path,
                       const std::vector<ChapterTextSample>& text_chapters,
                       const std::vector<ChapterTextSample>& url_chapters,
                       const std::vector<ChapterImageSample>& image_chapters,
                       const MetadataSet& metadata,
                       const std::string& output_path,
                       bool fast_start = true);

Note that there are several overloads for mux_file_to_m4a, for your convenience and clear intent.

// Result from reading and parsing an MP4/M4A file.
struct ReadResult {
    Status status;
    std::vector<ChapterTextSample> text_chapters;
    std::vector<ChapterTextSample> url_chapters;
    std::vector<ChapterImageSample> image_chapters;
    MetadataSet metadata;
};

// Extracts chapter titles, optional URL samples (tx3g + href), chapter images (MJPEG samples),
// and top-level metadata (ilst if present). Does not decode audio.
ReadResult read_m4a(const std::string &path);

If metadata is empty and the source has an ilst, it is reused automatically.

Minimal C++ usage (CLI equivalent)

The CLI front-end is effectively:

#include "chapterforge.hpp"
#include <iostream>
#include <string>

int main(int argc, char** argv) {
  if (argc != 4) {
    std::cerr << "usage: chapterforge <input.m4a|.mp4|.aac> <chapters.json> <output.m4a>\n";
    return 2;
  }
  std::string input   = argv[1];
  std::string chapters= argv[2];
  std::string output  = argv[3];

  auto status = chapterforge::mux_file_to_m4a(input, chapters, output);
  if (!status.ok) {
    std::cerr << "chapterforge: failed to write output: " << status.message << "\n";
    return 1;
  }
  std::cout << "Wrote: " << output << "\n";
  return 0;
}

Use the overload that takes in-memory samples (see Embedding API) if you already have chapters/material in memory and don’t want to read JSON from disk.

Reading (extract chapters/metadata/images) mirrors the CLI read mode:

auto res = chapterforge::read_m4a("input.m4a");
if (!res.status.ok) {
  std::cerr << "read failed: " << res.status.message << "\n";
} else {
  std::cout << "title: " << res.metadata.title << "\n";
  for (const auto& c : res.text_chapters) {
    std::cout << c.start_ms << " ms -> " << c.text << " href=" << c.href << "\n";
  }
}

read_m4a returns:

  • status: ok/message pair.
  • text_chapters: chapter title samples (tx3g), including href mirrored from the URL track when present.
  • url_chapters: optional URL track samples (tx3g + href). Empty if no URL track exists.
  • image_chapters: optional JPEG chapter images.
  • metadata: top-level ilst metadata (reused from source; empty if absent).

Note: When reading, missing fields are left empty rather than synthesized (e.g., a chapter without a URL will have an empty URL sample and no url/url_text keys in the exported JSON).

Tests & Dependencies

Quick run:

cmake -S . -B build -DENABLE_OUTPUT_TOOL_TESTS=ON
cmake --build build
cd build
ctest --output-on-failure

Optional toggles (configure-time):

  • -DENABLE_BIG_IMAGE_TESTS=ON — heavy image/long-duration fixtures (needs input_big.m4a + large JPEGs).
  • -DENABLE_STRICT_VALIDATION=ON — extra tool-based checks (mp4info/mp4dump/AtomicParsley/ffprobe/MP4Box).
  • -DENABLE_AVFOUNDATION_SMOKE=ON — macOS Swift smoke test (needs swift).

Tooling deps (used only by tooling-labeled tests):

  • Bento4 mp4info/mp4dump (JSON parsing for audio/atom checks)
  • AtomicParsley (atom tree inspection)
  • gpac (MP4Box) and ffprobe (strict validation, optional)
  • xxd (hex dumps for atom offset checks)
  • say (macOS only; optional for synthetic audio generation)

Notes:

  • Core tests (label core) require only the compiler/runtime.
  • Tooling tests are optional and run only when deps are present; skip by leaving ENABLE_OUTPUT_TOOL_TESTS=OFF.
  • CI installs these per-platform; local runs can be minimal.

Contributing

Issues and PRs are welcome. Please:

  • Keep changes ASCII unless the file already uses Unicode.
  • Run tests before submitting: cmake -S . -B build -DENABLE_OUTPUT_TOOL_TESTS=ON && cmake --build build && (cd build && ctest --output-on-failure).
  • Add or update tests when you change muxing behavior, metadata handling, or JSON parsing.
  • Keep comments concise and only where the code isn’t self-explanatory.

Disclaimer

This is anything but a reference implementation. Many shortcuts were taken to reach the goal. There are plenty of hardcoded magic bytes in this project, and the parsers may explode with the next file you provide to them. If you need enterprise-grade software, this is not the library for you. If you need something similar but not exactly what this does, you had better be a developer ready to contribute when contacting me, as I have no interest in working for you.
