ChapterForge is a library and CLI to mux chapters (text and optional images) into AAC/M4A files while preserving metadata and handling Apple-compatible chapter tracks.
- Motivation and Backstory
- Features
- Platforms
- Installing
- CLI Usage
- Chapters JSON format
- Output
- Building
- API Documentation on GitHub
- Embedding API (C++)
- Minimal C++ usage (CLI equivalent)
- Tests & Dependencies
- Contributing
- Advanced Usage
- Disclaimer
The MPEG-4 standard and its MP4 container format do not explicitly describe chapter marks. Apple, however, did so implicitly.
Back in the day, under the umbrella of the QuickTime framework, Apple released authoring tools for audiobooks. With those tools you could add jump marks to an M4A file. A jump mark could carry a description text, an image, and possibly more, so a listener could conveniently jump to specific sections of the audio, for example to the start of a chapter. Many players understand these marks, but not all do. Apple, which pushed this "extension" of the standard, has traditionally supported it in its own players: Music.app, iTunes.app, QuickTime.app, and even Books.app all support chapter marks today. Windows has existing support as well. So this is not a standard, but it is at least a functional solution for encoding chapter information into the audio file.
Apple did ship authoring tools, but that was way back in PowerPC times, and QTKit is long gone. There appears to be not a single open source tool that supports a recent OS. ffmpeg does support chapter marks for MP4, but not images for them. In 2025, the only tool supporting images in chapter marks appears to be Auphonic, which is commercial.
None of the existing libraries offered to application developers, not even big players like Bento4, support chapter marks with images. AVFoundation, the framework Apple offers these days for media playback and authoring, fully supports reading MP4 chapter marks including images, so creating a player with that feature is trivial. The kicker: Apple offers no way at all of authoring such files.
No one had a strong enough interest to change this, until today.
My tinker project needs support for storing a track-list / set-list in the file itself. That way I can attribute those beautiful DJ sets and have neat track-mark thumbnails and descriptions on the player, persisted in the M4A file. A few thousand lines of code later, we have a new library based on no other works available which does the job for me - maybe also for you.
ChapterForge uses the audio track from the input. It then combines that with a text track for the description, an optional text track for per-chapter URLs, and a video track for optional chapter images. All of that information gets bundled in the resulting output M4A file. With that M4A file you can now see chapter marks in your player.
Supported players (just a selection of known goods):
| Player | text | image | url | url-text |
|---|---|---|---|---|
| QuickTime.app | X | X | o | o |
| Music.app | X | X | o | o |
| Books.app | X | X | o | o |
| VLC | X | o | o | o |
(X = full support and display, o = not displayed, all other functions remain)
Text commonly is displayed as the chapter title. Image commonly is displayed as a thumbnail. In QuickTime we additionally get a "movie" presented - a very nice experience. URL commonly is displayed nowhere. URL text commonly is displayed nowhere.
Note that AVFoundation supports all of those attributes for parsing and extraction, so on macOS and iOS it is trivial to support them.
We ship two reference files you can open in your favorite player to sanity-check chapter handling:
- Input JSON: `testdata/chapters_10s_2ch_normalimg_meta.json`
  - 2 chapters at 0 and 5 seconds (fits the 10s input)
  - Cover: `images/cover_normal.jpg`
  - Per-chapter images: `images/normal*.jpg`
  - URL track with per-chapter HREFs
- Built example: ChapterForge Example M4A File
What to expect:
- Chapter list shows the 2 entries with titles and thumbnails.
- Jumps land at the correct 5s offsets.
- QuickTime shows the video track for chapter images; Music.app shows thumbnails.
- URLs are present in the dedicated URL track (AVFoundation surfaces them via `extraAttributes[HREF]`), but players generally do not display them.
Bonus: ChapterForge Bonus Track M4A File — 50 chapters, small images, and per-chapter URLs to stress-test players.
QuickTime.app on macOS playing our example file:
Music.app on macOS playing our example file:
Do you have more players showing our example file? Would be great to see them.
```bash
brew tap tillt/chapterforge https://github.com/tillt/ChapterForge.git
brew install --HEAD tillt/chapterforge/chapterforge
```
This builds the CLI and static library (and the universal macOS framework when enabled). Head-only for now; tags will become bottled when we ship stable releases.
- Prebuilt release zip: grab the latest draft/release artifact from GitHub. Inside you’ll find `chapterforge_cli.exe`, `chapterforge.lib`, and `include/`.
- Build from source (MSVC):
  ```bash
  cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
  cmake --build build --config Release
  ```
  Outputs land in `build/Release/`.
- Prebuilt tarball: download the release tarball (`chapterforge-<ver>-ubuntu-latest.tar.gz`) and use `chapterforge_cli` + `libchapterforge.a` under `include/`.
- Build from source:
  ```bash
  cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
  cmake --build build
  ```
  Ensure dependencies: CMake, a C++20 compiler, and the test tools if you run `ctest` (mp4info/mp4dump from Bento4, AtomicParsley, xxd, gpac/mp4box optional).
Add this repo as an overlay (or copy ports/chapterforge into your overlays), then:
```bash
vcpkg install chapterforge
```
The overlay tracks the current commit. Update REF/SHA512 in `ports/chapterforge/portfile.cmake` when you bump versions.
```bash
./chapterforge_cli <input.m4a|.mp4|.aac> <chapters.json> <output.m4a>
./chapterforge_cli <input.m4a> [--export-jpegs DIR]   # read/extract
./chapterforge_cli --version
```
- Write mode: mux chapters/images/URLs into an output M4A. If the input already has metadata (`ilst`), it is reused by default. Fast-start is ON by default (moov before mdat); use `--no-faststart` if you need the legacy layout.
- Read mode: extract metadata, chapter titles/URLs/URL-texts, and images from an M4A. The JSON emitted matches the writer input format and is always printed to stdout. Use `--export-jpegs DIR` to dump cover and chapter images alongside the JSON and reference them in the output.
- Logging: defaults to version + warnings/errors. Set verbosity when embedding via `chapterforge::set_log_verbosity(LogVerbosity::Warn|Info|Debug)` or pass `--log-level warn|info|debug` to the CLI. Debug-only logs stay hidden unless you raise the level.
- Options:
  - `--faststart` (write) Explicitly enable fast-start (default).
  - `--no-faststart` (write) Disable fast-start; keep `mdat` before `moov`.
  - `--log-level LEVEL` One of `warn|info|debug`.
  - `--export-jpegs DIR` (read) Export cover/chapter JPEGs to `DIR` and reference them in the JSON.
ChapterForge consumes a simple JSON document; an annotated example is reproduced at the end of this document.
Notes:
- Chapters are positioned by absolute start times (`start_ms`). Apple family players (QuickTime, Music.app, AVFoundation) and VLC snap the first chapter to 0 even if you author a non-zero `start_ms`. We warn on non-zero first starts; if you truly need a gap, add an explicit leading “blank” chapter covering 0..gap_ms.
- Chapter images are optional; omit `image` to create a text-only chapter.
- URL track text: `url_text` is optional and defaults to empty (Apple-authored behavior). If set, it travels in the URL tx3g samples; some players may surface it as visible text.
- Chapter URLs are optional; omit `url` to skip the URL track entirely.
- If top-level metadata fields are omitted and the input file already contains metadata (`ilst`), that metadata is preserved automatically.
- Paths for `cover` and per-chapter `image` are resolved relative to the JSON file location.
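The path rule in the last note can be sketched with `std::filesystem`. This is only an illustration: `resolve_asset_path` is a hypothetical helper name, not part of the ChapterForge API, and leaving absolute references unchanged is an assumption rather than documented behavior.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: resolve a `cover` or `image` reference relative to the
// directory that contains the chapters JSON file.
// Assumption: an absolute reference is returned unchanged (operator/ discards
// the left-hand side when the right-hand side is absolute).
fs::path resolve_asset_path(const fs::path& json_path, const std::string& ref) {
    assert(!ref.empty());
    return (json_path.parent_path() / ref).lexically_normal();
}
```

With a JSON file at `/data/show/chapters.json`, a `cover` value of `cover.jpg` would resolve to `/data/show/cover.jpg` under this sketch.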
First chapter behavior (Apple/VLC): the chapter tracks are duration-based (`stts`), but most players force the first sample to start at t=0. A non-zero first `start_ms` will be snapped to 0 in QuickTime, Music.app, AVFoundation, and VLC. If you need silence/blank time before your “real” first chapter, add a leading placeholder chapter that covers 0..gap_ms and then start your first “real” chapter after that.
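The placeholder workaround can be applied before muxing. The sketch below is a minimal illustration, not library code: `Chapter` is a local stand-in that mirrors two fields of `ChapterTextSample`, and `with_leading_placeholder` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Local stand-in mirroring the text/start fields of ChapterTextSample, so the
// sketch compiles without the library header.
struct Chapter {
    std::string text;
    uint32_t start_ms = 0;
};

// Hypothetical helper: if the first chapter starts after 0, prepend a
// placeholder chapter at 0 so that players snap the placeholder, not the
// "real" first chapter, to t=0.
std::vector<Chapter> with_leading_placeholder(std::vector<Chapter> chapters,
                                              const std::string& placeholder_title) {
    if (!chapters.empty() && chapters.front().start_ms > 0) {
        chapters.insert(chapters.begin(), Chapter{placeholder_title, 0});
        assert(chapters.front().start_ms == 0);
    }
    return chapters;
}
```

A list that already starts at 0 is returned unchanged, so the helper can be called unconditionally.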
ChapterForge preserves the source audio track and metadata (ilst) and adds up to three new tracks for chapters:
```
Input (AAC in M4A/MP4)
├─ ftyp
├─ free (optional)
├─ moov
│ ├─ mvhd
│ ├─ trak (audio)
│ │ ├─ tkhd
│ │ └─ mdia → minf → stbl (reused, including stsd/stts/stsc/stsz/stco)
│ └─ udta/meta/ilst (reused if present)
└─ mdat

Output (ChapterForge)
├─ ftyp
├─ free (optional or moved for faststart)
├─ moov
│ ├─ mvhd
│ ├─ trak (audio, reused stbl when input is MP4/M4A)
│ ├─ trak (chapter titles, tx3g)
│ │ └─ stbl with stsd(tx3g) + stts/stsc/stsz/stco
│ ├─ trak (chapter URLs, tx3g with href) [only if any chapter has `url` or `url_text`]
│ │ └─ same structure as titles; text may be empty, href carries the URL
│ ├─ trak (chapter images, jpeg)
│ │ └─ stbl with stsd(jpeg) + stts/stsc/stsz/stco/stss
│ └─ udta/meta/ilst (reused if present, otherwise from JSON)
└─ mdat (audio + chapter samples)
```
Fast-start repacks moov ahead of mdat when requested.
These settings mirror Apple-authored “golden” files so that QuickTime, Music.app, and AVFoundation surface titles, URLs, and thumbnails reliably.
- Track references (`tref/chap`): audio track points only to the title text track and the image track (when present); the URL track is deliberately not referenced. This matches Apple-authored files and keeps QuickTime showing titles while Music.app shows thumbnails.
- HREF propagation: every chapter URL is mirrored into the title track payload as well, which makes AVFoundation expose it in `extraAttributes[HREF]`.
- Timescales: text/url/image tracks use 1000 Hz; audio timescale is preserved from the source. Track IDs may differ; structure/flags/handlers remain.
- Chapter images: the video track dimensions come from the first JPEG; keep all chapter images the same size (and yuvj420p) so every frame displays in QuickTime/Music. Dimension mismatches emit a mux-time warning and may hide later frames.
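Because the chapter tracks are duration-based and use a 1000 Hz timescale, one tick equals one millisecond, and absolute `start_ms` values translate directly into `stts` sample durations. The following is a sketch under stated assumptions, not the library's code: `durations_from_starts` is a hypothetical name, the last chapter is assumed to run to the end of the audio, and starts are assumed to be ascending.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: convert absolute chapter starts (ms) into per-sample
// durations for a 1000 Hz track, where 1 tick == 1 ms.
// Assumption: the final chapter lasts until `total_ms`, the audio duration.
std::vector<uint32_t> durations_from_starts(const std::vector<uint32_t>& starts_ms,
                                            uint32_t total_ms) {
    std::vector<uint32_t> durations;
    durations.reserve(starts_ms.size());
    for (std::size_t i = 0; i < starts_ms.size(); ++i) {
        const uint32_t end = (i + 1 < starts_ms.size()) ? starts_ms[i + 1] : total_ms;
        assert(end >= starts_ms[i]);  // starts must ascend and stay within the file
        durations.push_back(end - starts_ms[i]);
    }
    return durations;
}
```

For the shipped example (chapters at 0 and 5 seconds in a 10 second input), this yields two durations of 5000 ticks each.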
```
trak (titles)
tkhd flags=1, alt_group=1, id=2
hdlr type='text', name='Chapter Titles'
mdia
mdhd timescale=1000
hdlr text
minf/nmhd
stbl
stsd -> tx3g sample entry (see "title and url as tx3g sample")
stts: one entry per sample, sample_count = chapter_count
stsc: 3 entries, 1 sample per chunk
stsz: per-sample sizes (chapter_count)
stco: chunk offsets (chapter_count)
trak (URLs, only if any chapter has `url`)
tkhd flags=1, alt_group=1, id=3
hdlr type='text', name='Chapter URLs'
mdia/mdhd timescale=1000
stbl mirrors titles; samples carry `href` box:
sample = [len][utf8 text (often empty)][href box]
href box: size=0x1a, type='href', start=0, end=0x000a, url_len, url bytes, pad
trak (images)
tkhd flags=7, id=4, width/height set from first JPEG
hdlr type='vide', name='Chapter Images'
mdia/mdhd timescale=1000
stsd jpeg sample entry
stts/stsc/stsz/stco sized to number of images; stss marks every sample as sync
```
- `stsd` `tx3g` sample entry matches Apple/golden layout:
  - displayFlags: `0x00000000`
  - justification: `0x01FF` (horizontal: center, vertical: baseline)
  - bg color: `0x1f1f1f00` (RGBA: dark gray, fully transparent)
  - default style: start=0, end=0, fontID=1, face=1, size=0x12, color=000000FF (RGBA: black, opaque)
  - font table: single entry “Sans-Serif”
- Text samples: `[len][utf8 text][href box?]` where href box is `size=0x1a type=href start=0 end=0x000a len url pad`.
- URL track (`tx3g` with `href`): same sample entry; samples may have empty text, `href` drives AVFoundation’s `extraAttributes[HREF]`.
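To make the sample layout concrete, here is a small encoder sketch for `[len][utf8 text][href box?]`. It is not the library's implementation. Assumptions: `len` is a 16-bit big-endian byte count, `url_len` and `pad` are one byte each, and the box size is computed from the URL length. Under that assumed layout a 12-byte URL gives the `0x1a` size quoted above. `encode_text_sample` and `put_be` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Append `value` as a big-endian integer that is `bytes` wide.
static void put_be(std::vector<uint8_t>& out, uint32_t value, int bytes) {
    for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8) {
        out.push_back(static_cast<uint8_t>((value >> shift) & 0xFF));
    }
}

// Hypothetical encoder for one text sample: [len][utf8 text][href box?].
// The href box is emitted only when `url` is non-empty.
std::vector<uint8_t> encode_text_sample(const std::string& text, const std::string& url) {
    assert(text.size() <= 0xFFFF && url.size() <= 0xFF);
    std::vector<uint8_t> out;
    put_be(out, static_cast<uint32_t>(text.size()), 2);  // assumed 16-bit text length
    out.insert(out.end(), text.begin(), text.end());     // UTF-8 bytes, no terminator
    if (!url.empty()) {
        // size(4) + type(4) + start(2) + end(2) + url_len(1) + url + pad(1)
        const uint32_t box_size = 14 + static_cast<uint32_t>(url.size());
        const char type[4] = {'h', 'r', 'e', 'f'};
        put_be(out, box_size, 4);
        out.insert(out.end(), type, type + 4);
        put_be(out, 0x0000, 2);                              // start, as listed above
        put_be(out, 0x000A, 2);                              // end, as listed above
        put_be(out, static_cast<uint32_t>(url.size()), 1);   // url_len
        out.insert(out.end(), url.begin(), url.end());
        out.push_back(0);                                    // pad byte
    }
    return out;
}
```

A title-track sample and a URL-track sample would both go through the same routine; for the URL track the text is often empty, so the sample is just a zero length followed by the href box.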
- Every sample is sync-marked (`stss`), timescale 1000. Use baseline JPEG yuvj420p; 4:4:4 art can blank thumbnails in QuickTime/Music. If you supply art, re-encode with: `ffmpeg -y -i your_art.jpg -pix_fmt yuvj420p your_art_420.jpg`
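If you want to check images before muxing, the JPEG frame header carries the relevant facts. The sketch below walks the segment list up to the first start-of-frame marker and reports dimensions, whether the file is baseline (SOF0), and whether the first component is sampled 2x2, which is typical for 4:2:0. It is a rough pre-flight check written for illustration: `JpegInfo` and `inspect_jpeg` are hypothetical names, the chroma sampling factors are not verified, and the full-range property implied by `yuvj420p` cannot be read from this header.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct JpegInfo {
    uint16_t width = 0;
    uint16_t height = 0;
    bool baseline = false;  // frame header is SOF0
    bool is_420 = false;    // first component sampled 2x2 (chroma not checked)
};

// Hypothetical pre-flight check: scan JPEG segments until a start-of-frame marker.
std::optional<JpegInfo> inspect_jpeg(const std::vector<uint8_t>& d) {
    if (d.size() < 4 || d[0] != 0xFF || d[1] != 0xD8) return std::nullopt;  // no SOI
    std::size_t i = 2;
    while (i + 4 <= d.size()) {
        if (d[i] != 0xFF) return std::nullopt;        // lost marker sync
        const uint8_t marker = d[i + 1];
        if (marker == 0xFF) { ++i; continue; }        // fill byte before a marker
        if (marker == 0xDA || marker == 0xD9) break;  // SOS/EOI reached without SOF
        const std::size_t len = (std::size_t{d[i + 2]} << 8) | d[i + 3];
        if (len < 2 || i + 2 + len > d.size()) return std::nullopt;
        const bool is_sof = marker >= 0xC0 && marker <= 0xCF &&
                            marker != 0xC4 && marker != 0xC8 && marker != 0xCC;
        if (is_sof) {
            if (len < 11) return std::nullopt;        // need at least one component
            JpegInfo info;
            info.height = static_cast<uint16_t>((d[i + 5] << 8) | d[i + 6]);
            info.width = static_cast<uint16_t>((d[i + 7] << 8) | d[i + 8]);
            info.baseline = (marker == 0xC0);
            info.is_420 = (d[i + 11] == 0x22);        // H=2, V=2 for component 1
            return info;
        }
        i += 2 + len;                                 // skip marker + segment
    }
    return std::nullopt;
}
```

Comparing the `width` and `height` of every chapter image against the first one would catch the dimension mismatches mentioned above before they surface as a mux-time warning.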
```bash
cmake -S . -B build
cmake --build build
```
Targets:
- `chapterforge`: static library
- `chapterforge_cli`: command-line tool
Notes:
- Requires a compiler with C++20 support.
- Fast-start is on by default.
- Tests and docs are optional targets (see below).
- To make AVFoundation surface `extraAttributes[HREF]` consistently, ChapterForge mirrors each URL into both the URL track and the title track sample text; players still show the title normally, while AVFoundation exposes the HREF.
Public header: chapterforge.hpp
```cpp
struct ChapterTextSample {
    std::string text;       // UTF-8 text
    std::string href;       // optional hyperlink URL (tx3g modifier)
    uint32_t start_ms = 0;  // absolute start time in ms
};

struct ChapterImageSample {
    std::vector<uint8_t> data;  // JPEG bytes for this chapter frame
    uint32_t start_ms = 0;      // absolute start time in milliseconds
};

// Status + message on failure.
struct Status { bool ok; std::string message; };

// Writes top level metadata (optional), chapters with titles, optional URLs and chapter images.
Status mux_file_to_m4a(const std::string& input_audio_path,
                       const std::vector<ChapterTextSample>& text_chapters,
                       const std::vector<ChapterTextSample>& url_chapters,
                       const std::vector<ChapterImageSample>& image_chapters,
                       const MetadataSet& metadata,
                       const std::string& output_path,
                       bool fast_start = true);
```
Note that there are several overloads of `mux_file_to_m4a`, for your convenience and clear intent.
```cpp
// Result from reading and parsing an MP4/M4A file.
struct ReadResult {
    Status status;
    std::vector<ChapterTextSample> text_chapters;
    std::vector<ChapterTextSample> url_chapters;
    std::vector<ChapterImageSample> image_chapters;
    MetadataSet metadata;
};

// Extracts chapter titles, optional URL samples (tx3g + href), chapter images (MJPEG samples),
// and top-level metadata (ilst if present). Does not decode audio.
ReadResult read_m4a(const std::string &path);
```
If `metadata` is empty and the source has an `ilst`, it is reused automatically.
The CLI front-end is effectively:
```cpp
#include "chapterforge.hpp"

#include <iostream>
#include <string>

int main(int argc, char** argv) {
    // Expect exactly: input audio, chapters JSON, output path.
    if (argc != 4) {
        std::cerr << "usage: chapterforge <input.m4a|.mp4|.aac> <chapters.json> <output.m4a>\n";
        return 2;
    }
    std::string input = argv[1];
    std::string chapters = argv[2];
    std::string output = argv[3];

    // JSON-path overload: reads chapters.json from disk and muxes into output.
    auto status = chapterforge::mux_file_to_m4a(input, chapters, output);
    if (!status.ok) {
        std::cerr << "chapterforge: failed to write output: " << status.message << "\n";
        return 1;
    }
    std::cout << "Wrote: " << output << "\n";
    return 0;
}
```
Use the higher-level overload if you already have chapters/material in memory and don’t want to read JSON on disk.
Reading (extract chapters/metadata/images) mirrors the CLI read mode:
```cpp
auto res = chapterforge::read_m4a("input.m4a");
if (!res.status.ok) {
    std::cerr << "read failed: " << res.status.message << "\n";
} else {
    std::cout << "title: " << res.metadata.title << "\n";
    // Field names follow the ReadResult declaration shown earlier.
    for (const auto& c : res.text_chapters) {
        std::cout << c.start_ms << " ms -> " << c.text << " href=" << c.href << "\n";
    }
}
```
`read_m4a` returns:
- `status`: `ok`/`message` pair.
- `text_chapters`: chapter title samples (tx3g), including `href` mirrored from the URL track when present.
- `url_chapters`: optional URL track samples (tx3g + href). Empty if no URL track exists.
- `image_chapters`: optional JPEG chapter images.
- `metadata`: top-level ilst metadata (reused from source; empty if absent).
Note: When reading, missing fields are left empty rather than synthesized (e.g., a chapter without a URL will have an empty URL sample and no `url`/`url_text` keys in the exported JSON).
Quick run:
```bash
cmake -S . -B build -DENABLE_OUTPUT_TOOL_TESTS=ON
cmake --build build
cd build
ctest --output-on-failure
```
Optional toggles (configure-time):
- `-DENABLE_BIG_IMAGE_TESTS=ON`: heavy image/long-duration fixtures (needs `input_big.m4a` + large JPEGs).
- `-DENABLE_STRICT_VALIDATION=ON`: extra tool-based checks (mp4info/mp4dump/AtomicParsley/ffprobe/MP4Box).
- `-DENABLE_AVFOUNDATION_SMOKE=ON`: macOS Swift smoke test (needs `swift`).
Tooling deps (used only by tooling-labeled tests):
- Bento4 `mp4info`/`mp4dump` (JSON parsing for audio/atom checks)
- `AtomicParsley` (atom tree inspection)
- `gpac` (MP4Box) and `ffprobe` (strict validation, optional)
- `xxd` (hex dumps for atom offset checks)
- `say` (macOS only; optional for synthetic audio generation)
Notes:
- Core tests (label `core`) require only the compiler/runtime.
- Tooling tests are optional and run only when deps are present; skip by leaving `ENABLE_OUTPUT_TOOL_TESTS=OFF`.
- CI installs these per-platform; local runs can be minimal.
Issues and PRs are welcome. Please:
- Keep changes ASCII unless the file already uses Unicode.
- Run tests before submitting: `cmake -S . -B build -DENABLE_OUTPUT_TOOL_TESTS=ON && cmake --build build && (cd build && ctest --output-on-failure)`.
- Add or update tests when you change muxing behavior, metadata handling, or JSON parsing.
- Keep comments concise and only where the code isn’t self-explanatory.
This is anything but a reference implementation. Many shortcuts were taken to reach the goal. There are plenty of hardcoded magic bytes in this project, and the parsers may explode with the next file you feed them. If you need enterprise grade, this is not the library for you. If you need something similar but not exactly what this does, you had better be a developer ready to contribute when contacting me, as I have no interest in working for you.


Example chapters JSON document:

```json
{
  "title": "Sample Podcast Episode",   // optional top-level metadata
  "artist": "John Doe",
  "album": "My Podcast",
  "genre": "Podcast",
  "year": "2024",
  "comment": "Created with ChapterForge",
  "cover": "cover.jpg",                // optional; path is relative to the JSON file
  "chapters": [
    {
      "title": "Introduction",         // required
      "start_ms": 0,                   // required: chapter start time in milliseconds (first snaps to 0)
      "image": "chapter1.jpg",         // optional; path relative to the JSON file
      "url": "https://example.com",    // optional; creates a URL text track with HREF
      "url_text": "Intro link label"   // optional; text payload for the URL track (defaults empty)
    },
    {
      "title": "Main Discussion",
      "start_ms": 10000,
      "image": "chapter2.jpg",
      "url": ""
    }
  ]
}
```