A concise technical reference for the RTMP protocol as implemented by this server. For the full specification, see Adobe's RTMP Specification.
RTMP (Real-Time Messaging Protocol) is a TCP-based protocol for streaming audio, video, and data. A typical session has four phases:
1. TCP Connect (port 1935)
2. Handshake (version exchange + random data echo)
3. Command Phase (negotiate application, create streams, start publishing/playing)
4. Media Phase (continuous audio/video frame transmission)
The handshake verifies both sides speak RTMP v3. Each party sends three pieces of data:
| Packet | Size | Contents |
|---|---|---|
| C0/S0 | 1 byte | Version (must be 0x03) |
| C1/S1 | 1536 bytes | 4-byte timestamp + 4 zero bytes + 1528 random bytes |
| C2/S2 | 1536 bytes | Echo of the peer's C1/S1 (verifies connectivity) |
Sequence:
Client Server
────── ──────
C0+C1 ──────────►
◄────────── S0+S1+S2
C2 ──────────►
After the handshake, both sides switch to chunk-based communication.
RTMP does not send complete messages over TCP. Instead, each message is split into chunks with a maximum payload size (default 128 bytes, negotiated up to 4096+).
Large video keyframes (50+ KB) would block the connection for audio data. By interleaving small chunks from different streams, RTMP ensures low-latency audio/video multiplexing.
┌──────────────┬──────────────────┬─────────────────────┬──────────┐
│ Basic Header │ Message Header │ Extended Timestamp? │ Payload │
│ (1-3 bytes) │ (0/3/7/11 bytes) │ (0 or 4 bytes) │ (≤chunk │
│ │ │ │ size) │
└──────────────┴──────────────────┴─────────────────────┴──────────┘
The first byte encodes two values:
- Bits 7-6: FMT (header format type, 0-3)
- Bits 5-0: Chunk Stream ID (CSID)
CSID encoding:
- Values 2-63: 1-byte form (CSID in bits 5-0)
- Value 0 in bits 5-0: 2-byte form (next byte + 64)
- Value 1 in bits 5-0: 3-byte form (next 2 bytes + 64)
FMT controls how much header information is present. Higher FMT values omit unchanged fields:
| FMT | Header Size | Fields Present | When Used |
|---|---|---|---|
| 0 | 11 bytes | Timestamp (abs), Length, TypeID, StreamID | First message on CSID |
| 1 | 7 bytes | Timestamp (delta), Length, TypeID | Same stream, different size/type |
| 2 | 3 bytes | Timestamp (delta) | Same stream, same size/type |
| 3 | 0 bytes | (none — all inherited) | Continuation chunks |
When the 3-byte timestamp field equals 0xFFFFFF (16,777,215), an additional 4-byte timestamp follows the message header. This supports timestamps beyond ~4.66 hours.
The 4-byte Message Stream ID in FMT 0 headers is encoded in little-endian — the only little-endian field in RTMP. All other multi-byte integers are big-endian.
| TypeID | Name | Purpose |
|---|---|---|
| 1 | Set Chunk Size | Change maximum chunk payload size |
| 2 | Abort Message | Discard a partially received message |
| 3 | Acknowledgement | Report bytes received (flow control) |
| 4 | User Control | Stream lifecycle events (Begin, Ping) |
| 5 | Window Ack Size | Set acknowledgement window |
| 6 | Set Peer Bandwidth | Limit output rate |
| 8 | Audio | Audio data (AAC, MP3, Speex; Opus, FLAC via Enhanced RTMP) |
| 9 | Video | Video data (H.264, H.265; AV1, VP9 via Enhanced RTMP) |
| 20 | Command (AMF0) | Application commands (connect, publish, play) |
After the handshake, the server sends three control messages:
- Window Acknowledgement Size (2,500,000 bytes) — flow control
- Set Peer Bandwidth (2,500,000 bytes, Dynamic) — output rate hint
- Set Chunk Size (4096 bytes) — increase from default 128
Commands are serialized in AMF0 (Action Message Format version 0):
| Marker | Type | Example |
|---|---|---|
0x00 |
Number | 42.0 (IEEE 754 double) |
0x01 |
Boolean | true / false (1 byte) |
0x02 |
String | "live" (2-byte length + UTF-8) |
0x03 |
Object | {"app":"live"} (key-value pairs, ends with 0x00 0x00 0x09) |
0x05 |
Null | (no payload) |
0x0A |
Array | [1, "x", true] (4-byte count + elements) |
Client → Server: ["connect", 1.0, {"app":"live", "tcUrl":"rtmp://host/live", ...}]
Server → Client: ["_result", 1.0, {fmsVer, capabilities}, {code:"NetConnection.Connect.Success"}]
Client → Server: ["createStream", 2.0, null]
Server → Client: ["_result", 2.0, null, 1.0] // stream ID = 1
Server → Client: UserControl StreamBegin(1)
Client → Server: ["publish", 0, null, "mystream", "live"]
Server → Client: ["onStatus", 0, null, {code:"NetStream.Publish.Start"}]
After this, the client sends audio (TypeID 8) and video (TypeID 9) messages.
Client → Server: ["play", 0, null, "mystream", -2] // -2 = live
Server → Client: UserControl StreamBegin(1)
Server → Client: ["onStatus", 0, null, {code:"NetStream.Play.Start"}]
Server → Client: (cached audio sequence header, if available)
Server → Client: (cached video sequence header, if available)
After this, the server forwards media messages from the publisher.
The first byte of an audio message payload:
Bits 7-4: SoundFormat (codec) Bits 3-2: SampleRate
Bit 1: SampleSize Bit 0: Channels
Key codec IDs: 2=MP3, 10=AAC, 11=Speex
For AAC, byte 2 distinguishes:
0x00= Sequence Header (AudioSpecificConfig — decoder initialization)0x01= Raw AAC frame data
The first byte of a video message payload:
Bits 7-4: FrameType Bits 3-0: CodecID
Key values: FrameType 1=Keyframe, 2=Inter-frame. CodecID 7=H.264 (AVC), 12=H.265 (HEVC).
For H.264, byte 2 distinguishes:
0x00= Sequence Header (SPS/PPS — decoder initialization)0x01= NALU (actual video data)
Enhanced RTMP extends the legacy audio/video tag format to support modern codecs using FourCC-based signaling, while remaining backward compatible with legacy H.264/AAC streams.
The IsExHeader bit (bit 7) of the first video tag byte signals an enhanced packet:
Legacy: [0FFFC CCC] → bits[7:4]=FrameType, bits[3:0]=CodecID
Enhanced: [1FFF PPPP] → bit 7=IsExHeader, bits[6:4]=FrameType, bits[3:0]=VideoPacketType
followed by 4-byte FourCC (codec identifier)
When IsExHeader is set, the next 4 bytes contain a FourCC code identifying the codec:
| FourCC | Codec | Description |
|---|---|---|
hvc1 |
H.265/HEVC | High Efficiency Video Coding |
av01 |
AV1 | AOMedia Video 1 |
vp09 |
VP9 | Google VP9 |
vp08 |
VP8 | Google VP8 |
avc1 |
H.264/AVC | Advanced Video Coding (enhanced path) |
vvc1 |
H.266/VVC | Versatile Video Coding |
When SoundFormat (bits 7-4 of first audio byte) equals 9, the audio message uses the enhanced format:
Enhanced audio: bits[3:0]=AudioPacketType, followed by 4-byte FourCC
| FourCC | Codec | Description |
|---|---|---|
Opus |
Opus | Low-latency audio codec |
fLaC |
FLAC | Free Lossless Audio Codec |
ac-3 |
AC-3 | Dolby Digital |
ec-3 |
E-AC-3 | Dolby Digital Plus |
.mp3 |
MP3 | MPEG-1 Audio Layer 3 (enhanced path) |
Note: FourCC values are case-sensitive (e.g., fLaC not flac).
Clients advertise Enhanced RTMP support by including a fourCcList array in the connect command's command object:
Client → Server: ["connect", 1.0, {..., "fourCcList":["hvc1","av01","vp09"]}]
Server → Client: ["_result", 1.0, {...}, {..., "fourCcList":["hvc1","av01","vp09"]}]
The server echoes the supported FourCC codes in its _result response.
Enhanced RTMP is fully backward compatible:
- Legacy H.264/AAC streams (IsExHeader=0) continue to work unchanged
- The server auto-detects enhanced packets — no configuration needed
- Compatible with FFmpeg 6.1+, OBS 29.1+, and SRS 6.0+
VideoPacketType 7 and AudioPacketType 7 signal a ModEx wrapper. ModEx adds modifier extensions to another packet, enabling features like sub-millisecond timestamp precision:
[ModExType:4bits][DataSize:4bits][ModExData:1-4 bytes][WrappedPacket...]
| ModExType | Name | Description |
|---|---|---|
| 0 | TimestampOffsetNano | Nanosecond offset (0–999999) added to the base RTMP millisecond timestamp |
DataSize encoding: 0=1 byte, 1=2 bytes, 2=3 bytes, 3=4 bytes. Values 4+ are reserved.
Use ParseModEx() on the payload to extract the modifier data and wrapped packet.
VideoPacketType 6 and AudioPacketType 6 signal multitrack content — multiple audio or video tracks in a single RTMP stream:
[AvMultitrackType:4bits][InnerPacketType:4bits][TrackData...]
| AvMultitrackType | Name | Description |
|---|---|---|
| 0 | OneTrack | Single track with explicit track ID |
| 1 | ManyTracks | Multiple tracks, same codec |
| 2 | ManyTracksManyCodecs | Multiple tracks, different codecs per track |
Use ParseMultitrack() on the payload to extract individual tracks.
| Type | Value | Name | Description |
|---|---|---|---|
| Video | 5 | MPEG2TSSequenceStart | MPEG-2 TS sequence start (recognized, passed through) |
| Audio | 4 | SequenceEnd | Signals end of audio stream |
| Audio | 5 | MultichannelConfig | Multichannel audio layout configuration |
When a ModEx packet carries a TimestampOffsetNano modifier, the full nanosecond timestamp is:
nanoseconds = (rtmpTimestamp × 1,000,000) + nanosecondOffset
This allows sub-millisecond A/V synchronization (important for lip-sync and multi-camera setups). The nanosecond offset is automatically extracted during parsing and stored in VideoMessage.NanosecondOffset / AudioMessage.NanosecondOffset.
AudioPacketType 5 carries multichannel layout configuration:
[AudioChannelOrder:4bits][AudioChannelCount:4bits][ChannelMapping...]
- ChannelOrder 0 (Unspecified): codec-native order, no explicit mapping
- ChannelOrder 1 (Native): standard layout for the channel count (e.g., AAC ISO 14496-3)
- ChannelOrder 2 (Custom): explicit per-channel speaker mapping follows (one byte per channel)
Use ParseMultichannelConfig() on the payload to extract channel layout details.
The server can request clients to gracefully disconnect and reconnect by sending an onStatus command with the status code NetConnection.Connect.ReconnectRequest. This is useful for server maintenance, load balancing, and graceful shutdown.
Server → Client: onStatus(0, null, {
level: "status",
code: "NetConnection.Connect.ReconnectRequest",
description: "Server maintenance",
tcUrl: "rtmp://new-server/live" // optional redirect
})
- Transaction ID: Always 0 (no response expected from the client)
- tcUrl: Optional. When present, the client should reconnect to this URL instead of the original. When absent, the client reconnects to the same server.
- description: Human-readable reason for the reconnect request
Clients supporting E-RTMP v2 will disconnect and reconnect to the specified URL (or the original URL if no tcUrl is provided). The server exposes this via:
Server.RequestReconnect(connID, tcUrl, description)— target a single connectionServer.RequestReconnectAll(tcUrl, description)— broadcast to all connectionsSIGUSR1signal — triggersRequestReconnectAllwith the optional-reconnect-urlflag
The first audio and video messages from a publisher are typically sequence headers — they contain codec configuration data that decoders need before processing any media frames:
- H.264 Video Sequence Header: Contains SPS (Sequence Parameter Set) and PPS (Picture Parameter Set) — resolution, profile, frame rate parameters
- AAC Audio Sequence Header: Contains AudioSpecificConfig — sample rate, channel count, codec profile
- Enhanced RTMP Sequence Headers: H.265 (HEVCDecoderConfigurationRecord), AV1 (AV1CodecConfigurationRecord), VP9, Opus, and FLAC each carry their own codec-specific configuration via the enhanced tag format
- AC-3/E-AC-3 Sequence Headers: Dolby Digital and Dolby Digital Plus carry their AudioSpecificConfig (dac3/dec3) via Enhanced RTMP tags with FourCC
ac-3andec-3respectively. These are generated by the SRT bridge when AC-3/E-AC-3 audio is detected in the MPEG-TS stream.
The server caches these so late-joining subscribers can immediately initialize their decoders. Caching works for all codecs — both legacy and Enhanced RTMP.
RTMP identifies streams using an application name + stream name:
URL: rtmp://host:1935/live/mystream
└── host ──┘ └app┘ └stream┘
Stream Key: "live/mystream"
The application name is sent in the connect command. The stream name is sent in publish or play.
By default, the server writes one recording file per publish session. Segmented recording splits the recording into multiple files of a configurable duration, with each segment starting at a video keyframe so it can be played back independently.
This is useful for:
- HLS-like workflows — short segments (2–10 s) ready for packaging
- Archival — manageable 15-minute chunks instead of multi-hour monoliths
- Fault tolerance — if the server crashes, only the current segment is lost
| Flag | Description | Default |
|---|---|---|
-segment-duration |
Target duration per segment. Actual splits happen at the next keyframe after this duration elapses. Examples: 2s, 30s, 5m, 15m. |
"" (disabled — single file per session) |
-segment-pattern |
Filename pattern for segments. Supports placeholders listed below. | %s_%T_seg%03d |
| Placeholder | Expands To | Example |
|---|---|---|
%s |
Stream key (slashes replaced with underscores) | live_mystream |
%d |
Segment number (supports printf-style padding: %03d, %04d) |
001, 0001 |
%T |
Full timestamp (YYYYMMDD_HHMMSS) |
20260419_130000 |
%Y |
Year (4 digits) | 2026 |
%m |
Month (2 digits) | 04 |
%D |
Day (2 digits) | 19 |
%H |
Hour (2 digits, 24-hour) | 13 |
%M |
Minute (2 digits) | 00 |
%S |
Second (2 digits) | 00 |
%% |
Literal % |
% |
Segments do not split at the exact requested duration. Instead, the server waits for the next video keyframe after the target duration has elapsed. This guarantees:
- Every segment starts with a keyframe — no decoder initialization issues.
- Each segment is independently playable without referencing other segments.
- The actual segment duration may be slightly longer than the configured value (by up to one keyframe interval, typically 1–2 seconds).
# Basic: 30-second segments with default naming
./rtmp-server -record-all true -segment-duration 30s
# → recordings/live_mystream_20260419_130000_seg001.mp4
# → recordings/live_mystream_20260419_130030_seg002.mp4
# Short segments (2 seconds, for HLS-like workflows)
./rtmp-server -record-all true -segment-duration 2s
# Long segments (15 minutes, for archival)
./rtmp-server -record-all true -segment-duration 15m
# Custom pattern with 4-digit padding
./rtmp-server -record-all true -segment-duration 5m -segment-pattern "%s_%T_seg%04d"
# → recordings/live_mystream_20260419_130000_seg0001.mp4
# Date-based subdirectories
./rtmp-server -record-all true -segment-duration 10m -segment-pattern "%Y/%m/%D/%s_seg%03d"
# → recordings/2026/04/19/live_mystream_seg001.mp4
# Simple numbered segments
./rtmp-server -record-all true -segment-duration 1m -segment-pattern "stream_%d"
# → recordings/stream_1.mp4, recordings/stream_2.mp4, ...Segmented recording works identically across RTMP, RTMPS, and SRT ingest — the segment logic runs after media messages have been normalized into the internal format.