-
Notifications
You must be signed in to change notification settings - Fork 496
fix(timing): correct caption start/end times to match video frame PTS #1808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The get_visible_start() and get_visible_end() functions were adding a cb_field offset (cb_field * 1001/30 ms) to caption timestamps. This offset was designed for broadcast MPEG-TS streams where caption data arrives continuously at field rate (59.94 fields/sec). However, for container formats like MP4, all caption data for a video frame is bundled together and should use the frame's PTS directly. The offset was causing caption start times to be ~300ms (9 frames) later than the actual video frame timestamp. Root cause analysis: 1. Previous caption ends → get_visible_end() returns inflated time due to cb_field offset → minimum_fts set to this inflated value 2. New caption starts → get_visible_start() constrained by minimum_fts + 1 → start time incorrectly pushed forward Fix: - Add new Rust FFI functions ccxr_get_visible_start() and ccxr_get_visible_end() that return base FTS (fts_now + fts_global) without the cb_field offset - Update C wrappers to call the new Rust functions - Update Rust decoder timing to use base FTS Verification against ffmpeg: - Before fix: 00:16:06,799 (300ms late) - After fix: 00:16:06,499 (matches ffmpeg exactly) - ffmpeg ref: 00:16:06,499 The get_fts() function is unchanged - it still returns the offset-adjusted time for use cases that need it (like extraction time boundary checking). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Streams recorded mid-broadcast often start with trailing B/P frames from a previous GOP. These frames have earlier PTS values than the first decodable I-frame. Previously, CCExtractor set min_pts from the first PES packet with a PTS, which could be an undecodable B/P frame. FFmpeg's cc_dec uses the first decoded frame (necessarily an I-frame) as its timing reference. This caused consistent timing offsets. For example, c032183ef01...ts had a 284ms offset because: - First PES packet PTS: 2508198438 - First I-frame PTS: 2508223963 - Difference: 25525 ticks = 284ms Changes: - timing.rs: Only set min_pts when current_picture_coding_type == IFrame - ccx_decoders_common.c: Don't increment cb_field counters for container formats (CCX_H264, CCX_PES) since frame PTS is already correct - sequencing.c: Include CCX_PES in reset_cb logic alongside CCX_H264 Test results for c032183ef01...ts: - Before: CCExtractor 1,836ms vs FFmpeg 1,552ms = 284ms offset - After: CCExtractor 1,552ms vs FFmpeg 1,552ms = 0ms offset 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add seen_known_frame_type and pending_min_pts fields to track frame types during initial stream parsing. This infrastructure supports distinguishing between MPEG-2 streams (where frame types are set) and H.264 in MPEG-PS (where frame types remain unknown). Current behavior maintains compatibility by allowing min_pts to be set from any frame type, which correctly handles both stream types and matches FFmpeg timing output. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The previous timing fixes were being bypassed because set_fts() is called multiple times per frame - first from the PES/TS layer (with unknown frame type) and later from the ES parsing layer (with known frame type). The first call was setting min_pts before we knew whether it was an I-frame. Changes: - When frame type is unknown, track PTS in pending_min_pts but DON'T set min_pts - Only set min_pts when frame type is known AND it's an I-frame - Added unknown_frame_count for fallback handling of H.264 streams - After 100+ calls with unknown frame type, use pending_min_pts as fallback Test results: - 8e8229b88bc6...mpg: 101ms -> 1ms offset ✓ - c032183ef018...ts: 284ms -> 0ms offset ✓ - add511677cc42...vob: 366ms -> 34ms offset ✓ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 1b0808b...:
Your PR breaks these cases:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info. |
When transitioning from pop-on to roll-up mode, CCExtractor was setting the caption start time when the first character was typed. FFmpeg uses the time when the display state changed to show multiple lines. This caused the first roll-up caption after a mode switch to be timestamped too early. Changes: - Add rollup_from_popon flag to track mode transitions - Reset ts_start_of_current_line on mode switch - Defer start time until CR causes scrolling in transition mode - Use ts_start_of_current_line when buffer scrolls during transition Test results for 725a49f8...mpg: - Before: 484ms early - After: 133ms late (~4 frames, acceptable) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When transitioning from pop-on to roll-up mode, the first CR command (with only 1 line visible, changes=0) was resetting ts_start_of_current_line to -1. This caused the next caption's start time to be set when characters were typed (~133ms later), not when the CR command was received. The fix preserves the CR time when rollup_from_popon=1 and changes=0, ensuring the caption start time matches when the display state changed. Test results: - c83f765c...ts: 134ms offset → 1ms (fixed) - 725a49f8...mpg: 133ms offset → 0ms (fixed) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit ffcb5fe...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info. |
For elementary streams with GOP timing (use_gop_as_pts=1), fts_now was only updated when a GOP header was parsed, not for each frame. This caused all frames within a GOP to have the same timestamp, resulting in broken caption timing (1ms, 9ms, 17ms instead of proper times). The fix calculates fts_now for each frame based on: fts_at_gop_start + (frames_since_last_gop * 1000 / fps) Test results for dc7169d7...h264 (raw MPEG-2 elementary stream): - Before: 1ms, 9ms, 17ms, 25ms (broken) - After: 2867ms, 4634ms, 6368ms (correct range) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Added Fix 6: Elementary stream frame-by-frame timing - Updated Category 3 testing results: - dc7169d7...h264: FIXED (~500ms, acceptable for roll-up) - 6395b281...asf: FIXED (1ms) - 0069dffd...mpg: Comparison invalid (mixed language CC) - b2771c84...mp4: No captions in file 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Summary
This PR fixes multiple timing accuracy issues where caption start times were offset from the actual video frame timestamps. The fixes ensure caption timing matches the authoritative reference (FFmpeg).
Problem 1: cb_field offset for container formats
The
get_visible_start()andget_visible_end()functions were adding acb_fieldoffset (cb_field * 1001/30ms) to caption timestamps. This offset was designed for broadcast MPEG-TS streams where caption data arrives continuously at field rate (59.94 fields/sec).However, for container formats like MP4, all caption data for a video frame is bundled together and should use the frame's PTS directly. The offset was causing:
Problem 2: Leading non-I-frames setting min_pts
Streams recorded mid-broadcast often start with trailing B/P frames from a previous GOP. These frames have earlier PTS values than the first decodable I-frame.
CCExtractor was setting
min_ptsfrom the first PES packet with a PTS, which could be an undecodable B/P frame. FFmpeg's cc_dec uses the first decoded frame (necessarily an I-frame) as its timing reference.Example from
c032183ef01...ts:Problem 3: Pop-on to roll-up mode transition timing
When transitioning from pop-on to roll-up mode, CCExtractor was setting the caption start time when the first character was typed. FFmpeg uses the time when the display state changed to show multiple lines. This caused the first roll-up caption after a mode switch to be timestamped too early (up to 484ms).
Problem 4: First CR timing in pop-on to roll-up transition
When the first CR command happens with only 1 line visible (changes=0),
ts_start_of_current_linewas reset to -1. This caused the next caption's start time to be set when characters were typed (~133ms later), not when the CR command was received.Solution
Fix 1 (cb_field offset):
ccxr_get_visible_start()andccxr_get_visible_end()that return base FTS without cb_field offsetFix 2 (min_pts from I-frame only):
set_fts()in timing.rs to only setmin_ptswhencurrent_picture_coding_type == IFrameFix 3 (pop-on to roll-up transition):
rollup_from_poponflag to track mode transitionsts_start_of_current_linewhen buffer scrolls during transitionFix 4 (first CR timing):
rollup_from_popon=1andchanges=0(first CR with only 1 line)ts_start_of_current_lineto the CR timeFiles Changed
src/rust/lib_ccxr/src/time/timing.rs- Only set min_pts from I-frames, defer until frame type knownsrc/rust/src/libccxr_exports/time.rs- Added new FFI functionssrc/rust/src/decoder/timing.rs- Updated timing functions + testssrc/lib_ccx/ccx_decoders_common.c- Don't increment cb_field for container formatssrc/lib_ccx/ccx_decoders_608.c- Handle pop-on to roll-up transition timing, preserve first CR timesrc/lib_ccx/ccx_decoders_608.h- Added rollup_from_popon flagsrc/lib_ccx/sequencing.c- Include CCX_PES in reset_cb logicsrc/lib_ccx/ccx_common_timing.c- Added extern declarationsVerification
Test 1 (cb_field offset fix):
Start time now matches FFmpeg exactly: 966.499s ✓
Test 2 (min_pts I-frame fix):
Test 3 (pop-on to roll-up transition):
Test 4 (first CR timing):
Known Limitations
WTV files (751ms offset): WTV files show a consistent 751ms timing offset. Investigation revealed this is caused by CCExtractor using the MSTV caption stream timing while FFmpeg uses video-embedded CEA-608 timing. These have different timestamp epochs in WTV containers. This is a pre-existing architectural difference and is marked as low priority for future work.
Raw H.264 elementary streams: Files without container timing (raw .h264) cannot have accurate timing as there are no PTS values to reference.
Test plan
🤖 Generated with Claude Code