morgant (Contributor) commented Oct 14, 2025

NOTE: PR #12 should be merged prior to this PR.

I'll be doing some more live stream stress testing of this PR, but this appears to fully resolve audio/video sync issues for me and results in a far more consistent CBR for my stream (without Twitch's frequent warnings about instabilities).

The short explanation: This uses -thread_queue_size much more conservatively, and only on the audio/video inputs on the encode side of the pipe. It also introduces the aresample=async=1 filter in two places: on the merge of microphone & monitor audio for mix-down to stereo, and on the encode side of the pipe, so that video and audio (no longer encoded to AAC on the audio merge side of the pipe) are fully synchronized before being written to the output file/stream.

For the detailed explanation, see this comment on issue #5.
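
As a rough illustration of the encode side described above, here is a minimal Python sketch that only builds an ffmpeg argument list; it does not run ffmpeg and is not the script changed by this PR. The function name, the display `:0.0`, the `wav` pipe format, the `libx264` video codec, and the `output.flv` target are all assumptions made for the example. Only -thread_queue_size on both encode-side inputs, the aresample=async=1 filter, and AAC encoding on the encode side come from the description.

```python
def encode_side_args(fps: int = 60, queue_size: int = 64) -> list[str]:
    """Build a hypothetical ffmpeg argv for the encode side of the pipe."""
    return [
        "ffmpeg",
        # -thread_queue_size is a per-input option, so it precedes each -i.
        "-thread_queue_size", str(queue_size),
        "-f", "x11grab", "-framerate", str(fps), "-i", ":0.0",  # assumed display
        "-thread_queue_size", str(queue_size),
        "-f", "wav", "-i", "pipe:0",  # assumed format of the piped audio
        # Fill or trim audio samples so audio stays aligned with its timestamps.
        "-af", "aresample=async=1",
        "-c:v", "libx264", "-c:a", "aac",  # video codec is an assumption
        "-f", "flv", "output.flv",  # assumed output target
    ]


if __name__ == "__main__":
    print(" ".join(encode_side_args()))
```

Keeping the queue size as a single parameter makes it easy to try other values while testing.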

morgant (Contributor Author) commented Oct 14, 2025

Did a 2 hour stream earlier that did not have any audio desync by the end.

morgant (Contributor Author) commented Oct 15, 2025

5 hour stream today and -- again -- no audio desync by the end. That settles it for me, so I won't report further successes here, only any issues I encounter.

rfht (Owner) commented Nov 14, 2025

I think this diff includes the changes from #12. Could you rebase it now that #12 has been merged?

rfht (Owner) commented Nov 14, 2025

PS: also for next PRs, would appreciate using tabs, not spaces for indentation. The merge threw off my whitespacing... :-)

rfht (Owner) commented Nov 14, 2025

Why is thread_queue_size needed for video at 64? Is this empirical, or is there some source for this decision?

rfht (Owner) commented Nov 14, 2025

Ah, I missed the referenced comment on issue #5 with the explanation.

morgant (Contributor Author) commented Nov 14, 2025

Thanks for merging #12, @rfht!

My apologies for the accidental change from tabs to spaces! Now that #12 has been merged, I'll rebase this and fix any lingering spacing issues.

Update: Regarding -thread_queue_size, I found it completely unnecessary for audio, especially since async will add/drop packets to keep the audio in sync.

For video, if the -thread_queue_size default is, in fact, 8 (which I believe is measured in packets), then our previous 512 was far too high. My understanding is that a packet in ffmpeg can contain less than one frame, exactly one frame, or more than one frame. If we consider a near-best-case scenario of 1 frame per packet and encoding at 60fps (~16ms per frame), -thread_queue_size 8 would give us about 128ms of video data buffer. I estimated that 8x the default buffer, which is 64 packets or ~1 second at 60fps, would be more than sufficient for low-latency 'live' encoding.

Personally, I was testing at 25fps encoding on very low-end hardware and haven't run into any significant frame drops in my live streams.
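
The buffer estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic under the same assumptions as the comment (the queue is measured in packets and each packet holds one frame); the function name is made up for the example. Using the exact frame time of 1/60 s rather than the rounded 16ms gives slightly larger numbers than those quoted.

```python
def buffer_ms(queue_packets: int, fps: float, frames_per_packet: float = 1.0) -> float:
    """Approximate video held by a full input queue, in milliseconds."""
    return queue_packets * frames_per_packet / fps * 1000.0


if __name__ == "__main__":
    # 8 = believed default, 64 = value used in this PR, 512 = previous value.
    for queue in (8, 64, 512):
        for fps in (25, 60):
            print(f"-thread_queue_size {queue:>3} at {fps} fps: ~{buffer_ms(queue, fps):.0f} ms")
```

Under these assumptions, 64 packets is a little over one second at 60fps and roughly 2.5 seconds at 25fps, while the previous 512 would hold more than eight seconds at 60fps.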

…nc as my understanding is that will let ffmpeg drop packets it couldn't process fast enough instead of forcing them to be queued and processed in order, regardless of timing. Issue rfht#5
…512', plus only on the video & audio input when encoding, not on the raw audio inputs being merged). Also introduces the use of the 'aresample' filter on the individual audio streams being merged/mixed-down _and_ when merging the video and audio. The latter required moving the audio encoding to the ffmpeg process which is encoding the video. By using the 'async=1' option, the 'aresample' filter will fill and/or trim audio to keep it synced, but will not perform the more advanced stretching and/or squeezing (I think this makes sense for our uses and _should_ be more efficient). Issue rfht#5
…; if line is too long, break with additional indentation) and tabs for indentation
@morgant morgant force-pushed the potential-desync-fixes branch from 709e1a9 to ef19a7c Compare November 14, 2025 16:59
morgant (Contributor Author) commented Nov 14, 2025

@rfht I've rebased, fixed line break formatting and whitespace.

I haven't had a chance to test yet, so please test before merging. I'll comment once I've tested.

@rfht rfht changed the title Potential desync fixes by using aresample=sync=1 filter Potential desync fixes by using aresample=async=1 filter Nov 16, 2025
rfht (Owner) commented Nov 16, 2025

should there also be a method for the video, like for example -vsync cfr, see https://stackoverflow.com/questions/54700933/ffmpeg-dts-delta-threshold-and-aresample-async-1 (first comment) ?

(Edit: better fps_mode per https://ffmpeg.org/ffmpeg-all.html)

morgant (Contributor Author) commented Nov 21, 2025

> should there also be a method for the video, like for example -vsync cfr, see https://stackoverflow.com/questions/54700933/ffmpeg-dts-delta-threshold-and-aresample-async-1 (first comment) ?
>
> (Edit: better fps_mode per https://ffmpeg.org/ffmpeg-all.html)

Good question and find!

I'll investigate, but I'm wondering if we should consider that in a separate issue. My initial thought process was (and still is): PR #7 updated our x11grab use to only capture at the specified fps, so it may already be double-buffering or duplicating/dropping frames while feeding the muxer (I'm certainly not sure how that interacts with vsync, XDamage, hardware acceleration, etc.). In general, audio issues tend to be more noticeable to viewers than slight framerate issues, especially in live streams where network performance/connectivity issues are significantly more likely to affect video frames anyway.

Regardless, I'll be doing more review & testing this weekend and will post findings & thoughts.

morgant (Contributor Author) commented Nov 22, 2025

Just did some testing and found that monitor-only mode isn't working for me. However, if I use -m (even without my mic recording), that records correctly.

I'll do some more troubleshooting.

morgant (Contributor Author) commented Nov 23, 2025

> Just did some testing and found that monitor-only mode isn't working for me. However, if I use -m (even without my mic recording), that records correctly.

Okay, monitor mix configuration issue on my end. I had it only configured on my USB mic (which I use headphones through), so I had to switch the sndio server to that device for the monitor mix to record.

So, this is actually working okay for me. I'll still investigate whether we should be using -vsync/-fps_mode, but I am thinking that it would be best to merge this PR and add a separate issue for that.

What are your thoughts, @rfht? (As always, no urgency.)

@morgant morgant mentioned this pull request Nov 24, 2025