Fix desync detection false positive caused by a rollback #86

caspark · 2024-12-06T08:15:38Z

This fixes the case where the following happens:

Sync layer requests roll back to frame N
Sync layer requests advance to frame N+1
Sync layer marks frame N as confirmed
P2pSession's desync detection decides to send a checksum for frame N to peers, even though the checksum for frame N is now known to be incorrect (because ggrs-consuming-code has not had a chance to act on the advance frame request, so the checksum stored in frame N is from before the rollback).
Peers receive the incorrect checksum and flip out thinking there is a desync.

Actually causing this bug to reliably manifest in practice is somewhat tricky; I was able to reliably do it in my game via the following:

high latencies (150ms on each client network interface, so 300ms round trip)
input delay == 0
desync detection interval set to 1
start all game clients at once
in the first advance, each client initially sends an input which is mispredicted by all other clients (so not the zeroed input).
and lastly that first input from each client must modify the resulting checksum in a way that is unique from each other client's input.

The concrete input I am sending in my game is approximately equivalent to "store the client handle in the game state", but when I tried modifying the example game to do similar then it didn't manifest this bug - no idea why. In any case, this patch fixes the issue so I'll leave it at that.

It also changes a < to <= in a seemingly-isolated way; this change is based on the rationale given at
https://github.com/gschup/ggrs/pull/64/files#r1368265924 no longer applying.

cc @johanhelsing since at one point you were also somewhat familiar with this code thanks to #64 - although that was quite a while ago :)

Alternative approaches and why I didn't take them:

"we can just skip sending a checksum if there's a rollback queued"
- this is easy to detect because the first incorrect frame is easily accessible, but clients rely on receiving all checksums eventually so this isn't a good approach.
"when we would send a checksum but there's already a rollback queued, instead remember that we need to send a checksum the next time advance_frame() is called"
- this is functionally the same as what this patch does, except it would require some bookkeeping to keep track of "hey send a checksum next frame". Just moving the desync detection logic to the start of advance_frame() is neater overall.

caspark · 2024-12-06T08:19:41Z

Oh, I forgot to mention in the commit notes: I needed >2 clients to trigger this reliably - even though in theory it ought to be possible with 2 clients.

gschup · 2024-12-06T08:28:16Z

This stuff seems like an endless source of tiny bugs. Good job on catching this one! Now let's see if I can wrap my head around it, too :)

caspark · 2024-12-06T12:15:06Z

If I've learned anything in the last 3-4 weeks, it's that "endless source of tiny bugs" perfectly describes working on game netcode in general :)

gschup

Looks good, it would be great if you could fix the merge conflicts :)

This fixes the case where the following happens: 1. Sync layer requests roll back to frame N 2. Sync layer requests advance to frame N+1 3. Sync layer marks frame N as confirmed 4. P2pSession's desync detection decides to send a checksum for frame N to peers, even though the checksum for frame N is now known to be incorrect, (because ggrs-consuming-code has not had a chance to act on the advance frame request, so the checksum stored in frame N is from before the rollback.) 5. Peers receive the incorrect checksum and flip out thinking there is a desync. Actually causing this bug to reliably manifest in practice is somewhat tricky; I was able to reliably do it in my game via the following: * high input latencies (150ms on each client network interface, so 300ms round trip) * input delay == 0 * desync detection interval set to 1 * start all game clients at once * in the first advance, each client initially sends an input which is mispredicted by all other clients (so not the zeroed input). * and lastly that first frame from each client must modify the resulting checksum in a way that is unique from each other client. The concrete input I am sending in my game is approximately equivalent to "store the client handle in the game state", but but when I tried modifying the example game to do similar then it didn't manifest this bug - no idea why. In any case, this patch fixes the issue so I'll leave it at that. It also changes a < to <= in a seemingly-isolated way; this change is based on the rationale given at https://github.com/gschup/ggrs/pull/64/files#r1368265924 no longer applying.

caspark · 2024-12-12T10:57:00Z

Merge conflicts fixed!

gschup approved these changes Dec 12, 2024

View reviewed changes

caspark force-pushed the fix-rollback-desync-false-positive branch from 521b12c to 2325935 Compare December 12, 2024 10:56

gschup merged commit b66d18f into gschup:main Dec 12, 2024
2 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix desync detection false positive caused by a rollback #86

Fix desync detection false positive caused by a rollback #86

Uh oh!

caspark commented Dec 6, 2024 •

edited

Loading

Uh oh!

caspark commented Dec 6, 2024

Uh oh!

gschup commented Dec 6, 2024

Uh oh!

caspark commented Dec 6, 2024

Uh oh!

gschup left a comment •

edited

Loading

Uh oh!

caspark commented Dec 12, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Fix desync detection false positive caused by a rollback #86

Fix desync detection false positive caused by a rollback #86

Uh oh!

Conversation

caspark commented Dec 6, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

caspark commented Dec 6, 2024

Uh oh!

gschup commented Dec 6, 2024

Uh oh!

caspark commented Dec 6, 2024

Uh oh!

gschup left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

caspark commented Dec 12, 2024

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

caspark commented Dec 6, 2024 •

edited

Loading

gschup left a comment •

edited

Loading