@jakmeier
Contributor

Ensure in tests that we are able to generate triples and presignatures, even when one node is offline.

This also adds a fixture input for 7 nodes and threshold 4.

@jakmeier
Contributor Author

@ChaoticTempest Can you please have a look at this PR?

I've been working on these tests for a while, which should show that we can handle at least one node going offline without the entire system collapsing.

Somehow, it keeps on failing even with just one node not responding. I think we never get any posit rounds through and therefore never start any generators.

I'm also building up better debugging tools in parallel. Those show me that while running test_triple_with_offline_node, there are 0 active TripleGenerator instances, which adds evidence to the theory that we are stuck in posits.

At first, testing with 3 nodes and threshold 2, I thought it was because nodes don't count themselves as "accept", so enough_accepts needs one more than the defined threshold.
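
To make the suspected off-by-one concrete, here is a minimal sketch. The signature of enough_accepts and the count_self flag are assumptions for illustration only, not the actual code in this repository.

```rust
/// Hypothetical helper: does the proposer have enough accepts to start?
/// `accepts_from_others` excludes the proposer itself (assumption).
fn enough_accepts(accepts_from_others: usize, threshold: usize, count_self: bool) -> bool {
    // Optionally count the proposer's own implicit accept.
    let total = accepts_from_others + usize::from(count_self);
    total >= threshold
}

fn main() {
    // 3 nodes, threshold 2, one node offline:
    // only one other node is left to send an accept.
    let accepts_from_others = 1;
    println!("without self: {}", enough_accepts(accepts_from_others, 2, false));
    println!("with self:    {}", enough_accepts(accepts_from_others, 2, true));
}
```

If the proposer is not counted, a 2-of-3 setup with one offline node can never reach the threshold, which would match the failure seen with the small fixture.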

But now I've generated fixture inputs for 7 nodes with threshold 4 and it still fails to generate triples or presignatures.

Do you have an idea what could be going wrong?

@jakmeier
Contributor Author

Oh, I guess it's because we always wait for all participants to accept or reject. But an offline node wouldn't do either.

(Was that changed lately? I think when I started working on it, we started as soon as we had enough accepts.)

Anyway, @ChaoticTempest , do you think we can change this behaviour? Requiring all nodes to always respond makes the system rather frail.
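
A rough sketch of the difference between the two policies, assuming a hypothetical Vote type and helper functions (none of these names are taken from the codebase):

```rust
/// Hypothetical per-participant vote state during a posit round.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Vote {
    Accept,
    Reject,
    Pending, // no response yet, e.g. the node is offline
}

/// Policy A: proceed only once every participant has responded.
fn all_responded(votes: &[Vote]) -> bool {
    votes.iter().all(|v| *v != Vote::Pending)
}

/// Policy B: proceed as soon as the number of accepts reaches the threshold.
fn threshold_reached(votes: &[Vote], threshold: usize) -> bool {
    votes.iter().filter(|v| **v == Vote::Accept).count() >= threshold
}

fn main() {
    // 7 participants, threshold 4: five accept, one rejects, one never answers.
    let mut votes = [Vote::Accept; 7];
    votes[5] = Vote::Reject;
    votes[6] = Vote::Pending;

    println!("all responded:     {}", all_responded(&votes));
    println!("threshold reached: {}", threshold_reached(&votes, 4));
}
```

Under policy A a single silent node blocks the round indefinitely, while policy B would let it start with the accepts already collected.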

@ChaoticTempest
Contributor

There's a manual call we need to make after a timeout, expire_and_start, which either expires or starts if enough has been met. So if the fixtures aren't running the run loop inside the spawner, we should add this call as well.
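
As a sketch of that timeout step only: the Posit struct, its fields, and the check method below are hypothetical stand-ins, and the real expire_and_start may look quite different.

```rust
use std::time::{Duration, Instant};

/// Possible results of checking a posit (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum Outcome {
    Wait,   // timeout not reached yet
    Start,  // timeout reached with enough accepts
    Expire, // timeout reached without enough accepts
}

/// Hypothetical posit state; the field names are assumptions.
struct Posit {
    accepts: usize,
    threshold: usize,
    deadline: Instant,
}

impl Posit {
    /// Illustrative stand-in for the step that `expire_and_start` performs.
    fn check(&self, now: Instant) -> Outcome {
        if now < self.deadline {
            Outcome::Wait
        } else if self.accepts >= self.threshold {
            Outcome::Start
        } else {
            Outcome::Expire
        }
    }
}

fn main() {
    let created = Instant::now();
    let posit = Posit {
        accepts: 4,
        threshold: 4,
        deadline: created + Duration::from_secs(5),
    };
    // Something has to call this periodically (e.g. a run loop);
    // otherwise the posit never leaves the waiting state.
    println!("{:?}", posit.check(created));
    println!("{:?}", posit.check(created + Duration::from_secs(6)));
}
```

The point of the sketch is that nothing happens unless some loop keeps calling the check, which is why a fixture without the spawner's run loop would stay stuck in posits.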

@jakmeier jakmeier force-pushed the test-offline-nodes branch from 4e585b6 to 8fdb34b Compare January 8, 2026 19:38
@jakmeier
Contributor Author

jakmeier commented Jan 8, 2026

Quick update: I've rebased this and the test is still failing. I will see if I can get it to work next week.

Curious: while updating to the new triples format, it generated 2 conflicting triples, i.e. two instances where the exact same triple share appeared twice in the same output. I'll need to look into that more; I don't think this should happen.

jakmeier and others added 3 commits January 22, 2026 15:06
Ensure in tests that we are able to generate triples and
presignatures, even when one node is offline.

This also adds a fixture input for 7 nodes and threshold 4.
The triples format has changed to using triple pairs.
Of course, the offline node will never have enough T or P.
@jakmeier
Contributor Author

I've debugged this a bit further, still not sure what is going on.

test_triple_with_offline_node is non-deterministic. Most of the time it fails but I saw it passing in a few cases. Even when taking out the offline filter for node 1, this test seems to fail more often than not. Maybe something about using 7 nodes makes this test fail. (I thought it used to work without the offline filter, it could be due to rebasing on top of develop.)

Triple generation does work in general, though. Just not consistently. Looking at the debug page, I see that T on every node (except the offline node) does increase in many runs. Although, usually it only goes to T=3, 4, or 5. It needs to be at least 6 for the test to succeed (one triple per online node).

I've tried making the expiration timer 1.5 times longer for deliberators, so they haven't expired yet when the proposer starts the protocol. That resulted in a lot of warnings whose cause I haven't been able to pin down yet. Warnings:

  • received START on protocol we have no info for
  • received ACCEPT/REJECT on protocol we have no info for
  • triple already generating id=11042233609910426026 from=Participant(6) action=Reject

@jakmeier
Contributor Author

It seems that with lower node counts, the chance of success is better.

5 out of 5 nodes online seems to work fine

7 out of 7 nodes online produces the max concurrent generators (7 * 16 = 112), which all get stuck after ~850 pokes. No triples are generated successfully. It could be that we drop some messages, or maybe reorder them, so the crypto protocol never finishes.

image

4 out of 5 nodes online produces some triples successfully. (See T=3 in the screenshot.) But not on all nodes. Then it struggles for a while, having 0 active generators. Eventually, it manages to get through the posits round. Then it somehow gets stuck there with very few pokes per generator.

image
