feat: handle 100% signature guarantee on resharing #608
base: develop
Conversation
Pull request overview
This PR implements a 100% signature guarantee during resharing events by ensuring that signature tasks persist and complete across resharing boundaries. Previously, signature tasks were terminated when resharing started, potentially losing in-flight signatures. The new design ensures continuous signature processing by keeping the SignatureSpawner alive and pausing individual SignTasks during resharing, then resuming them when resharing completes.
Key Changes
- SignatureSpawner lifecycle: The spawner is now started at protocol initialization and runs continuously, updating its governance info dynamically rather than being recreated for each state transition
- SignTask pause/resume: Individual signature tasks now detect resharing, pause their work, wait for the contract to return to Running state, and then resume from the organizing phase with updated governance info
- GovernanceInfo abstraction: Introduced a new struct to encapsulate participant info (me, threshold, epoch, public_key, participants) that can be updated when the contract state changes (a sketch of this struct follows the list)
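To make the shape concrete, here is a minimal sketch of such a struct; the field types are assumptions inferred from the review snippets below, not the PR's exact definitions:

```rust
// Hedged sketch of GovernanceInfo; Participant and PublicKey stand in for
// the node's actual types.
pub struct GovernanceInfo {
    /// This node's identity within the current participant set.
    pub me: Participant,
    /// Number of participants required to produce a signature.
    pub threshold: usize,
    /// Current key epoch; advances on each resharing.
    pub epoch: u64,
    /// The network's root public key.
    pub public_key: PublicKey,
    /// All participants in the current epoch.
    pub participants: Vec<Participant>,
}
```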
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| integration-tests/tests/cases/mpc.rs | Added two tests validating signature completion across resharing scenarios |
| integration-tests/src/mpc_fixture/fixture_interface.rs | Added helper methods to simulate resharing in tests (trigger_resharing, complete_resharing, wait_for_running) |
| integration-tests/src/mpc_fixture/builder.rs | Removed Arc wrapper from sign_rx to simplify ownership model |
| chain-signatures/node/src/rpc.rs | Added GovernanceInfo struct and methods to extract/wait for governance information from contract state |
| chain-signatures/node/src/protocol/test_setup.rs | Updated to start SignatureSpawner immediately in test setup |
| chain-signatures/node/src/protocol/state.rs | Removed sign_task field from RunningState since it now persists at protocol level |
| chain-signatures/node/src/protocol/signature.rs | Major refactoring: SignTask now monitors contract state and pauses during resharing; SignatureSpawner runs continuously; removed Drop implementation to allow persistence |
| chain-signatures/node/src/protocol/posit.rs | Added Clone derive to PositAction to support new message handling pattern |
| chain-signatures/node/src/protocol/mod.rs | SignatureSpawner moved to protocol-level with Drop handler for cleanup |
| chain-signatures/node/src/protocol/consensus.rs | Removed sign_task initialization from state transitions since it's now protocol-level |
| chain-signatures/node/src/cli.rs | SignatureSpawner started immediately at protocol initialization rather than on state transitions |
jakmeier left a comment:
Good change and great tests!
Is it really required to merge after #607? Wouldn't it be an independent change that works either way?
```rust
// NOTE: We intentionally do NOT implement Drop for SignatureSpawnerTask.
// The spawner should persist across resharing, so dropping it should not abort the task.
// The spawner will naturally terminate when all channels close (e.g., during node shutdown).
```
Wait, I don't quite follow, when would SignatureSpawnerTask drop anyway? It's stored in MpcSignProtocol, right? So the only way it would be dropped is if MpcSignProtocol drops. And that's exactly when you call abort anyway.
I don't understand the difference between what you have now and aborting the handle whenever SignatureSpawnerTask drops. What am I missing? 😕
Also, could you put this comment somewhere closer to the SignatureSpawnerTask definition? Or inside fn abort just above? It seems a bit out of place here, and I probably wouldn't read it when looking at the relevant other code.
Ahh, that's an old comment. I implemented Drop later but forgot to remove this comment. The idea is to have it dropped when MpcSignProtocol gets dropped, so we can gracefully exit from there, but the comment doesn't have to stay either.
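For reference, a Drop-based abort along the lines being discussed might look like this; this is a hedged sketch assuming SignatureSpawnerTask wraps a tokio JoinHandle, and the field name is hypothetical:

```rust
use tokio::task::JoinHandle;

// Hypothetical shape of the wrapper under discussion.
struct SignatureSpawnerTask {
    handle: JoinHandle<()>,
}

impl Drop for SignatureSpawnerTask {
    fn drop(&mut self) {
        // Tying abort to Drop means the spawner's lifetime follows its
        // owner (MpcSignProtocol): when the protocol drops, the task exits.
        self.handle.abort();
    }
}
```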
ok so let's just remove the comment then?
```rust
if let Some(me) = running.participants.find_participant(&self.account_id) {
    return GovernanceInfo {
        me: *me,
        threshold: running.threshold,
        epoch: running.epoch,
        public_key: PublicKey::from(running.public_key),
        participants: running.participants.keys().copied().collect(),
    };
}
```
nit: I think something like this would work, too
```diff
-if let Some(me) = running.participants.find_participant(&self.account_id) {
-    return GovernanceInfo {
-        me: *me,
-        threshold: running.threshold,
-        epoch: running.epoch,
-        public_key: PublicKey::from(running.public_key),
-        participants: running.participants.keys().copied().collect(),
-    };
-}
+if let Some(gov) = self.governance() {
+    return gov;
+}
```
```rust
// Trigger resharing while sign task is in progress
tracing::info!("triggering resharing");
network.trigger_resharing();
```
How do we know it hasn't finished the signature yet?
True, I'll add an assert before that point that the actions haven't replied with the signature yet.
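A check along these lines could run before triggering resharing; this is a sketch, with the sign_rx name borrowed from the file summary above and the error type assuming a tokio mpsc channel:

```rust
use tokio::sync::mpsc::error::TryRecvError;

// Ensure no signature response has arrived yet, so the test actually
// exercises the pause/resume path across resharing.
assert!(
    matches!(sign_rx.try_recv(), Err(TryRecvError::Empty)),
    "signature completed before resharing was triggered"
);
network.trigger_resharing();
```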
@jakmeier I just don't want to resolve more merge conflicts, so for anything that changes the same set of files, I work off of whatever was changed before. It makes things simpler in terms of reviews. It'll be rebased no matter what, but at least there's one less friction point.
This was discussed in our 1-1, but I want to make sure we don't forget about it. We should never discard the old root key. All requests must specify a key version, which, in the case of an upgrade, will point to the older key version. As a result, all indexed requests must eventually receive a response.
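As a hedged illustration of that requirement (all names here are hypothetical, not the contract's actual types): old root keys stay addressable by version, so an indexed request pinned to an older version can still be answered:

```rust
use std::collections::BTreeMap;

// Hypothetical key store: root keys are retained per version, never discarded.
struct KeyStore {
    keys: BTreeMap<u32, Vec<u8>>, // key_version -> serialized root public key
}

impl KeyStore {
    /// A request carries a key_version; after an upgrade it resolves
    /// against the older key instead of being dropped.
    fn key_for(&self, key_version: u32) -> Option<&Vec<u8>> {
        self.keys.get(&key_version)
    }
}
```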
This makes signatures be handled across resharing. Before, we would just terminate the sign tasks on reaching resharing, which is not ideal when we want to guarantee 100% of signatures.

So with this PR, there are a couple of changes to specifically handle resharing cases when doing signature requests:

- SignatureSpawner is never aborted, since it will always be receiving sign requests; it self-updates its own relevant information.
- SignTask will check if we're in resharing; if so, it resets itself to the organizing phase and pauses. It then updates with new governance info when resharing completes and goes back into running (see the sketch below).

NOTE: The only time we can throw away sign requests should be on key refreshing (changing out the whole master key) or when we no longer support a specific key version. These cases still need to be handled eventually.
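A minimal sketch of that pause/resume flow; every name here (ContractState, Rpc, reset_to_organizing, wait_for_running, update_governance, step) is an assumption for illustration, not the PR's actual API:

```rust
enum ContractState {
    Running,
    Resharing,
}

// Hedged sketch: the task pauses during resharing and resumes from the
// organizing phase once the contract is Running again.
async fn sign_task_loop(task: &mut SignTask, rpc: &Rpc) {
    loop {
        match rpc.contract_state().await {
            ContractState::Resharing => {
                // Pause: fall back to the organizing phase...
                task.reset_to_organizing();
                // ...and wait for Running, picking up fresh governance info.
                let gov = rpc.wait_for_running().await;
                task.update_governance(gov);
            }
            ContractState::Running => {
                // Make normal progress; stop once the signature completes.
                if task.step().await {
                    break;
                }
            }
        }
    }
}
```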
I built this on top of the fix for sign requests on devnet, so #607 needs to go in first