-
Notifications
You must be signed in to change notification settings - Fork 3
feat: Remove total timeout from signature and add test #566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
feat: Remove total timeout from signature and add test #566
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR removes the total timeout mechanism from signature request processing and implements an exponential backoff retry strategy. Instead of failing signature requests after a total timeout period, each SignRequest now tracks its retry attempts and applies increasing delays between retries.
Key Changes:
- Replaced total timeout logic with per-request retry attempt tracking
- Added exponential backoff mechanism with configurable base and max delays (1s to 60s by default)
- New integration tests verify retry behavior with message dropping
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
integration-tests/tests/cases/mpc.rs |
Adds two comprehensive tests for signature retry behavior and an unused helper function for proposer selection |
integration-tests/src/mpc_fixture/mod.rs |
Exports the new SignatureDropper test utility |
integration-tests/src/mpc_fixture/message_filters.rs |
Introduces SignatureDropper helper for selectively dropping signature messages in tests |
integration-tests/src/mpc_fixture/fixture_tasks.rs |
Updates message filtering to support multiple filters and improves logging |
integration-tests/src/mpc_fixture/fixture_interface.rs |
Adds task tracking and shutdown capability to fixture nodes |
integration-tests/src/mpc_fixture/builder.rs |
Changes filter storage from single to vector and tracks spawned task handles |
chain-signatures/node/src/util.rs |
Removes the now-unused is_elapsed_longer_than_timeout function |
chain-signatures/node/src/protocol/signature.rs |
Core implementation of retry attempt tracking, exponential backoff logic, and removal of total timeout checks |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
jakmeier
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great stuff!
I see you removed the exponential backoff. Makes sense, I don't think we need, as discussed on Monday. What I thought of later is that we could also make the timeout itself an exponential backoff. That is, wait longer and longer between retries, without actually removing it. Probably not needed either, but I thought I mention it in case we come around and add a backoff mechanism later.
Also, nice use of the component testing setup! Let me know if you have any thoughts after using it. I have some ideas what I want to improve but I'd really like your input before I put too much else into it.
| let mut should_filter = true; | ||
| for filter in filters.iter_mut() { | ||
| if !filter(&send_message) { | ||
| tracing::info!(?from, ?to, log_msg, "Dropping a message because it didn't pass the test's filter"); | ||
| should_filter = false; | ||
| break; | ||
| } | ||
| } | ||
|
|
||
| if !should_filter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code clarity:
- Isn't the naming of
should_filterthe wrong way around? (In my mind: To filter = not sending the message. The code only sends the message ifshould_filter = true) - Up to you which way you like it better but you could avoid the entire
should_filterboolean by using a loop label and a directcontinue 'outerin place of the break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, corrected it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes the code here locally now makes more sense. But you also inverted the meaning of the filter in general, which wasn't what I meant. I'd really prefer staying with the semantics of the standard library filter.
Can we keep it as filter returning true means it passes the filter? The local variable should_filter here can stay as it is, just needs inverted assignment. Or maybe should_drop or should_send would be clearer naming.
| let log = network.output.msg_log.lock().await; | ||
| log.iter() | ||
| .filter(|entry| entry.starts_with("Signature")) | ||
| .count() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
side note: Should we change how logs are represented? I just wanted some dumb way of keeping a log of what happened, without coupling the testing closely with internal implementation types. Curious to hear what you think after you have written your first tests with this approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think we need more robust typing or better convenience helpers for this in the long term. It's going to be hard to follow what this is doing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a bullet point to possibly improve this quarter in #583.
|
|
||
| let request = self.organize_request(stable, participants, indexed, 0); | ||
| let mut request = self.organize_request(stable, participants, indexed, 0); | ||
| if request.indexed.timestamp_sign_queue.is_none() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this field optional in the first place? Can we make it non optional and ensure timestamp is added at the indexer level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removes total timeout from signature requests, where we have a exponential backoff for old requests that couldn't be fulfilled.
Each
SignRequestinside theSignQueuenow tracks attempts such that we can increase the delay on the next time they need to be processed.