Skip to content

Conversation

@ChaoticTempest
Copy link
Contributor

Removes total timeout from signature requests, where we have a exponential backoff for old requests that couldn't be fulfilled.

Each SignRequest inside the SignQueue now tracks attempts such that we can increase the delay on the next time they need to be processed.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR removes the total timeout mechanism from signature request processing and implements an exponential backoff retry strategy. Instead of failing signature requests after a total timeout period, each SignRequest now tracks its retry attempts and applies increasing delays between retries.

Key Changes:

  • Replaced total timeout logic with per-request retry attempt tracking
  • Added exponential backoff mechanism with configurable base and max delays (1s to 60s by default)
  • New integration tests verify retry behavior with message dropping

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
integration-tests/tests/cases/mpc.rs Adds two comprehensive tests for signature retry behavior and an unused helper function for proposer selection
integration-tests/src/mpc_fixture/mod.rs Exports the new SignatureDropper test utility
integration-tests/src/mpc_fixture/message_filters.rs Introduces SignatureDropper helper for selectively dropping signature messages in tests
integration-tests/src/mpc_fixture/fixture_tasks.rs Updates message filtering to support multiple filters and improves logging
integration-tests/src/mpc_fixture/fixture_interface.rs Adds task tracking and shutdown capability to fixture nodes
integration-tests/src/mpc_fixture/builder.rs Changes filter storage from single to vector and tracks spawned task handles
chain-signatures/node/src/util.rs Removes the now-unused is_elapsed_longer_than_timeout function
chain-signatures/node/src/protocol/signature.rs Core implementation of retry attempt tracking, exponential backoff logic, and removal of total timeout checks

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

jakmeier
jakmeier previously approved these changes Oct 23, 2025
Copy link
Contributor

@jakmeier jakmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great stuff!

I see you removed the exponential backoff. Makes sense, I don't think we need, as discussed on Monday. What I thought of later is that we could also make the timeout itself an exponential backoff. That is, wait longer and longer between retries, without actually removing it. Probably not needed either, but I thought I mention it in case we come around and add a backoff mechanism later.

Also, nice use of the component testing setup! Let me know if you have any thoughts after using it. I have some ideas what I want to improve but I'd really like your input before I put too much else into it.

Comment on lines 52 to 61
let mut should_filter = true;
for filter in filters.iter_mut() {
if !filter(&send_message) {
tracing::info!(?from, ?to, log_msg, "Dropping a message because it didn't pass the test's filter");
should_filter = false;
break;
}
}

if !should_filter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code clarity:

  1. Isn't the naming of should_filter the wrong way around? (In my mind: To filter = not sending the message. The code only sends the message if should_filter = true)
  2. Up to you which way you like it better but you could avoid the entire should_filter boolean by using a loop label and a direct continue 'outer in place of the break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, corrected it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes the code here locally now makes more sense. But you also inverted the meaning of the filter in general, which wasn't what I meant. I'd really prefer staying with the semantics of the standard library filter.

Can we keep it as filter returning true means it passes the filter? The local variable should_filter here can stay as it is, just needs inverted assignment. Or maybe should_drop or should_send would be clearer naming.

Comment on lines +348 to +351
let log = network.output.msg_log.lock().await;
log.iter()
.filter(|entry| entry.starts_with("Signature"))
.count()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

side note: Should we change how logs are represented? I just wanted some dumb way of keeping a log of what happened, without coupling the testing closely with internal implementation types. Curious to hear what you think after you have written your first tests with this approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think we need more robust typing or better convenience helpers for this in the long term. It's going to be hard to follow what this is doing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a bullet point to possibly improve this quarter in #583.


let request = self.organize_request(stable, participants, indexed, 0);
let mut request = self.organize_request(stable, participants, indexed, 0);
if request.indexed.timestamp_sign_queue.is_none() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this field optional in the first place? Can we make it non optional and ensure timestamp is added at the indexer level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it had a purpose once upon a time. Maybe @ppca can probably explain why, but I've since removed it in #587

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants