Skip to content

Conversation

@AndrewFerr
Copy link
Member

Use a new, dedicated endpoint for scheduling delayed events. This should deprecate the delay query parameter on the /send and /state endpoints.

See:

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

Use a new, dedicated endpoint for scheduling delayed events.
This should deprecate the `delay` query parameter on the `/send` and
`/state` endpoints.
@AndrewFerr AndrewFerr requested a review from a team as a code owner January 7, 2026 03:31
Comment on lines +565 to +567
request_body = parse_and_validate_json_object_from_request(
request, self.DelayedEventBodyModel
)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a Pydantic model to validate the request body is convenient & seems safer than manual parsing, but has the downside of giving verbose error messages when the body doesn't match the expected shape.

For example, this is the error given for requests to this servlet with an empty body:

{
  "errcode": "M_BAD_JSON",
  "error": "2 validation errors for DelayedEventBodyModel\ndelay\n  Field required [type=missing, input_value={}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.12/v/missing\ncontent\n  Field required [type=missing, input_value={}, input_type=dict]\n    For further information visit https://errors.pydantic.dev/2.12/v/missing"
}

There is also the quirk of the errcode being M_INVALID_PARAM for one missing key in the body, but M_BAD_JSON for multiple missing keys.

Is there a good way to trim down errors like this? Or is manual parsing of the request body preferred?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pydantic parsing is preferred over shotgun parsing. As for the bad errors, it might be good to bring up in the backend chapter meeting tomorrow 🤷

Copy link
Member Author

@AndrewFerr AndrewFerr Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conclusion from the meeting is that there are indeed other endpoints that use Pydantic validation, and therefore may return the same kind of verbose error messages. So adding more such errors is no worse than the current situation.

But at some point soon, I'll file an issue/PR for something to trim down the long Pydantic error messages, at least to remove the “visit errors.pydantic.dev” part.

event_type: str,
state_key: str | None,
content: JsonDict,
txn_id: str | None = "mid1",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks sketchy. We could have a random or auto-incremented value created in the function.

Example:

if txn_id is None:
txn_id = "m%s" % (str(time.time()))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I let the txn_id be absent when this path is used for a POST request. With that said, perhaps the desired method could be passed into this function to make that more clear, and to use your suggested auto-generation for a PUT with no given txn_id.

Comment on lines +2408 to +2426
@classmethod
def get_delayed_event_path_and_body(
cls,
room_id: str,
delay: int,
event_type: str,
state_key: str | None,
content: JsonDict,
txn_id: str | None = "mid1",
) -> tuple[str, JsonDict]:
path = (
f"rooms/{room_id}/{'send' if state_key is None else 'state'}/{event_type}"
)
if state_key is not None:
path += f"/{state_key}"
elif txn_id is not None:
path += f"/{txn_id}"
path += f"?org.matrix.msc4140.delay={delay}"
return path, content
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this function. I thought we were moving away from re-using the existing /_matrix/client/v3/rooms/{roomId}/state/{eventType}/{stateKey} endpoint in favor of PUT /_matrix/client/v3/rooms/{roomId}/delayed_event/{eventType}/{txnId} (unstable: PUT /_matrix/client/unstable/org.matrix.msc4140/rooms/{roomId}/delayed_event/{eventType}/{txnId})

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the intention is indeed to move away from the /state/-based endpoint, but I figured that it is useful to keep the test coverage on that endpoint while the MSC gets ironed out.

There is another test class which tests the new endpoint, which works by subclassing this test class & overriding this classmethod to swap out the /state/ endpoint with the new /delayed_event/ one.


def _get_path_for_delayed_send(room_id: str, event_type: str, delay_ms: int) -> str:
return f"rooms/{room_id}/send/{event_type}?org.matrix.msc4140.delay={delay_ms}"
def _get_delayed_event_request_args(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have a better name. Looks like this does more than grab request args

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "args" in question are the args to make_request, which admittedly isn't very clear. How about refactoring this to include the make_request call & renaming it appropriately?

channel.json_body,
)

@parameterized.expand((-2000, 0))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should explain these invalid values. // Invalid because negative, etc

For better test names, I also tend to do this:

@parameterized.expand(
[
# Test with a normal arbitrary type (no special meaning)
("arbitrary_type", "type", set()),
# Test with membership
("membership", EventTypes.Member, set()),
# Test with lazy-loading room members
("lazy_loading_membership", EventTypes.Member, {StateValues.LAZY}),
]
)
def test_limit_retained_previous_state_keys(
self,
_test_label: str,
event_type: str,
extra_state_keys: set[str],
) -> None:

Comment on lines +586 to +593
def _get_delayed_event_request_args(
room_id: str,
delay: int,
event_type: str,
state_key: str | None,
content: JsonDict,
txn_id: str | None = "mid1",
) -> tuple[str, bytes, JsonDict]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should mandate keyword args to be used as there are too many args to keep straight and because there multiple args with the same type in a row which is really easy to mix up.

Suggested change
def _get_delayed_event_request_args(
room_id: str,
delay: int,
event_type: str,
state_key: str | None,
content: JsonDict,
txn_id: str | None = "mid1",
) -> tuple[str, bytes, JsonDict]:
def _get_delayed_event_request_args(
*,
room_id: str,
delay: int,
event_type: str,
state_key: str | None,
content: JsonDict,
txn_id: str | None = "mid1",
) -> tuple[str, bytes, JsonDict]:

max_event_delay_ms = hs.config.server.max_event_delay_ms
(
RoomDelayedEventRestServlet(hs, max_event_delay_ms)
if max_event_delay_ms
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since max_event_delay_ms is an int and can be 0 if we fail to shotgun validate it in the future.

Suggested change
if max_event_delay_ms
if max_event_delay_ms is not None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values below 0 are already guarded against in synapse/config/server.py, but I agree that it's not obvious here. Perhaps it can be typed as a Pydantic PositiveInt.


def _raise_delayed_events_unsupported() -> NoReturn:
raise SynapseError(
HTTPStatus.BAD_REQUEST,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

400 Bad Request indicates a client error. But in this case, it's a server problem so we should probably opt for something else.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. How about M_FORBIDDEN or M_RESOURCE_LIMIT_EXCEEDED (and a 403 HTTP response code in both cases)?

"Delayed events are not supported on this server",
Codes.UNKNOWN,
{
"org.matrix.msc4140.errcode": "M_MAX_DELAY_UNSUPPORTED",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see M_MAX_DELAY_UNSUPPORTED in the MSC. I see M_MAX_DELAY_EXCEEDED 🤔

And this should live under synapse/api/errors.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is a different error that's not in the spec. It's also poorly named, for that matter...

What might make the most sense is to not use a unique error for this, but to treat it as the same as the requesting user having too many outstanding delayed events. The MSC defines an error for that, and is technically what's happening here if you consider disabling delayed events as the same as setting the per-user delayed event limit to 0.


class RoomDelayedEventRestServletUnsupported(RoomDelayedEventRestServletBase):
async def _do(self, *_: Any) -> NoReturn:
_raise_delayed_events_unsupported()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we use this pattern elsewhere?

I feel like usually, we just don't register the servlets at all if the feature is not configured. Although the feedback isn't necessarily bad.

Feels like something we can just handle in the normal servlet though. And then we don't need the RoomDelayedEventRestServletBase separation either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this pattern is elsewhere, but I liked the idea of not having to check the config value on every call to the servlet. Not sure how useful that is, though, especially if there's interest in allowing some server config values to be mutable (which would necessitate rereading them).

request_body = parse_and_validate_json_object_from_request(
request, self.DelayedEventBodyModel
)
_check_delay_within_max(request_body.delay, self._max_event_delay_ms)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need to check that this is > 0?

Feels like this validation logic should be more shared. Could even be part of the Pydantic model

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A > 0 check isn't needed here, because it's already effectively done by the model (since delay is annotated as a PositiveInt).

It's also not needed in the servlets for the /send/ & /state/ endpoints with the delay query parameter, as they call _parse_request_delay which does a > 0 check directly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants