device rpc hangs in notification when devices return large amount of data #237

@olofhagsand

Sending several RPCs that each collect a large amount of data may hang the backend.

Debugging:
The specific case where this is observed is when pyapi initiates several RPCs that return large amounts of data. Eventually the backend hangs in a "write" call on a unix socket, waiting for the receiver (pyapi) to consume the data, but pyapi does not seem to get the notification.
The backend hangs "hard" in the sense that it never returns from the "write" system call, and therefore never re-enters the event handling to receive new messages (or timeouts).
After debugging, it is still unclear why it hangs on output; when listening with socat/eBPF, it seems to work.
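The stall itself can be reproduced outside the backend. The following is a minimal sketch, not the backend's or pyapi's actual code: it uses a local `socketpair` where one end stands in for the backend and the other for pyapi, and the helper name `fill_until_blocked` is hypothetical. A send timeout is used only so the demonstration terminates; a plain blocking `write` would wait indefinitely at the same point.

```python
import socket


def fill_until_blocked(writer, chunk=b"x" * 65536, timeout=0.2):
    """Send until the kernel socket buffer is full; return the bytes accepted.

    The timeout only bounds the wait so that this demo terminates. Without
    it, the final send would block until the peer reads.
    """
    writer.settimeout(timeout)
    sent = 0
    try:
        while True:
            sent += writer.send(chunk)
    except socket.timeout:
        return sent


# "backend" and "pyapi" are stand-ins for the two ends of the unix socket.
backend, pyapi = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

# The peer never reads, so the writer stalls once the buffer is full.
accepted = fill_until_blocked(backend)
print(f"writer stalled after {accepted} bytes")

# Once the peer drains its side, the writer can make progress again.
pyapi.setblocking(False)
drained = 0
try:
    while True:
        data = pyapi.recv(65536)
        if not data:
            break
        drained += len(data)
except BlockingIOError:
    pass  # nothing left to read

resumed = backend.send(b"ok")
print(f"peer drained {drained} bytes; writer sent {resumed} more")
```

The number of bytes accepted before the stall depends on the platform's socket buffer size, so it is not a fixed value.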

Solution discussion:

  1. One way to avoid the problem is to not send large amounts of data in the notification, which should probably not be done in the first place. One can read the result from the transaction instead of piggy-backing the result on the notification.
  2. Analyze why pyapi does not receive the data over the unix socket.
  3. Backend: check the output buffer before writing, or use non-blocking I/O, to at least return an error rather than hanging hard.
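Option 3 could look roughly like the sketch below. It is written in Python for illustration only, since the backend's own write path is not shown in this issue, and the helper name `try_send` is hypothetical. The idea is to put the socket in non-blocking mode and check writability with `select` before writing, so that a full output buffer surfaces as an error instead of a hang.

```python
import select
import socket


def try_send(sock, data, timeout=0.0):
    """Write to a non-blocking socket without ever hanging.

    Returns the number of bytes written (possibly fewer than len(data)).
    Raises BlockingIOError if the socket is not writable within `timeout`,
    which indicates that the peer is not consuming data.
    """
    _, writable, _ = select.select([], [sock], [], timeout)
    if not writable:
        raise BlockingIOError("output buffer full: peer is not reading")
    return sock.send(data)


# Stand-ins for the two ends of the unix socket; the peer never reads.
backend, pyapi = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
backend.setblocking(False)

payload = b"x" * 65536
sent = 0
stalled = False
try:
    while True:
        sent += try_send(backend, payload)
except BlockingIOError:
    # The caller gets control back here and can queue the remaining data,
    # drop the notification, or close the session.
    stalled = True

print(f"sent {sent} bytes, stalled={stalled}")
```

With this pattern the writer returns to its event loop after the error, so it can still handle new messages and timeouts while the receiver is stuck.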

Labels: fixed: plz verify (Bug/feature is fixed by developer, need verification)