Systemd - Killed by SIGPIPE #19

@imerr

Description

We've been having issues running amonagent on a Debian Jessie machine: it keeps getting killed by SIGPIPE.

# systemctl status amonagent -l
● amonagent.service - Starts and stops amonagent
   Loaded: loaded (/lib/systemd/system/amonagent.service; enabled)
   Active: inactive (dead) since Sun 2017-04-02 04:47:45 WEST; 1 day 3h ago
     Docs: https://www.amon.cx/docs
 Main PID: 14402 (code=killed, signal=PIPE)

Apr 03 08:39:58 amonagent[25552]: time="2017-04-03T08:39:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:39:59 amonagent[25552]: time="2017-04-03T08:39:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:40:58 amonagent[25552]: time="2017-04-03T08:40:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:40:59 amonagent[25552]: time="2017-04-03T08:40:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:41:58 amonagent[25552]: time="2017-04-03T08:41:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:41:59 amonagent[25552]: time="2017-04-03T08:41:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:42:58 amonagent[25552]: time="2017-04-03T08:42:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:42:59 amonagent[25552]: time="2017-04-03T08:42:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:43:58 amonagent[25552]: time="2017-04-03T08:43:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:43:59 amonagent[25552]: time="2017-04-03T08:43:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"

It seems to be related to journald restarting (which drops the stdout pipe, I guess), and since amonagent doesn't handle SIGPIPE, it gets killed.
The default systemd config specifies Restart=on-failure, but systemd doesn't consider this a failure: it treats termination by SIGPIPE as a clean exit, so the agent is never restarted.

The best way to handle this would probably be to catch SIGPIPE and exit gracefully with a non-zero exit code, so that Restart=on-failure kicks in.
Another option would be specifying Restart=always in the systemd unit file (realistically, you want the amon agent running always, right?)

(The systemd restarts seem to be caused by some sort of "hardware" (qemu, so virtual) issue - so it won't be super common, but it's probably good to handle this better)
