We've been having some issues running amonagent on a Debian Jessie machine.
It keeps getting killed by SIGPIPE:
# systemctl status amonagent -l
● amonagent.service - Starts and stops amonagent
Loaded: loaded (/lib/systemd/system/amonagent.service; enabled)
Active: inactive (dead) since Sun 2017-04-02 04:47:45 WEST; 1 day 3h ago
Docs: https://www.amon.cx/docs
Main PID: 14402 (code=killed, signal=PIPE)
Apr 03 08:39:58 amonagent[25552]: time="2017-04-03T08:39:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:39:59 amonagent[25552]: time="2017-04-03T08:39:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:40:58 amonagent[25552]: time="2017-04-03T08:40:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:40:59 amonagent[25552]: time="2017-04-03T08:40:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:41:58 amonagent[25552]: time="2017-04-03T08:41:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:41:59 amonagent[25552]: time="2017-04-03T08:41:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:42:58 amonagent[25552]: time="2017-04-03T08:42:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:42:59 amonagent[25552]: time="2017-04-03T08:42:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
Apr 03 08:43:58 amonagent[25552]: time="2017-04-03T08:43:58+01:00" level=info msg="Metrics collected (Interval:1m0s)\n"
Apr 03 08:43:59 amonagent[25552]: time="2017-04-03T08:43:59+01:00" level=info msg="Sending data to https://xx.amon.cx/api/system/v2/?api_key=xx\n"
It seems to be related to journald restarting (which drops the stdout pipe, I guess), and since amon doesn't handle SIGPIPE, it gets killed.
The default systemd config specifies Restart=on-failure, but systemd apparently doesn't consider this a failure (it counts termination by SIGPIPE as a clean exit).
The best way to handle this would probably be to catch SIGPIPE and exit gracefully with a non-zero exit code.
Another option would be specifying Restart=always in the systemd file (realistically, you want the amon agent running always, right?).
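As a rough illustration of the Restart=always option, a drop-in override could look like the fragment below. The drop-in file name `restart.conf` and the `RestartSec` value are assumptions for the example, not something shipped with the package.

```ini
# /etc/systemd/system/amonagent.service.d/restart.conf  (hypothetical drop-in)
[Service]
# Restart regardless of how the process ended, including death by SIGPIPE.
Restart=always
# Assumed delay between restarts; adjust as needed.
RestartSec=5
```

After adding the file, `systemctl daemon-reload` followed by `systemctl restart amonagent` should make systemd pick up the change.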
(The systemd restarts seem to be caused by some sort of "hardware" issue (qemu, so virtual), so it won't be super common, but it's probably good to handle this better.)