Open
Description
Relevant config.toml
[[instances]]
target="node1"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
[[instances]]
target="node2"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
[[instances]]
target="node3"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
[[instances]]
target="node4"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
.
.
.
Logs from categraf
● categraf.service - Opensource telemetry collector
Loaded: loaded (/etc/systemd/system/categraf.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2025-12-25 14:00:18 CST; 1 weeks 4 days ago
Main PID: 3598 (categraf)
Tasks: 128
Memory: 80.6M
CGroup: /system.slice/categraf.service
├─ 3598 /opt/categraf/categraf -configs /opt/categraf/conf
├─20278 /usr/sbin/ipmi-sensors --output-sensor-state -Q --ignore-unrecognized-events --comma-separated-output --no-header-output --sdr-cache-recreate --output-event-bitmask --output-sensor-state --config-file /tmp/ipmi_exporter-399047aa89861b7cbc4578fd15fa926e -h 10.32.111.40 --ignore-not-available...
└─20716 ipmi-sel --info --config-file /tmp/ipmi_exporter-80056b68390dc3e18b2cd2e4107d42d7 -h xxxxxx
Jan 05 14:37:34 node3 categraf[3598]: 2026/01/05 14:37:34 collector_chassis.go:71: E! Failed to collect chassis data target node1 error error running ipmi-chassis: exit status 1: ipmi-chassis: connection timeout
Jan 05 14:39:14 node3 categraf[3598]: 2026/01/05 14:39:14 collector_sel.go:66: E! Failed to collect SEL data target node1 error error running ipmi-sel: exit status 1: ipmi-sel: connection timeout
Jan 05 14:40:54 node3 categraf[3598]: 2026/01/05 14:40:54 collector_dcmi.go:59: E! Failed to collect DCMI data target node1 error error running ipmi-dcmi: exit status 1: ipmi-dcmi: connection timeout
Jan 05 14:42:34 node3 categraf[3598]: 2026/01/05 14:42:34 collector_bmc.go:59: E! Failed to collect BMC data target node1 error error running bmc-info: exit status 1: bmc-info: connection timeout
Jan 05 14:44:14 node3 categraf[3598]: 2026/01/05 14:44:14 collector_ipmi.go:156: E! Failed to collect sensor data target node1 error error running ipmimonitoring: exit status 1: /usr/sbin/ipmi-sensors: connection timeout
Jan 05 14:45:54 node3 categraf[3598]: 2026/01/05 14:45:54 collector_chassis.go:71: E! Failed to collect chassis data target node1 error error running ipmi-chassis: exit status 1: ipmi-chassis: connection timeout
Jan 05 14:47:34 node3 categraf[3598]: 2026/01/05 14:47:34 collector_sel.go:66: E! Failed to collect SEL data target node1 error error running ipmi-sel: exit status 1: ipmi-sel: connection timeout
Jan 05 14:49:15 node3 categraf[3598]: 2026/01/05 14:49:15 collector_dcmi.go:59: E! Failed to collect DCMI data target node1 error error running ipmi-dcmi: exit status 1: ipmi-dcmi: connection timeout
Jan 05 14:50:55 node3 categraf[3598]: 2026/01/05 14:50:55 collector_bmc.go:59: E! Failed to collect BMC data target node1 error error running bmc-info: exit status 1: bmc-info: connection timeout
Jan 05 14:52:35 node3 categraf[3598]: 2026/01/05 14:52:35 collector_ipmi.go:156: E! Failed to collect sensor data target node1 error error running ipmimonitoring: exit status 1: /usr/sbin/ipmi-sensors: connection timeout
System info
categraf: v0.4.2-f503c7de8691b762f97adc20cf1ab4b40ba07ba1
Docker
No response
Steps to reproduce
Three collector nodes each collect from multiple target nodes (about 90 targets per collector node, 320 targets in total).
Expected behavior
Collection runs normally.
Actual behavior
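When the IPMI connection to one target node drops, all target nodes handled by the corresponding collector node also show as disconnected.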
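One observation from the logs above that may be relevant: the errors for node1 arrive about 100 seconds apart, one collector at a time. That would match timeout = 100000 if the value is in milliseconds (an assumption, not confirmed here). The small Python sketch below only checks the spacing of the timestamps copied from those log lines; the function name is made up for illustration and it does not claim to show the cause.

```python
from datetime import datetime

# Timestamps copied from the categraf log lines above (all for target node1).
timestamps = [
    "2026/01/05 14:37:34",  # chassis
    "2026/01/05 14:39:14",  # sel
    "2026/01/05 14:40:54",  # dcmi
    "2026/01/05 14:42:34",  # bmc
    "2026/01/05 14:44:14",  # ipmi (sensors)
    "2026/01/05 14:45:54",  # chassis
    "2026/01/05 14:47:34",  # sel
    "2026/01/05 14:49:15",  # dcmi
    "2026/01/05 14:50:55",  # bmc
    "2026/01/05 14:52:35",  # ipmi (sensors)
]


def gaps_in_seconds(stamps):
    """Return the gap in seconds between each pair of consecutive timestamps."""
    parsed = [datetime.strptime(s, "%Y/%m/%d %H:%M:%S") for s in stamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


gaps = gaps_in_seconds(timestamps)

# Time between two consecutive chassis errors, i.e. one pass over all
# five collectors for the single unreachable target.
full_cycle = sum(gaps[:5])

print(gaps)
print(full_cycle)
```

If this reading is right, one unreachable target occupies roughly 500 seconds per pass over its five collectors, which might explain why the other targets on the same collector node appear to stop reporting.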
Additional info
No response