Problem collecting IPMI data from multiple nodes with categraf #1389

@safeAndSound3

Description

Relevant config.toml

[[instances]]
target="node1"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
[[instances]]
target="node2"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
[[instances]]
target="node3"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
[[instances]]
target="node4"
user = "xxx"
pass = "xxxx"
driver = "LAN_2_0"
privilege = "ADMINISTRATOR"
timeout = 100000
collectors = ["bmc","ipmi","chassis", "sel", "dcmi" ]
.
.
.
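
For scale, here is a rough estimate of how long one unreachable target could hold up collection with the settings above. It rests on two assumptions that this issue does not confirm: that `timeout` is in milliseconds (as FreeIPMI's session timeout is), and that the five collectors for a target run one after another.

```python
# Values copied from the [[instances]] blocks above.
timeout_ms = 100000
collectors = ["bmc", "ipmi", "chassis", "sel", "dcmi"]

# Assumption: timeout is in milliseconds and each collector issues one
# FreeIPMI command that blocks until the timeout when the BMC is unreachable.
per_command_s = timeout_ms / 1000
per_target_s = per_command_s * len(collectors)

print(f"one timed-out command blocks for {per_command_s:.0f} s")
print(f"one unreachable target blocks for up to {per_target_s:.0f} s per cycle")
```

If those assumptions hold, a lower `timeout` would shorten the stall caused by a dead BMC, at the cost of more false timeouts on slow ones.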

Logs from categraf

● categraf.service - Opensource telemetry collector
   Loaded: loaded (/etc/systemd/system/categraf.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2025-12-25 14:00:18 CST; 1 weeks 4 days ago
 Main PID: 3598 (categraf)
    Tasks: 128
   Memory: 80.6M
   CGroup: /system.slice/categraf.service
           ├─ 3598 /opt/categraf/categraf -configs /opt/categraf/conf
           ├─20278 /usr/sbin/ipmi-sensors --output-sensor-state -Q --ignore-unrecognized-events --comma-separated-output --no-header-output --sdr-cache-recreate --output-event-bitmask --output-sensor-state --config-file /tmp/ipmi_exporter-399047aa89861b7cbc4578fd15fa926e -h 10.32.111.40 --ignore-not-available...
           └─20716 ipmi-sel --info --config-file /tmp/ipmi_exporter-80056b68390dc3e18b2cd2e4107d42d7 -h xxxxxx

Jan 05 14:37:34 node3 categraf[3598]: 2026/01/05 14:37:34 collector_chassis.go:71: E! Failed to collect chassis data target node1 error error running ipmi-chassis: exit status 1: ipmi-chassis: connection timeout
Jan 05 14:39:14 node3 categraf[3598]: 2026/01/05 14:39:14 collector_sel.go:66: E! Failed to collect SEL data target node1 error error running ipmi-sel: exit status 1: ipmi-sel: connection timeout
Jan 05 14:40:54 node3 categraf[3598]: 2026/01/05 14:40:54 collector_dcmi.go:59: E! Failed to collect DCMI data target node1 error error running ipmi-dcmi: exit status 1: ipmi-dcmi: connection timeout
Jan 05 14:42:34 node3 categraf[3598]: 2026/01/05 14:42:34 collector_bmc.go:59: E! Failed to collect BMC data target node1 error error running bmc-info: exit status 1: bmc-info: connection timeout
Jan 05 14:44:14 node3 categraf[3598]: 2026/01/05 14:44:14 collector_ipmi.go:156: E! Failed to collect sensor data target node1 error error running ipmimonitoring: exit status 1: /usr/sbin/ipmi-sensors: connection timeout
Jan 05 14:45:54 node3 categraf[3598]: 2026/01/05 14:45:54 collector_chassis.go:71: E! Failed to collect chassis data target node1 error error running ipmi-chassis: exit status 1: ipmi-chassis: connection timeout
Jan 05 14:47:34 node3 categraf[3598]: 2026/01/05 14:47:34 collector_sel.go:66: E! Failed to collect SEL data target node1 error error running ipmi-sel: exit status 1: ipmi-sel: connection timeout
Jan 05 14:49:15 node3 categraf[3598]: 2026/01/05 14:49:15 collector_dcmi.go:59: E! Failed to collect DCMI data target node1 error error running ipmi-dcmi: exit status 1: ipmi-dcmi: connection timeout
Jan 05 14:50:55 node3 categraf[3598]: 2026/01/05 14:50:55 collector_bmc.go:59: E! Failed to collect BMC data target node1 error error running bmc-info: exit status 1: bmc-info: connection timeout
Jan 05 14:52:35 node3 categraf[3598]: 2026/01/05 14:52:35 collector_ipmi.go:156: E! Failed to collect sensor data target node1 error error running ipmimonitoring: exit status 1: /usr/sbin/ipmi-sensors: connection timeout
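
The spacing of these errors is worth a look. A small sketch that computes the gap between consecutive log lines, using only the timestamps from the excerpt above:

```python
from datetime import datetime

# Timestamps copied from the categraf log lines above, in order.
stamps = [
    "2026/01/05 14:37:34",
    "2026/01/05 14:39:14",
    "2026/01/05 14:40:54",
    "2026/01/05 14:42:34",
    "2026/01/05 14:44:14",
    "2026/01/05 14:45:54",
    "2026/01/05 14:47:34",
    "2026/01/05 14:49:15",
    "2026/01/05 14:50:55",
    "2026/01/05 14:52:35",
]

times = [datetime.strptime(s, "%Y/%m/%d %H:%M:%S") for s in stamps]

# Seconds between each error and the next one.
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
print(gaps)
```

Every gap is about 100 seconds, and the failing collector rotates through chassis, SEL, DCMI, BMC, and sensors, always for target node1. That pattern would be consistent with the commands for node1 running one at a time and each waiting out `timeout = 100000` (if that value is in milliseconds), but the log alone does not prove it.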

System info

categraf: v0.4.2-f503c7de8691b762f97adc20cf1ab4b40ba07ba1

Docker

No response

Steps to reproduce

Three collector nodes each collect IPMI data from multiple target nodes (about 90 targets per collector, 320 targets in total).

Expected behavior

Collection runs normally.

Actual behavior

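When the IPMI connection to one target node drops, all target nodes handled by the same collector node are reported as disconnected.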
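
To make the expected behavior concrete, here is a minimal sketch of per-target isolation. It is not categraf's implementation: `collect` and `collect_all` are hypothetical names, and the slow target is simulated with a short sleep instead of a real FreeIPMI call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def collect(target: str) -> str:
    """Hypothetical stand-in for running the IPMI commands against one target."""
    if target == "node1":
        time.sleep(0.5)  # simulate a BMC that only fails after a long timeout
        raise TimeoutError("connection timeout")
    return "ok"


def collect_all(targets):
    """Collect every target in its own worker so one failure stays local."""
    results = {}
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        futures = {t: pool.submit(collect, t) for t in targets}
        for target, fut in futures.items():
            try:
                results[target] = fut.result()
            except TimeoutError as exc:
                # Record the failure for this target only.
                results[target] = f"error: {exc}"
    return results


start = time.monotonic()
results = collect_all(["node1", "node2", "node3", "node4"])
elapsed = time.monotonic() - start
print(results)
print(f"elapsed: {elapsed:.2f} s")
```

In this sketch node1 reports an error while node2 through node4 still return data, and the whole pass takes roughly as long as the single slow target instead of the sum of all targets.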

Additional info

No response
