A professional Prometheus exporter for Proxmox VE, written in Go. It collects comprehensive metrics from your Proxmox nodes, virtual machines (QEMU), LXC containers, and storage, exposing them for monitoring and alerting.
- Comprehensive Metrics:
  - Node: CPU, Memory, Uptime, Status, VM/LXC counts.
  - VM (QEMU): CPU, Memory, Disk, Network I/O, Uptime, Status.
  - LXC Containers: CPU, Memory, Disk, Network I/O, Uptime, Status.
  - Storage: Usage, Availability, Total size.
  - ZFS: Pool health, fragmentation, ARC statistics.
  - Cluster/HA: Quorum status, node counts, HA resource management.
  - Replication: Sync timestamps, duration, status monitoring.
  - Certificates: SSL certificate expiry tracking.
  - Hardware Sensors: Temperatures, fan speeds, voltages, power (via lm-sensors).
  - Disk Metrics: I/O throughput (automatic), SMART health, temperature, TBW (optional setup).
- Secure: Supports API Token authentication (recommended) and standard password auth.
- Lightweight: Single static binary, runs as a systemd service.
- Easy Configuration: Configure via environment variables or a YAML file.
```bash
# Download latest release
wget https://github.com/bigtcze/pve-exporter/releases/latest/download/pve-exporter-linux-amd64
chmod +x pve-exporter-linux-amd64

# Run manually
./pve-exporter-linux-amd64 -config config.yml
```

| Command | Description |
|---|---|
| `-version` | Print version and exit |
| `-selfupdate` | Update to the latest version from GitHub and restart the service |
Self-update:

```bash
# Update to latest version (run as root)
sudo pve-exporter -selfupdate
```

Note: `-selfupdate` requires root privileges because it:

- Replaces the binary in `/usr/local/bin/`
- Runs `systemctl restart pve-exporter` to apply the update
For production use, install the exporter as a systemd service running under a dedicated user.
```bash
sudo useradd --system --no-create-home --shell /usr/sbin/nologin pve-exporter

sudo wget -O /usr/local/bin/pve-exporter \
  https://github.com/bigtcze/pve-exporter/releases/latest/download/pve-exporter-linux-amd64
sudo chmod +x /usr/local/bin/pve-exporter

sudo mkdir -p /etc/pve-exporter
```
```bash
# Note: "sudo cat > file" would fail for non-root users because the shell,
# not sudo, performs the redirection - use "sudo tee" instead.
sudo tee /etc/pve-exporter/config.yml > /dev/null << 'EOF'
proxmox:
  host: "proxmox.example.com"
  port: 8006
  # Option A: Password authentication
  user: "monitoring@pve"
  password: "your-password"
  # Option B: API Token authentication (recommended; comment out user/password above)
  # token_id: "monitoring@pve!exporter"
  # token_secret: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
  insecure_skip_verify: true
server:
  listen_address: ":9221"
  metrics_path: "/metrics"
EOF
```
```bash
# Secure the config file (contains credentials)
sudo chown root:pve-exporter /etc/pve-exporter/config.yml
sudo chmod 640 /etc/pve-exporter/config.yml
```

Note: The `token_id` format is `user@realm!tokenname`; the exclamation mark is Proxmox syntax, not an error.
```bash
sudo tee /etc/systemd/system/pve-exporter.service > /dev/null << 'EOF'
[Unit]
Description=Proxmox VE Exporter for Prometheus
Documentation=https://github.com/bigtcze/pve-exporter
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=pve-exporter
Group=pve-exporter
ExecStart=/usr/local/bin/pve-exporter -config /etc/pve-exporter/config.yml
Restart=on-failure
RestartSec=5

# Security hardening
ProtectSystem=strict
ProtectHome=yes
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
ReadOnlyPaths=/
ReadWritePaths=

[Install]
WantedBy=multi-user.target
EOF
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable pve-exporter
sudo systemctl start pve-exporter

# Check status
sudo systemctl status pve-exporter

# View logs
sudo journalctl -u pve-exporter -f
```

| Option | Description | Default |
|---|---|---|
| `proxmox.host` | Proxmox host address | `localhost` |
| `proxmox.port` | Proxmox API port | `8006` |
| `proxmox.user` | Proxmox user (for password auth) | `root@pam` |
| `proxmox.password` | Proxmox password | - |
| `proxmox.token_id` | API token ID (alternative to password) | - |
| `proxmox.token_secret` | API token secret | - |
| `proxmox.insecure_skip_verify` | Skip TLS verification | `true` |
| `server.listen_address` | HTTP server listen address | `:9221` |
| `server.metrics_path` | Metrics endpoint path | `/metrics` |
As an alternative to a config file, you can use environment variables:
| Variable | Config equivalent |
|---|---|
| `PVE_HOST` | `proxmox.host` |
| `PVE_USER` | `proxmox.user` |
| `PVE_PASSWORD` | `proxmox.password` |
| `PVE_TOKEN_ID` | `proxmox.token_id` |
| `PVE_TOKEN_SECRET` | `proxmox.token_secret` |
| `PVE_INSECURE_SKIP_VERIFY` | `proxmox.insecure_skip_verify` |
| `LISTEN_ADDRESS` | `server.listen_address` |
| `METRICS_PATH` | `server.metrics_path` |
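For example, a token-based setup via environment variables might look like the following sketch; the host, token, and secret are placeholder values, and the final (commented) line shows where the exporter binary would be started:

```shell
# Token-based configuration via environment variables (placeholder values)
export PVE_HOST='proxmox.example.com'
export PVE_TOKEN_ID='monitoring@pve!exporter'
export PVE_TOKEN_SECRET='xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'
export PVE_INSECURE_SKIP_VERIFY='true'
export LISTEN_ADDRESS=':9221'

# Start the exporter in the same environment:
# /usr/local/bin/pve-exporter
```

Environment variables are convenient for containerized deployments, where a config file would otherwise need to be mounted in.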
Import the official dashboard from Grafana.com (dashboard ID 24550), or import it manually from `grafana/pve-exporter.json`.
Features:
- Cluster overview with node/VM/LXC counts
- Per-node CPU, Memory, Load, and Filesystem metrics
- VM and LXC tables with status, CPU, memory, uptime
- Network and Disk I/O graphs
- Storage usage visualization
- ZFS pool health, fragmentation, and ARC statistics
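For the dashboard to have data, Prometheus needs a scrape job pointing at the exporter. A minimal `prometheus.yml` fragment, assuming the default listen address and a placeholder hostname:

```yaml
scrape_configs:
  - job_name: "pve"
    scrape_interval: 30s   # illustrative; adjust to your needs
    static_configs:
      - targets: ["proxmox.example.com:9221"]
```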
The exporter exposes the following metrics at `/metrics`.
| Metric | Description |
|---|---|
| `pve_node_up` | Node status (1=online) |
| `pve_node_uptime_seconds` | Node uptime in seconds |
| `pve_node_cpu_load` | Node CPU load |
| `pve_node_cpus_total` | Total number of CPUs |
| `pve_node_memory_total_bytes` | Total memory in bytes |
| `pve_node_memory_used_bytes` | Used memory in bytes |
| `pve_node_memory_free_bytes` | Free memory in bytes |
| `pve_node_swap_total_bytes` | Total swap in bytes |
| `pve_node_swap_used_bytes` | Used swap in bytes |
| `pve_node_swap_free_bytes` | Free swap in bytes |
| `pve_node_vm_count` | Number of QEMU VMs |
| `pve_node_lxc_count` | Number of LXC containers |
| `pve_node_load1` | Load average (1 minute) |
| `pve_node_load5` | Load average (5 minutes) |
| `pve_node_load15` | Load average (15 minutes) |
| `pve_node_iowait` | I/O wait ratio |
| `pve_node_idle` | Idle CPU ratio |
| `pve_node_cpu_mhz` | CPU frequency in MHz |
| `pve_node_rootfs_total_bytes` | Root filesystem total size |
| `pve_node_rootfs_used_bytes` | Root filesystem used |
| `pve_node_rootfs_free_bytes` | Root filesystem free |
| `pve_node_cpu_cores` | CPU cores per socket |
| `pve_node_cpu_sockets` | Number of CPU sockets |
| `pve_node_ksm_shared_bytes` | KSM shared memory |
| Metric | Description |
|---|---|
| `pve_vm_status` | VM status (1=running, 0=stopped) |
| `pve_vm_uptime_seconds` | VM uptime in seconds |
| `pve_vm_cpu_usage` | VM CPU usage (0.0-1.0) |
| `pve_vm_cpus` | Number of CPUs allocated |
| `pve_vm_memory_used_bytes` | Used memory in bytes |
| `pve_vm_memory_max_bytes` | Total memory in bytes |
| `pve_vm_memory_free_bytes` | Free memory (guest agent) |
| `pve_vm_memory_host_bytes` | Host memory allocation |
| `pve_vm_balloon_bytes` | Balloon target in bytes |
| `pve_vm_balloon_actual_bytes` | Balloon actual memory |
| `pve_vm_balloon_max_bytes` | Balloon max memory |
| `pve_vm_balloon_total_bytes` | Balloon total guest memory |
| `pve_vm_balloon_major_page_faults_total` | Major page faults |
| `pve_vm_balloon_minor_page_faults_total` | Minor page faults |
| `pve_vm_balloon_mem_swapped_in_bytes` | Memory swapped in |
| `pve_vm_balloon_mem_swapped_out_bytes` | Memory swapped out |
| `pve_vm_disk_max_bytes` | Total disk space in bytes |
| `pve_vm_network_in_bytes_total` | Network input bytes |
| `pve_vm_network_out_bytes_total` | Network output bytes |
| `pve_vm_disk_read_bytes_total` | Disk read bytes |
| `pve_vm_disk_write_bytes_total` | Disk write bytes |
| `pve_vm_ha_managed` | Managed by HA (1=yes) |
| `pve_vm_pid` | Process ID |
| `pve_vm_pressure_cpu_full` | CPU pressure full |
| `pve_vm_pressure_cpu_some` | CPU pressure some |
| `pve_vm_pressure_io_full` | I/O pressure full |
| `pve_vm_pressure_io_some` | I/O pressure some |
| `pve_vm_pressure_memory_full` | Memory pressure full |
| `pve_vm_pressure_memory_some` | Memory pressure some |
| `pve_vm_block_read_bytes_total` | Block device read bytes (label: device) |
| `pve_vm_block_write_bytes_total` | Block device write bytes (label: device) |
| `pve_vm_block_read_ops_total` | Block device read ops (label: device) |
| `pve_vm_block_write_ops_total` | Block device write ops (label: device) |
| `pve_vm_block_failed_read_ops_total` | Block device failed read ops (label: device) |
| `pve_vm_block_failed_write_ops_total` | Block device failed write ops (label: device) |
| `pve_vm_block_flush_ops_total` | Block device flush ops (label: device) |
| `pve_vm_nic_in_bytes_total` | NIC input bytes (label: interface) |
| `pve_vm_nic_out_bytes_total` | NIC output bytes (label: interface) |
| `pve_vm_last_backup_timestamp` | Unix timestamp of last successful backup |
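Metrics ending in `_total` are monotonically increasing counters, so in Prometheus they are usually wrapped in `rate()` when graphed or alerted on. Two illustrative PromQL queries (ranges and thresholds are examples, not recommendations):

```promql
# Per-VM network receive throughput, averaged over 5 minutes (bytes/s)
rate(pve_vm_network_in_bytes_total[5m])

# VMs whose last successful backup is older than 24 hours
time() - pve_vm_last_backup_timestamp > 86400
```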
| Metric | Description |
|---|---|
| `pve_lxc_status` | LXC status (1=running, 0=stopped) |
| `pve_lxc_uptime_seconds` | LXC uptime in seconds |
| `pve_lxc_cpu_usage` | LXC CPU usage (0.0-1.0) |
| `pve_lxc_cpus` | Number of CPUs allocated |
| `pve_lxc_memory_used_bytes` | Used memory in bytes |
| `pve_lxc_memory_max_bytes` | Total memory in bytes |
| `pve_lxc_disk_used_bytes` | Used disk space in bytes |
| `pve_lxc_disk_max_bytes` | Total disk space in bytes |
| `pve_lxc_swap_used_bytes` | Used swap in bytes |
| `pve_lxc_swap_max_bytes` | Maximum swap in bytes |
| `pve_lxc_network_in_bytes_total` | Network input bytes |
| `pve_lxc_network_out_bytes_total` | Network output bytes |
| `pve_lxc_disk_read_bytes_total` | Disk read bytes |
| `pve_lxc_disk_write_bytes_total` | Disk write bytes |
| `pve_lxc_ha_managed` | Managed by HA (1=yes) |
| `pve_lxc_pid` | Process ID |
| `pve_lxc_pressure_cpu_full` | CPU pressure full |
| `pve_lxc_pressure_cpu_some` | CPU pressure some |
| `pve_lxc_pressure_io_full` | I/O pressure full |
| `pve_lxc_pressure_io_some` | I/O pressure some |
| `pve_lxc_pressure_memory_full` | Memory pressure full |
| `pve_lxc_pressure_memory_some` | Memory pressure some |
| `pve_lxc_last_backup_timestamp` | Unix timestamp of last successful backup |
| Metric | Description |
|---|---|
| `pve_storage_total_bytes` | Total storage size in bytes |
| `pve_storage_used_bytes` | Used storage in bytes |
| `pve_storage_available_bytes` | Available storage in bytes |
| `pve_storage_active` | Storage is active (1=yes) |
| `pve_storage_enabled` | Storage is enabled (1=yes) |
| `pve_storage_shared` | Storage is shared (1=yes) |
| `pve_storage_used_fraction` | Used fraction (0.0-1.0) |
| Metric | Description |
|---|---|
| `pve_zfs_pool_health_status` | Pool health (1=ONLINE) |
| `pve_zfs_pool_size_bytes` | Pool total size |
| `pve_zfs_pool_alloc_bytes` | Pool allocated size |
| `pve_zfs_pool_free_bytes` | Pool free size |
| `pve_zfs_pool_frag_percent` | Pool fragmentation % |
| `pve_zfs_arc_size_bytes` | ARC size in bytes |
| `pve_zfs_arc_min_size_bytes` | ARC min size |
| `pve_zfs_arc_max_size_bytes` | ARC max size |
| `pve_zfs_arc_hits_total` | ARC hits |
| `pve_zfs_arc_misses_total` | ARC misses |
| `pve_zfs_arc_hit_ratio_percent` | ARC hit ratio in percent (0-100) |
| `pve_zfs_arc_target_size_bytes` | ARC target size (c) |
| `pve_zfs_arc_l2_hits_total` | L2ARC hits |
| `pve_zfs_arc_l2_misses_total` | L2ARC misses |
| `pve_zfs_arc_l2_size_bytes` | L2ARC size |
| `pve_zfs_arc_l2_header_size_bytes` | L2ARC header size |
| Metric | Description |
|---|---|
| `pve_cluster_quorate` | Cluster has quorum (1=yes, 0=no) |
| `pve_cluster_nodes_total` | Total number of nodes in cluster |
| `pve_cluster_nodes_online` | Number of online nodes |
| `pve_ha_resources_total` | Total HA-managed resources |
| `pve_ha_resources_active` | Number of active HA resources |
| Metric | Description |
|---|---|
| `pve_replication_last_sync_timestamp` | Unix timestamp of last replication (labels: guest, job) |
| `pve_replication_duration_seconds` | Duration of last replication (labels: guest, job) |
| `pve_replication_status` | Replication status (1=OK, 0=error; labels: guest, job) |
| Metric | Description |
|---|---|
| `pve_certificate_expiry_seconds` | Seconds until SSL certificate expires |
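The status-style metrics above map naturally onto Prometheus alerting rules. An illustrative rule file sketch (alert names, thresholds, and `for` durations are examples to adapt, not part of the exporter):

```yaml
groups:
  - name: pve-exporter
    rules:
      - alert: PVENodeDown
        expr: pve_node_up == 0
        for: 5m
      - alert: PVEStorageAlmostFull
        expr: pve_storage_used_fraction > 0.9
        for: 15m
      - alert: PVEReplicationFailing
        expr: pve_replication_status == 0
        for: 10m
      - alert: PVECertificateExpiringSoon
        expr: pve_certificate_expiry_seconds < 14 * 86400
```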
Note: These metrics are collected via lm-sensors from the local host where pve-exporter runs. Labels: `node`, `chip`, `adapter`, `sensor`.

| Metric | Description |
|---|---|
| `pve_sensor_temperature_celsius` | Temperature reading in Celsius |
| `pve_sensor_fan_rpm` | Fan speed in RPM |
| `pve_sensor_voltage_volts` | Voltage reading in Volts |
| `pve_sensor_power_watts` | Power consumption in Watts |
These metrics are collected automatically from `/proc/diskstats` without requiring root. Labels: `node`, `device`.

| Metric | Description |
|---|---|
| `pve_disk_read_bytes_total` | Total bytes read from disk |
| `pve_disk_write_bytes_total` | Total bytes written to disk |
| `pve_disk_reads_completed_total` | Total read operations |
| `pve_disk_writes_completed_total` | Total write operations |
| `pve_disk_io_time_seconds_total` | Time spent doing I/O |
SMART metrics require additional setup: a separate script runs via cron as root. Labels: `node`, `device`, `model`, `serial`, `type`.

| Metric | Description |
|---|---|
| `pve_disk_temperature_celsius` | Disk temperature |
| `pve_disk_power_on_hours` | Power-on hours |
| `pve_disk_health_status` | Health (1=healthy, 0=failing) |
| `pve_disk_data_written_bytes` | NVMe TBW |
| `pve_disk_available_spare_percent` | NVMe available spare % |
| `pve_disk_percentage_used` | NVMe life used % |
SMART Setup:

```bash
# 1. Install collector script
sudo wget -O /usr/local/bin/pve-smart-collector.sh \
  https://raw.githubusercontent.com/bigtcze/pve-exporter/main/scripts/pve-smart-collector.sh
sudo chmod +x /usr/local/bin/pve-smart-collector.sh
sudo mkdir -p /var/lib/pve-exporter

# 2. Add cron job (every 5 minutes - SMART data doesn't change frequently)
echo '*/5 * * * * root /usr/local/bin/pve-smart-collector.sh' | sudo tee /etc/cron.d/pve-smart-collector

# 3. Verify
sudo /usr/local/bin/pve-smart-collector.sh
curl -s http://localhost:9221/metrics | grep pve_disk
```

Note: If the SMART data file is missing or stale, those metrics are silently skipped.
For security best practices, create a dedicated monitoring user with read-only permissions.

1. Create User: `monitoring@pve`
2. Assign Role: `PVEAuditor` (provides read-only access to Nodes, VMs, Storage)
3. Create API Token: `monitoring@pve!exporter` (uncheck "Privilege Separation")
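The same steps can be done from the Proxmox shell with `pveum` instead of the web UI; a sketch (run as root on a PVE node; exact flag names can differ between PVE versions, so check `pveum help` first):

```
# Create the read-only monitoring user
pveum user add monitoring@pve --comment "read-only monitoring"

# Grant PVEAuditor on the whole tree
pveum acl modify / --users monitoring@pve --roles PVEAuditor

# Create the API token without privilege separation, so it
# inherits the user's PVEAuditor permissions
pveum user token add monitoring@pve exporter --privsep 0
```

The token secret is printed once when the token is created; store it in the exporter config at that point, since it cannot be retrieved later.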
```bash
# Clone
git clone https://github.com/bigtcze/pve-exporter.git
cd pve-exporter

# Build
make build

# Test
make test
```

Contributions are welcome! Please submit a Pull Request.
MIT License - see LICENSE for details.