---
name: monitor-proxmox-resources
version: 1.0.0
description: Unified instant and 3-sample-average CPU/RAM/storage across all VMs and CTs. Handles the VM guest-agent/ballooning distinction.
tags: [proxmox, monitoring, resources, api, shell]
---

# Monitor Proxmox Resources

Assumes `my_auth`, `my_base_url`, `my_curl` are set (see `auth-proxmox-api` skill).

## Relevant Endpoints

```
GET /api2/json/cluster/resources?type=vm                                    # all VMs and CTs, instant metrics
GET /api2/json/nodes/{node}/qemu/{vmid}/rrddata?timeframe=hour&cf=AVERAGE   # VM RRD
GET /api2/json/nodes/{node}/lxc/{vmid}/rrddata?timeframe=hour&cf=AVERAGE    # CT RRD
```

## Instant Snapshot (All VMs + CTs)

`/cluster/resources` fields relevant to resource usage:

| Field | Type | Notes |
|-------|------|-------|
| `vmid` | int | |
| `type` | `qemu`\|`lxc` | |
| `node` | string | |
| `name` | string | |
| `status` | `running`\|`stopped` | |
| `cpu` | float 0–1 | fraction of allocated vCPUs currently used |
| `mem` | bytes | actual RAM used (**see VM caveat below**) |
| `maxmem` | bytes | allocated RAM |
| `disk` | bytes | rootfs used (CT) or primary disk used (VM, if guest agent) |
| `maxdisk` | bytes | allocated disk |

### VM RAM Caveat: Guest Agent / Ballooning

- **No guest agent installed**: `mem` is `0` or equals `maxmem`. The hypervisor cannot see inside the VM, so the value is meaningless. Do not display it as "used".
- **Guest agent running with balloon driver**: `mem` reflects actual in-guest usage via the virtio-balloon interface. This is reliable.
- **Detection heuristic**: if `mem == 0` or `mem == maxmem` for a running VM, assume no guest agent. Flag the row rather than display a misleading percentage.
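The detection heuristic above can be sketched as a standalone helper that classifies one row from its `type`, `mem`, and `maxmem` values. `fn_mem_reliability` is a hypothetical name used only for illustration; it makes no API call and simply applies the `mem == 0` / `mem == maxmem` rule to values already fetched.

```sh
# Hypothetical helper (not part of the Proxmox API): apply the
# mem == 0 / mem == maxmem heuristic to one /cluster/resources row.
fn_mem_reliability() { (
    my_type="${1}"      # qemu or lxc
    my_mem="${2}"       # bytes, integer
    my_maxmem="${3}"    # bytes, integer
    if test "${my_type}" != "qemu"; then
        echo "n/a"          # the heuristic only applies to VMs
    elif test "${my_mem}" -eq 0 || test "${my_mem}" -eq "${my_maxmem}"; then
        echo "no-agent"     # mem value should not be shown as "used"
    else
        echo "agent"
    fi
); }
```

For example, `fn_mem_reliability qemu 8589934592 8589934592` prints `no-agent`, because the VM reports `mem` equal to `maxmem`.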
The agent endpoint offers a direct check for a single VM:

```sh
fn_vm_has_agent() { (
    my_node="${1}"; my_vmid="${2}"
    # Returns 0 (success) if the agent responded, non-zero if not.
    # --fail makes curl exit non-zero on an HTTP error response; without it
    # curl reports success for any completed request.
    ${my_curl} --fail -H "${my_auth}" \
        "${my_base_url}/nodes/${my_node}/qemu/${my_vmid}/agent/info" \
        > /dev/null 2>&1
); }
```

## Instant Resource Report (TSV)

```sh
fn_resource_snapshot() { (
    ${my_curl} -H "${my_auth}" "${my_base_url}/cluster/resources?type=vm" |
        jq -r '
            .data[]
            | select(.status == "running")
            # Bytes to GiB, rounded to two decimals
            | (.mem     / 1073741824 * 100 | round / 100) as $mem_gb
            | (.maxmem  / 1073741824 * 100 | round / 100) as $maxmem_gb
            | (.disk    / 1073741824 * 100 | round / 100) as $disk_gb
            | (.maxdisk / 1073741824 * 100 | round / 100) as $maxdisk_gb
            | (if .maxmem > 0 then (.mem / .maxmem * 100 | round) else 0 end) as $mem_pct
            # Heuristic from the VM RAM caveat: mem of 0 or maxmem means no agent
            | (
                if .type == "qemu" and (.mem == 0 or .mem == .maxmem) then "no-agent"
                elif .type == "qemu" then "agent"
                else "n/a"
                end
              ) as $agent
            | [
                .vmid, .type, .node, .name, .status,
                (.cpu * 100 | round / 100 | tostring),
                ($mem_gb | tostring),
                ($maxmem_gb | tostring),
                ($mem_pct | tostring),
                ($disk_gb | tostring),
                ($maxdisk_gb | tostring),
                $agent
              ]
            | @tsv
        '
); }

# With header
fn_resource_report() { (
    printf 'vmid\ttype\tnode\tname\tstatus\tcpu\tmem_gb\tmem_max_gb\tmem_pct\tdisk_gb\tdisk_max_gb\tagent\n'
    fn_resource_snapshot
); }
```

Example output (columns are aligned here for readability; the real output is tab-separated). VMs without a guest agent show `no-agent`, and their memory figures are unreliable:

```
vmid  type  node  name  status   cpu   mem_gb  mem_max_gb  mem_pct  disk_gb  disk_max_gb  agent
100   qemu  pve1  web1  running  0.12  2.1     4           52       15.3     50           agent
101   lxc   pve1  db1   running  0.05  0.8     2           40       5.2      20           n/a
102   qemu  pve1  win1  running  0.43  8       8           100      0        100          no-agent
```

## 3-Sample Average (RRD)

RRD data at `timeframe=hour` has ~1-minute resolution. Take the last 3 non-null samples and average them for a smoothed reading.
```sh
fn_rrd_3avg() { (
    my_node="${1}"
    my_type="${2}"   # qemu or lxc
    my_vmid="${3}"
    ${my_curl} -H "${my_auth}" \
        "${my_base_url}/nodes/${my_node}/${my_type}/${my_vmid}/rrddata?timeframe=hour&cf=AVERAGE" |
        jq '
            [ .data[] | select(.cpu != null and .mem != null) ]
            | .[-3:]
            # Guard against dividing by zero when no usable samples exist
            | if length == 0 then null else
                {
                    cpu:    ([ .[].cpu ]    | add / length),
                    mem:    ([ .[].mem ]    | add / length),
                    maxmem: ([ .[].maxmem ] | add / length)
                }
              end
        '
); }
```

Returns a JSON object with `cpu` (fraction), `mem` (bytes), `maxmem` (bytes), each averaged over the last 3 available data points (~3 minutes), or `null` if no non-null samples exist.

## Cluster-Wide Summary (Optionally by Pool)

Pass a pool name to restrict the summary to that pool; with no argument, all running guests are included. `cpu_used` is expressed in vCPUs (each guest's `cpu` fraction multiplied by its `maxcpu`), so it is directly comparable to `cpu_total`.

```sh
fn_pool_resource_summary() { (
    my_pool="${1:-}"    # optional pool name; empty means all running guests
    ${my_curl} -H "${my_auth}" "${my_base_url}/cluster/resources?type=vm" |
        jq --arg pool "${my_pool}" '
            # Bytes to GiB, rounded to two decimals
            def gib: . / 1073741824 * 100 | round / 100;
            [ .data[]
              | select(.status == "running")
              | select($pool == "" or .pool == $pool) ]
            | {
                count:        length,
                cpu_total:    ([ .[].maxcpu ] | add // 0),
                cpu_used:     ([ .[] | .cpu * .maxcpu ] | add // 0 | . * 100 | round / 100),
                mem_max_gb:   ([ .[].maxmem ]  | add // 0 | gib),
                mem_used_gb:  ([ .[].mem ]     | add // 0 | gib),
                disk_max_gb:  ([ .[].maxdisk ] | add // 0 | gib),
                disk_used_gb: ([ .[].disk ]    | add // 0 | gib)
            }
        '
); }
```

## Notes

- `cpu` from cluster resources is a point-in-time fraction (0–1). Multiply by 100 for a percentage. It is already relative to allocated cores, not total host cores.
- For VMs flagged `no-agent`, CPU *is* tracked by the hypervisor and is reliable. Only RAM and disk-used are unreliable without the guest agent.
- QEMU guest agent package: `qemu-guest-agent` (same name on Debian/Ubuntu and Alpine). It must also be enabled in the VM config: `agent=1`.
- RRD data may have null entries at the tail if the VM was recently started.
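As a small worked example of the "multiply by 100" note, the sketch below converts a `cpu` fraction and `mem`/`maxmem` byte counts into display values. `fn_usage_pct` is a hypothetical helper, not something the skill or the API provides, and it assumes `awk` is available; its three arguments correspond to the `cpu`, `mem`, and `maxmem` keys returned by `fn_rrd_3avg`.

```sh
# Hypothetical helper: print "cpu_pct<TAB>mem_pct<TAB>mem_gib",
# each rounded to one decimal place.
fn_usage_pct() { (
    my_cpu="${1}"       # fraction 0-1
    my_mem="${2}"       # bytes
    my_maxmem="${3}"    # bytes
    awk -v cpu="${my_cpu}" -v mem="${my_mem}" -v maxmem="${my_maxmem}" 'BEGIN {
        # Avoid dividing by zero when maxmem is missing or 0
        mem_pct = (maxmem > 0) ? mem / maxmem * 100 : 0
        printf "%.1f\t%.1f\t%.1f\n", cpu * 100, mem_pct, mem / 1073741824
    }'
); }
```

Calling `fn_usage_pct 0.125 2147483648 4294967296` prints `12.5`, `50.0`, and `2.0` separated by tabs: 12.5% CPU, 50% of allocated RAM, 2 GiB in use.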