feat: Add metrics tracking by alexec · Pull Request #96 · kitproj/kit

alexec · 2025-09-14T20:40:47Z

Summary by CodeRabbit

New Features
- Introduced a two-pane, resizable interface: DAG viewer (left) and live logs (right).
- Real-time updates to node status, grouping, and re-parenting via events.
- Node selection updates header (name, metrics, message) and recenters the graph.
- Per-node status icons; dark-themed, ANSI-colored logs with links and line numbers.
- Displays per-task CPU and memory metrics with periodic updates while running.
- Auto-scroll toggle and follow control, click-to-scroll, and hover cues.
- Responsive layout with draggable resizer and window resize handling.

- Introduce `enableMetrics` boolean to workflow schema and docs
- Track CPU and memory usage for tasks (host, container, k8s)
- Show metrics in web UI and expose via API
- Add types.Metrics struct and plumbing throughout codebase

Copilot

Pull Request Overview

This PR adds metrics collection functionality to track resource usage (CPU and memory) for tasks during workflow execution. The feature is controlled by a new enableMetrics option that defaults to enabled.

- Introduces a new Metrics type to capture CPU and memory usage
- Implements resource metrics collection for different process types (host, container, Kubernetes)
- Adds metrics display in the web UI and HTTP API endpoint for accessing metrics
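
To make the overview concrete, here is a minimal sketch of how a per-process `GetMetrics` method and a periodic collection loop could fit together. The names `MetricsSource`, `pollMetrics`, `fixedSource`, and `firstSample` are hypothetical and are not taken from the PR; only the idea of a `GetMetrics` method and a polling goroutine comes from the summary above.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Metrics mirrors the shape described in the PR: CPU in millicores, memory in bytes.
type Metrics struct {
	CPU uint64 // millicores
	Mem uint64 // bytes
}

// MetricsSource is a hypothetical stand-in for the process interface that gains GetMetrics.
type MetricsSource interface {
	GetMetrics(ctx context.Context) (*Metrics, error)
}

// pollMetrics samples src every interval until ctx is cancelled and passes each
// successful sample to onSample. Failed samples are skipped rather than treated as fatal.
func pollMetrics(ctx context.Context, src MetricsSource, interval time.Duration, onSample func(Metrics)) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			m, err := src.GetMetrics(ctx)
			if err != nil || m == nil {
				continue
			}
			onSample(*m)
		}
	}
}

// fixedSource always reports the same usage; it exists only for the demo.
type fixedSource struct{ m Metrics }

func (f fixedSource) GetMetrics(context.Context) (*Metrics, error) { return &f.m, nil }

// firstSample starts a poller, waits for one sample, then stops the poller.
func firstSample(src MetricsSource) Metrics {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	samples := make(chan Metrics, 1)
	go pollMetrics(ctx, src, 10*time.Millisecond, func(m Metrics) {
		select {
		case samples <- m:
		default: // a sample is already waiting; drop this one
		}
	})
	return <-samples
}

func main() {
	m := firstSample(fixedSource{Metrics{CPU: 250, Mem: 64 << 20}})
	fmt.Printf("cpu=%dm mem=%dMi\n", m.CPU, m.Mem>>20)
}
```

Tying the loop to a context means the poller stops as soon as the task's context is cancelled, without a separate stop channel.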

Reviewed Changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
schema/workflow.schema.json	Adds enableMetrics boolean field to workflow schema
internal/types/spec.go	Adds EnableMetrics field and getter method with default true value
internal/types/metrics.go	Defines new Metrics struct for CPU and memory tracking
internal/task_node.go	Adds Metrics field to TaskNode structure
internal/server.go	Implements HTTP endpoint to retrieve task metrics
internal/run.go	Adds periodic metrics collection goroutine for running tasks
internal/proc/proc.go	Adds GetMetrics method to process interface
internal/proc/noop.go	Implements GetMetrics returning empty metrics
internal/proc/kubernetes.go	Implements kubectl-based metrics collection for Kubernetes pods
internal/proc/host.go	Implements ps-based metrics collection for host processes
internal/proc/container.go	Implements Docker stats-based metrics collection for containers
internal/index.html	Updates UI to display metrics and improves code formatting
docs/reference/workflow.md	Documents new enableMetrics configuration option
docs/reference/workflow-defs-workflow.md	Adds enableMetrics to workflow definition documentation
docs/reference/workflow-defs-workflow-properties-enablemetrics.md	Adds detailed documentation for enableMetrics property
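
The "getter method with default true value" listed for internal/types/spec.go is commonly written with a pointer field, so that an unset value can be told apart from an explicit `false`. The sketch below assumes that approach; the field type and JSON tag are guesses, and a later commit in this PR removes the option altogether.

```go
package main

import "fmt"

// Spec is a trimmed, hypothetical stand-in for the workflow spec type.
type Spec struct {
	// EnableMetrics is a pointer so that "unset" differs from "false" (assumed design).
	EnableMetrics *bool `json:"enableMetrics,omitempty"`
}

// GetEnableMetrics reports whether metrics are enabled, defaulting to true when unset.
func (s Spec) GetEnableMetrics() bool {
	if s.EnableMetrics == nil {
		return true
	}
	return *s.EnableMetrics
}

func main() {
	off := false
	fmt.Println(Spec{}.GetEnableMetrics(), Spec{EnableMetrics: &off}.GetEnableMetrics())
}
```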


Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Alex Collins <alexec@users.noreply.github.com>

coderabbitai

Actionable comments posted: 1

♻️ Duplicate comments (6)

internal/types/metrics.go (1)

4-5: Clarify CPU unit semantics (millicores, aggregated across cores).

Document that CPU is in millicores aggregated across all cores (1000m = 1 core; values may exceed 1000 on multi‑core) to keep backends/UI consistent.

Apply this diff:
-	CPU uint64 `json:"cpu"` // CPU usage in millicores
+	CPU uint64 `json:"cpu"` // CPU usage in millicores aggregated across cores (1000m = 1 core; can exceed 1000m)
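
As an illustration of the unit convention in this comment, the sketch below defines a struct with the documented CPU field and a helper that converts millicores to fractional cores. The `Mem` field's JSON tag and the `Cores` helper are assumptions made for the example, not part of the PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Metrics follows the convention from the review: CPU is in millicores summed
// across cores, so 1000 means one fully used core and 2500 means two and a half.
type Metrics struct {
	CPU uint64 `json:"cpu"` // millicores aggregated across cores
	Mem uint64 `json:"mem"` // bytes (tag name assumed)
}

// Cores converts millicores to fractional cores (hypothetical helper).
func (m Metrics) Cores() float64 { return float64(m.CPU) / 1000 }

func main() {
	m := Metrics{CPU: 2500, Mem: 128 << 20}
	b, err := json.Marshal(m)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b), m.Cores())
}
```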

internal/proc/kubernetes.go (4)

43-47: Track pods by namespace/name and guard with a mutex.

Using only names risks collisions across namespaces and introduces races. Store ns+name and protect with RWMutex.

 type k8s struct {
-	log      *log.Logger
-	spec     types.Spec
-	name     string
-	podNames []string
+	log  *log.Logger
+	spec types.Spec
+	name string
+	mu   sync.RWMutex
+	pods []struct{ ns, name string }
 	types.Task
 }

276-279: Deduplicate under lock using namespace/name; avoid races.

-		if !slices.Contains(k.podNames, pod.Name) {
-			k.podNames = append(k.podNames, pod.Name)
-		}
+		k.mu.Lock()
+		seen := false
+		for _, p := range k.pods {
+			if p.ns == pod.Namespace && p.name == pod.Name {
+				seen = true
+				break
+			}
+		}
+		if !seen {
+			k.pods = append(k.pods, struct{ ns, name string }{ns: pod.Namespace, name: pod.Name})
+		}
+		k.mu.Unlock()
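
The deduplication suggested here can also be isolated in a small helper type, which keeps the lock handling in one place. This is a sketch under assumptions: `podKey`, `podTracker`, `add`, and `snapshot` are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// podKey identifies a pod by namespace and name (hypothetical type).
type podKey struct{ ns, name string }

// podTracker records each pod at most once and is safe for concurrent use.
type podTracker struct {
	mu   sync.RWMutex
	pods []podKey
}

// add records the pod and returns true, or returns false if it is already tracked.
func (t *podTracker) add(ns, name string) bool {
	key := podKey{ns, name}
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, p := range t.pods {
		if p == key {
			return false
		}
	}
	t.pods = append(t.pods, key)
	return true
}

// snapshot returns a copy of the tracked pods so callers can iterate without holding the lock.
func (t *podTracker) snapshot() []podKey {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return append([]podKey(nil), t.pods...)
}

func main() {
	var tr podTracker
	tr.add("build", "web-0")
	tr.add("build", "web-0") // duplicate, ignored
	tr.add("test", "web-0")  // same name, different namespace
	fmt.Println(len(tr.snapshot()))
}
```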

435-446: Aggregate with snapshot under RLock; tolerate per‑pod errors.

One failing pod should not zero the whole read; also avoid iterating while mutating.

 func (k *k8s) GetMetrics(ctx context.Context) (*types.Metrics, error) {
-	sum := &types.Metrics{}
-	for _, podName := range k.podNames {
-		metrics, err := k.getMetrics(ctx, podName)
-		if err != nil {
-			return nil, err
-		}
-		sum.CPU += metrics.CPU
-		sum.Mem += metrics.Mem
-	}
-	return sum, nil
+	sum := &types.Metrics{}
+	k.mu.RLock()
+	pods := append([]struct{ ns, name string }{}, k.pods...) // snapshot
+	k.mu.RUnlock()
+	for _, p := range pods {
+		metrics, err := k.getMetrics(ctx, p.ns, p.name)
+		if err != nil {
+			k.log.Printf("kubectl top failed for %s/%s: %v", p.ns, p.name, err)
+			continue
+		}
+		sum.CPU += metrics.CPU
+		sum.Mem += metrics.Mem
+	}
+	return sum, nil
 }
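
The aggregation behaviour proposed above (skip a failing pod, keep summing the rest) can be checked in isolation by injecting the per-pod lookup as a function. In this sketch, `fetchFunc`, `sumMetrics`, `fakeFetch`, and `demo` are hypothetical names, and the fake lookup stands in for the real kubectl call.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Metrics struct{ CPU, Mem uint64 }

type podKey struct{ ns, name string }

// fetchFunc looks up metrics for one pod (hypothetical signature).
type fetchFunc func(ctx context.Context, ns, name string) (*Metrics, error)

// sumMetrics adds up metrics over a snapshot of pods. A pod whose lookup fails
// is skipped and its error collected, so one bad pod does not zero the total.
func sumMetrics(ctx context.Context, pods []podKey, fetch fetchFunc) (Metrics, []error) {
	var sum Metrics
	var errs []error
	for _, p := range pods {
		m, err := fetch(ctx, p.ns, p.name)
		if err != nil {
			errs = append(errs, fmt.Errorf("%s/%s: %w", p.ns, p.name, err))
			continue
		}
		sum.CPU += m.CPU
		sum.Mem += m.Mem
	}
	return sum, errs
}

// fakeFetch fails for the pod named "broken" and returns fixed values otherwise.
func fakeFetch(_ context.Context, _, name string) (*Metrics, error) {
	if name == "broken" {
		return nil, errors.New("metrics not available")
	}
	return &Metrics{CPU: 100, Mem: 1000}, nil
}

// demo sums three pods, one of which fails.
func demo() (Metrics, []error) {
	pods := []podKey{{"a", "web"}, {"a", "broken"}, {"b", "web"}}
	return sumMetrics(context.Background(), pods, fakeFetch)
}

func main() {
	sum, errs := demo()
	fmt.Println(sum.CPU, sum.Mem, len(errs))
}
```

Returning the collected errors, instead of only logging them, leaves it to the caller to decide whether a partial total is acceptable.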

448-456: Pass namespace to kubectl top.

Without -n, results are wrong outside the default namespace.

-func (k *k8s) getMetrics(ctx context.Context, podName string) (*types.Metrics, error) {
-	cmd := exec.CommandContext(ctx, "kubectl", "top", "pod", podName, "--no-headers")
+func (k *k8s) getMetrics(ctx context.Context, ns, podName string) (*types.Metrics, error) {
+	cmd := exec.CommandContext(ctx, "kubectl", "top", "pod", podName, "-n", ns, "--no-headers")
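
Building the argument list in a separate function makes the namespace handling easy to test without a cluster. The sketch below assumes that an empty namespace should fall back to kubectl's current context; `topPodArgs` and `topPodCmd` are hypothetical helpers.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
)

// topPodArgs builds the arguments for `kubectl top pod`. The -n flag is added
// only when a namespace is known (assumed behaviour for this sketch).
func topPodArgs(ns, podName string) []string {
	args := []string{"top", "pod", podName, "--no-headers"}
	if ns != "" {
		args = append(args, "-n", ns)
	}
	return args
}

// topPodCmd prepares the command; the caller decides when to run it.
func topPodCmd(ctx context.Context, ns, podName string) *exec.Cmd {
	return exec.CommandContext(ctx, "kubectl", topPodArgs(ns, podName)...)
}

func main() {
	cmd := topPodCmd(context.Background(), "build", "web-0")
	fmt.Println(cmd.Args)
}
```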

internal/proc/container.go (1)

335-339: Prefer inactive_file on cgroup v2 for more accurate RSS‑like memory.

Fallback to cache only if inactive_file is missing/zero.

-	memoryBytes := dockerStats.MemoryStats.Usage
-	if dockerStats.MemoryStats.Stats["cache"] != 0 {
-		memoryBytes -= dockerStats.MemoryStats.Stats["cache"]
-	}
+	memoryBytes := dockerStats.MemoryStats.Usage
+	if v := dockerStats.MemoryStats.Stats["inactive_file"]; v != 0 {
+		memoryBytes -= v
+	} else if v := dockerStats.MemoryStats.Stats["cache"]; v != 0 {
+		memoryBytes -= v
+	}
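
The fallback order in this suggestion can be expressed as a pure function over the stats map, which also allows a guard against unsigned underflow when the reported cache figure exceeds usage. The function name and the choice to return usage unchanged in that case are assumptions for this sketch.

```go
package main

import "fmt"

// memoryWithoutCache subtracts reclaimable page cache from total usage.
// It prefers "inactive_file" and falls back to "cache" when that key is
// missing or zero, as the review suggests. If the reclaimable figure is
// larger than usage, usage is returned unchanged to avoid wrapping around.
func memoryWithoutCache(usage uint64, stats map[string]uint64) uint64 {
	reclaimable := stats["inactive_file"]
	if reclaimable == 0 {
		reclaimable = stats["cache"]
	}
	if reclaimable > usage {
		return usage
	}
	return usage - reclaimable
}

func main() {
	fmt.Println(memoryWithoutCache(1000, map[string]uint64{"inactive_file": 200}))
	fmt.Println(memoryWithoutCache(1000, map[string]uint64{"cache": 300}))
}
```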

🧹 Nitpick comments (2)

internal/proc/kubernetes.go (1)

458-465: Handle multi-line output defensively.

kubectl may return multiple lines if the selector broadens; trim to the first non-empty line before parsing.
-	fields := strings.Fields(strings.TrimSpace(output))
+	lines := strings.Split(strings.TrimSpace(output), "\n")
+	if len(lines) == 0 {
+		return nil, fmt.Errorf("unexpected kubectl top output format")
+	}
+	fields := strings.Fields(lines[0])
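
One detail worth noting about the diff above: `strings.Split` with a non-empty separator never returns an empty slice (an empty input yields a single empty string), so checking the field count is the more reliable guard. The sketch below picks the first non-empty line and parses it, assuming the output columns are name, CPU such as `12m`, and memory such as `34Mi`. The function names and the handled unit suffixes are assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTopOutput parses the first non-empty line of `kubectl top pod --no-headers`
// output, assumed to look like "web-0 12m 34Mi".
func parseTopOutput(output string) (cpuMillicores, memBytes uint64, err error) {
	var fields []string
	for _, line := range strings.Split(output, "\n") {
		if f := strings.Fields(line); len(f) > 0 {
			fields = f
			break
		}
	}
	if len(fields) < 3 {
		return 0, 0, fmt.Errorf("unexpected kubectl top output %q", output)
	}
	if cpuMillicores, err = parseCPU(fields[1]); err != nil {
		return 0, 0, err
	}
	if memBytes, err = parseMem(fields[2]); err != nil {
		return 0, 0, err
	}
	return cpuMillicores, memBytes, nil
}

// parseCPU accepts millicores ("12m") or whole/fractional cores ("2", "0.5").
func parseCPU(s string) (uint64, error) {
	if strings.HasSuffix(s, "m") {
		return strconv.ParseUint(strings.TrimSuffix(s, "m"), 10, 64)
	}
	cores, err := strconv.ParseFloat(s, 64)
	if err != nil || cores < 0 {
		return 0, fmt.Errorf("invalid CPU value %q", s)
	}
	return uint64(cores * 1000), nil
}

// parseMem accepts Ki, Mi, and Gi suffixes or a plain byte count.
func parseMem(s string) (uint64, error) {
	units := []struct {
		suffix string
		factor uint64
	}{{"Ki", 1 << 10}, {"Mi", 1 << 20}, {"Gi", 1 << 30}}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseUint(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.factor, nil
		}
	}
	return strconv.ParseUint(s, 10, 64)
}

func main() {
	cpu, mem, err := parseTopOutput("\nweb-0 12m 34Mi\nweb-1 5m 10Mi\n")
	fmt.Println(cpu, mem, err)
}
```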

internal/proc/container.go (1)

313-318: Consider reusing a Docker client for metrics polling.

Creating/closing a client per poll adds overhead. Cache a client on the struct or inject one; handle Close() on shutdown.

Would you like a follow‑up patch to add a lazy‑init client with a small mutex and lifecycle tied to Run/stop?
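
A lazy-init wrapper of the kind proposed here might look like the following sketch. It is generic over any `io.Closer`, so it does not depend on the Docker SDK; with the real SDK, `newFunc` would presumably wrap the client constructor. The type `lazyCloser` and the test double `fakeClient` are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// lazyCloser creates a client on first use and reuses it until close is called.
type lazyCloser[T io.Closer] struct {
	mu      sync.Mutex
	newFunc func() (T, error)
	client  T
	ready   bool
}

// get returns the cached client, creating it on the first call.
func (l *lazyCloser[T]) get() (T, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.ready {
		c, err := l.newFunc()
		if err != nil {
			var zero T
			return zero, err
		}
		l.client, l.ready = c, true
	}
	return l.client, nil
}

// close releases the client if one was created; a later get creates a new one.
func (l *lazyCloser[T]) close() error {
	l.mu.Lock()
	defer l.mu.Unlock()
	if !l.ready {
		return nil
	}
	l.ready = false
	return l.client.Close()
}

// fakeClient counts Close calls; it stands in for a real client in the demo.
type fakeClient struct{ closed int }

func (f *fakeClient) Close() error { f.closed++; return nil }

func main() {
	created := 0
	lc := &lazyCloser[*fakeClient]{newFunc: func() (*fakeClient, error) {
		created++
		return &fakeClient{}, nil
	}}
	c1, _ := lc.get()
	c2, _ := lc.get()
	fmt.Println(c1 == c2, created, lc.close())
}
```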

📥 Commits

Reviewing files that changed from the base of the PR and between f42f28e and 01f282c.

📒 Files selected for processing (4)

internal/proc/container.go (5 hunks)
internal/proc/host.go (4 hunks)
internal/proc/kubernetes.go (4 hunks)
internal/types/metrics.go (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (3)

internal/proc/host.go (1): internal/types/metrics.go (1), Metrics (3-6)

internal/proc/container.go (1): internal/types/metrics.go (1), Metrics (3-6)

internal/proc/kubernetes.go (1): internal/types/metrics.go (1), Metrics (3-6)

🔇 Additional comments (2)

internal/proc/container.go (2)

178-179: Persisting containerID is correct.

341-350: CPU millicores conversion is off by 100× (inflated values).

cpuPercent here is in percent of a single core (100% = 1 core). To get millicores, multiply by 10, not 1000.
-			cpuPercent := (float64(cpuDelta) / float64(systemDelta)) * float64(len(dockerStats.CPUStats.CPUUsage.PercpuUsage))
-			cpuMillicores = uint64(cpuPercent * 1000) // Convert to millicores
+			cpuPercent := (float64(cpuDelta) / float64(systemDelta)) * float64(len(dockerStats.CPUStats.CPUUsage.PercpuUsage)) * 100.0
+			cpuMillicores = uint64(cpuPercent * 10) // 100% = 1000m
Likely an incorrect or invalid review comment.
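
A quick check of the arithmetic shows why this comment was flagged as likely invalid: the original expression yields a fraction of cores, so multiplying by 1000 gives millicores, and the proposed replacement (multiply by 100, then by 10) is algebraically the same up to floating-point rounding. The sketch takes the CPU count as a parameter, which is an assumption of this example; the function names are hypothetical.

```go
package main

import "fmt"

// millicoresDirect follows the original code: (cpuDelta/systemDelta) * numCPUs
// is the number of cores in use, and cores * 1000 is millicores.
func millicoresDirect(cpuDelta, systemDelta uint64, numCPUs int) uint64 {
	if systemDelta == 0 {
		return 0
	}
	cores := float64(cpuDelta) / float64(systemDelta) * float64(numCPUs)
	return uint64(cores * 1000)
}

// millicoresViaPercent follows the suggested diff: convert to percent of one
// core first (100% = 1 core), then multiply by 10.
func millicoresViaPercent(cpuDelta, systemDelta uint64, numCPUs int) uint64 {
	if systemDelta == 0 {
		return 0
	}
	percent := float64(cpuDelta) / float64(systemDelta) * float64(numCPUs) * 100.0
	return uint64(percent * 10)
}

func main() {
	// A container using a quarter of total CPU time on a 2-CPU host is using half a core.
	fmt.Println(millicoresDirect(1, 4, 2), millicoresViaPercent(1, 4, 2))
}
```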

Copilot

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Copilot · 2025-09-14T21:05:23Z

internal/proc/host.go

+		return nil, fmt.Errorf("failed to parse CPU usage '%s': %w", fields[0], err)
+	}
+
+	cpuMillicores := cpuPercentage * 10 // Convert percentage to millicores (1% = 10 millicores)


The CPU percentage to millicores conversion is incorrect. 1% CPU usage should equal 10 millicores, but this assumes a single-core system. On multi-core systems, CPU percentage can exceed 100%. The conversion should account for the number of CPU cores or use a different approach to get absolute CPU usage.
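
For reference, `ps` reports `%cpu` relative to a single core, so a multi-threaded process can show more than 100%, and multiplying by 10 then gives a millicore figure above 1000. The sketch below parses two whitespace-separated columns, `%cpu` and `rss`, and converts them. The column order beyond the first field, the KiB unit for `rss`, and the name `parsePS` are assumptions, not taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePS parses one line of header-less ps output with two columns,
// %cpu and rss (assumed layout), for example " 12.5  20480".
// CPU is converted to millicores (1% of one core = 10m) and RSS from KiB to bytes.
func parsePS(output string) (cpuMillicores, memBytes uint64, err error) {
	fields := strings.Fields(output)
	if len(fields) < 2 {
		return 0, 0, fmt.Errorf("unexpected ps output %q", output)
	}
	pct, err := strconv.ParseFloat(fields[0], 64)
	if err != nil || pct < 0 {
		return 0, 0, fmt.Errorf("failed to parse CPU usage %q", fields[0])
	}
	rssKiB, err := strconv.ParseUint(fields[1], 10, 64)
	if err != nil {
		return 0, 0, fmt.Errorf("failed to parse RSS %q: %w", fields[1], err)
	}
	return uint64(pct * 10), rssKiB * 1024, nil
}

func main() {
	cpu, mem, err := parsePS(" 250.0  2048\n")
	fmt.Println(cpu, mem, err)
}
```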

alexec added 2 commits September 14, 2025 13:26

Reformat index.html with consistent indentation and line wrapping

bdebdc8

alexec requested a review from Copilot September 14, 2025 20:40

alexec marked this pull request as ready for review September 14, 2025 20:41

Copilot AI reviewed Sep 14, 2025

internal/proc/kubernetes.go (outdated)

internal/proc/container.go

internal/run.go

internal/index.html (outdated)

alexec and others added 3 commits September 14, 2025 13:42

Remove enableMetrics option and related metrics endpoint

848970b

Update internal/proc/kubernetes.go

f42f28e

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Alex Collins <alexec@users.noreply.github.com>

Report CPU usage in millicores instead of percentage

01f282c

coderabbitai bot reviewed Sep 14, 2025

internal/proc/host.go

alexec added 2 commits September 14, 2025 14:02

Track pods as namespace/name and update metrics collection

0e76f23

Fix missing podKey append in Run method

d221075

alexec changed the title ~~feat: Add enableMetrics option and resource metrics tracking~~ feat: Add metrics tracking Sep 14, 2025

alexec requested a review from Copilot September 14, 2025 21:04

Copilot AI reviewed Sep 14, 2025

Fix CPU metric formatting to use raw value in m units

a22b6ac

alexec requested a review from Copilot September 14, 2025 21:13

Copilot AI reviewed Sep 14, 2025

internal/proc/kubernetes.go

internal/proc/container.go

alexec merged commit cdb8753 into main Sep 15, 2025
2 checks passed

alexec deleted the metrics branch September 15, 2025 00:01

coderabbitai bot mentioned this pull request Sep 19, 2025

feat: change to us exec+ps for metrics #97

Merged

alexec merged 8 commits into main from metrics
