Skip to content

feat: change to us exec+ps for metrics#97

Merged
alexec merged 9 commits intomainfrom
metrics.1
Sep 19, 2025
Merged

feat: change to us exec+ps for metrics#97
alexec merged 9 commits intomainfrom
metrics.1

Conversation

@alexec
Copy link
Collaborator

@alexec alexec commented Sep 18, 2025

Summary by CodeRabbit

  • New Features
    • Metrics UI now shows memory usage only; CPU values have been removed.
  • Documentation
    • Quick Start updated to reference release v0.1.104.
  • Refactor
    • Metrics collection unified across local, Docker, and Kubernetes to standardize memory reporting.
  • Chores
    • Default host port for the Kubernetes run task changed to 18080 → 80 (container).

@coderabbitai
Copy link

coderabbitai bot commented Sep 18, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Removes CPU from metrics and UI, adds an internal/metrics package (ps and procfs helpers), and refactors host, container, and Kubernetes metric collection to exec-based ps/procfs parsing. Also bumps README install version and updates a tasks port mapping.

Changes

Cohort / File(s) Summary
Docs version bump
README.md
Update install/download references from v0.1.103 to v0.1.104 (binary asset and go install).
UI metrics display
internal/index.html
Simplify formatMetrics to show only memory (Mi); CPU display removed.
Types: metrics shape
internal/types/metrics.go
Remove CPU field from Metrics struct; retain only Mem.
Metrics helpers (new)
internal/metrics/ps.go, internal/metrics/procfs.go
Add GetPSCommand/ParsePSOutput and GetProcFSCommand/ParseProcFSOutput to produce/parse ps and /proc/[pid]/statm outputs; include lazy page-size detection and parsing helpers.
Proc: host collection
internal/proc/host.go
Replace ad-hoc ps execution/parsing with metrics.GetPSCommand + metrics.ParsePSOutput.
Proc: container collection
internal/proc/container.go
Replace Docker stats API with lazy Docker client + docker exec of procfs command (PID 1); add execInContainer helper and parse via metrics.ParseProcFSOutput.
Proc: Kubernetes collection
internal/proc/kubernetes.go
Replace kubectl top parsing with per-container kubectl exec of procfs command; enumerate containers and aggregate memory via metrics.ParseProcFSOutput; remove CPU accumulation.
Tasks config
tasks.yaml
Change run-k8s port mapping from 80:8080 to 18080:80.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant UI as Web UI
  participant Proc as internal/proc
  participant Exec as OS / Docker / kubectl
  participant Metrics as internal/metrics
  Note over UI,Proc: Memory-only metrics flow

  User->>UI: Select target
  UI->>Proc: GetMetrics()
  alt Host
    Proc->>Metrics: GetPSCommand(pid)
    Proc->>Exec: exec ps ...
    Exec-->>Proc: ps output
    Proc->>Metrics: ParsePSOutput(output)
  else Container
    Proc->>Metrics: GetProcFSCommand(1)
    Proc->>Exec: docker exec <container> cat /proc/1/statm
    Exec-->>Proc: /proc statm output
    Proc->>Metrics: ParseProcFSOutput(output)
  else Kubernetes
    Proc->>Metrics: GetProcFSCommand(1)
    Proc->>Exec: kubectl exec <ns>/<pod> -- cat /proc/1/statm
    Exec-->>Proc: /proc statm output
    Proc->>Metrics: ParseProcFSOutput(output)
  end
  Metrics-->>Proc: Metrics{ Mem bytes }
  Proc-->>UI: Mem bytes
  UI-->>User: Display Mem (Mi)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • feat: Add metrics tracking #96 — Prior metrics refactor touching internal/types/metrics.go and internal/proc/*; strongly related to the introduced metrics helpers and CPU removal.

Poem

I nibble logs with whiskered grace,
RSS crumbs I now embrace—
No more CPU to chase or fret,
Just mem in bytes, my favorite set.
Hop, hop! Through /proc I peek, delight—
Mem in Mi, and all feels right. 🐰✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title succinctly describes the primary change—switching metrics collection to an exec+ps (procfs/ps) approach—and matches the changes across internal/metrics, container, host, and kubernetes code paths in the diff. It is specific and relevant enough for a reviewer to understand the main intent of the PR. The only minor issue is a likely typo ("us" → "use") but it does not obscure the meaning.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch metrics.1

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b510437 and c49ead2.

📒 Files selected for processing (2)
  • internal/proc/container.go (2 hunks)
  • internal/proc/kubernetes.go (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
internal/proc/kubernetes.go (3)
internal/proc/proc.go (1)
  • Interface (11-16)
internal/types/metrics.go (1)
  • Metrics (3-5)
internal/metrics/procfs.go (2)
  • GetProcFSCommand (69-71)
  • ParseProcFSOutput (74-91)
internal/proc/container.go (1)
internal/metrics/procfs.go (2)
  • GetProcFSCommand (69-71)
  • ParseProcFSOutput (74-91)
🔇 Additional comments (10)
internal/proc/kubernetes.go (7)

21-21: LGTM - Importing the new metrics package.

The new import correctly adds the internal/metrics package which provides standardized metric collection utilities.


32-37: LGTM - Adding required Kubernetes API imports.

The new imports (scheme, rest, remotecommand) are necessary for the exec-based container metrics collection and SPDY executor functionality.


44-51: LGTM - Adding clientset and REST config storage.

These fields are correctly added to support the new exec-based metrics collection approach that requires direct Kubernetes API access.


116-119: LGTM - Storing API clients for later metrics use.

The clientset and REST config are properly stored for use in the new exec-based metrics collection implementation.


444-456: LGTM - Simplified GetMetrics with better error context.

The refactored GetMetrics method correctly:

  • Accumulates memory metrics across all pods
  • Provides better error context by including the pod key
  • Maintains the same public interface

482-499: LGTM - Proper container enumeration using Kubernetes API.

The implementation correctly:

  • Uses the stored clientset to fetch pod details
  • Extracts container names from pod.Spec.Containers
  • Provides meaningful error messages with pod context
  • Handles the case of pods with no containers

501-515: LGTM - Container metrics collection using procfs.

The implementation correctly:

  • Uses metrics.GetProcFSCommand(1) for PID 1 (standard container init process)
  • Delegates command execution to execInContainer
  • Uses metrics.ParseProcFSOutput for consistent parsing
  • Provides container-specific error context
internal/proc/container.go (3)

28-28: LGTM - Adding metrics package import.

The import correctly adds the internal/metrics package for standardized metric collection.


34-38: LGTM - Adding Docker API client field.

The dockerCli field is properly added to support the new exec-based metrics collection approach.


310-333: LGTM - Refactored GetMetrics using procfs approach.

The new implementation correctly:

  • Lazy initializes the Docker client when needed
  • Uses metrics.GetProcFSCommand(1) for consistent PID 1 metrics
  • Delegates execution to execInContainer helper
  • Uses metrics.ParseProcFSOutput for consistent parsing
  • Provides container-specific error context

This aligns with the standardized metrics approach used across host and Kubernetes paths.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@alexec alexec requested a review from Copilot September 19, 2025 03:37
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors metrics collection to use exec+ps and /proc filesystem instead of kubectl top commands. The changes simplify CPU metrics handling by removing CPU tracking entirely and focusing solely on memory usage.

  • Replaces kubectl top with direct process filesystem access for Kubernetes metrics
  • Switches from Docker stats API to docker exec with procfs for container metrics
  • Removes CPU metrics collection across all platforms, keeping only memory tracking

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tasks.yaml Updates port mapping from 80:8080 to 18080:80
internal/types/metrics.go Removes CPU field from Metrics struct
internal/proc/kubernetes.go Replaces kubectl top with kubectl exec + procfs commands
internal/proc/host.go Switches from ps CPU parsing to procfs-based metrics
internal/proc/container.go Replaces Docker stats API with docker exec + procfs
internal/metrics/ps.go New module for parsing ps command output (memory only)
internal/metrics/procfs.go New module for parsing /proc/pid/stat output
internal/index.html Updates UI to display only memory metrics
README.md Version bump from v0.1.103 to v0.1.104

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@alexec alexec requested a review from Copilot September 19, 2025 03:43
@alexec alexec marked this pull request as ready for review September 19, 2025 03:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/proc/kubernetes.go (1)

436-449: Fix data race on k.pods and don't abort GetMetrics on a single pod error

  • internal/proc/kubernetes.go — GetMetrics: snapshot k.pods under a read lock, validate pod keys, log and continue on per-pod errors, return partial metrics plus an aggregated error if any.
  • internal/proc/kubernetes.go — k8s struct/processPod: add podsMu sync.RWMutex; guard writes to k.pods with podsMu.Lock()/Unlock() and reads with RLock()/RUnlock().

Recommended diff (unchanged from original suggestion):

 func (k *k8s) GetMetrics(ctx context.Context) (*types.Metrics, error) {
-  sum := &types.Metrics{}
-  for _, podKey := range k.pods {
+  sum := &types.Metrics{}
+  k.podsMu.RLock()
+  pods := append([]string(nil), k.pods...)
+  k.podsMu.RUnlock()
+  var errs []string
+  for _, podKey := range pods {
     parts := strings.SplitN(podKey, "/", 2)
-    namespace := parts[0]
-    podName := parts[1]
-    metrics, err := k.getMetrics(ctx, namespace, podName)
-    if err != nil {
-      return nil, fmt.Errorf("failed to get metrics for pod %s: %w", podKey, err)
-    }
-    sum.Mem += metrics.Mem
+    if len(parts) != 2 {
+      k.log.Printf("metrics: malformed pod key %q", podKey)
+      continue
+    }
+    namespace, podName := parts[0], parts[1]
+    m, err := k.getMetrics(ctx, namespace, podName)
+    if err != nil {
+      k.log.Printf("metrics: %s: %v", podKey, err)
+      errs = append(errs, podKey)
+      continue
+    }
+    sum.Mem += m.Mem
   }
-  return sum, nil
+  if len(errs) > 0 {
+    return sum, fmt.Errorf("partial metrics; %d pod(s) failed: %s", len(errs), strings.Join(errs, ", "))
+  }
+  return sum, nil
 }
 type k8s struct {
   log  *log.Logger
   spec types.Spec
   name string
   pods []string // namespace/name
+  podsMu sync.RWMutex
   types.Task
 }

Also wrap the append in processPod with k.podsMu.Lock()/Unlock() (and use RLock/RUnlock in readers).

♻️ Duplicate comments (2)
internal/metrics/procfs.go (1)

29-30: Page size assumption may cause incorrect memory calculations on non-x86 systems

The hardcoded page size of 4096 bytes may not be accurate for all systems. Some architectures use different page sizes (e.g., ARM systems can use 16KB or 64KB pages), which would lead to incorrect memory calculations.

Consider using syscall.Getpagesize() or reading from /proc/self/status to get the actual page size:

+import (
+	"syscall"
+)
+
 // ParseProcFSOutput parses /proc/pid/stat output for memory usage only
 func ParseProcFSOutput(output string) (*types.Metrics, error) {
 	fields := strings.Fields(strings.TrimSpace(output))
 	if len(fields) < 24 {
 		return nil, fmt.Errorf("unexpected /proc/pid/stat output: insufficient fields")
 	}
 
 	// Field 23 (0-indexed): RSS in pages
 	rssPages, err := strconv.ParseInt(fields[23], 10, 64)
 	if err != nil {
 		return nil, fmt.Errorf("failed to parse RSS from /proc/pid/stat: %w", err)
 	}
 
-	// Convert pages to bytes (assuming 4KB page size)
-	memoryBytes := uint64(rssPages * 4096)
+	// Convert pages to bytes using system page size
+	pageSize := syscall.Getpagesize()
+	memoryBytes := uint64(rssPages) * uint64(pageSize)
 
 	return &types.Metrics{Mem: memoryBytes}, nil
 }
internal/proc/kubernetes.go (1)

452-455: kubectl exec arg construction is now correct; resolves prior concern.

Using append(..., command...) after “--” is the right way to pass command and args.

🧹 Nitpick comments (4)
internal/metrics/ps.go (1)

15-40: Consider adding bounds check for RSS memory values

While the parsing logic handles errors gracefully, extremely large RSS values could potentially overflow when converted to bytes. Consider adding a sanity check to prevent issues with corrupted or unexpected ps output.

 func ParsePSOutput(output string) (*types.Metrics, error) {
 	lines := strings.Split(strings.TrimSpace(output), "\n")
 	if len(lines) < 2 {
 		return nil, fmt.Errorf("unexpected ps output: %s", output)
 	}
 
 	var totalMemory uint64
 
 	for i := 1; i < len(lines); i++ {
 		fields := strings.Fields(lines[i])
 		if len(fields) < 1 {
 			continue
 		}
 
 		rssMemoryKB, err := strconv.ParseInt(fields[0], 10, 64)
 		if err != nil {
 			continue
 		}
+
+		// Sanity check: RSS shouldn't exceed reasonable limits (e.g., 1TB)
+		const maxReasonableMemoryKB = 1024 * 1024 * 1024 // 1TB in KB
+		if rssMemoryKB < 0 || rssMemoryKB > maxReasonableMemoryKB {
+			continue
+		}
 
 		totalMemory += uint64(rssMemoryKB * 1024)
 	}
 
 	return &types.Metrics{
 		Mem: totalMemory,
 	}, nil
 }
internal/proc/container.go (1)

318-327: Consider more descriptive error messages

The error messages could be more specific about what type of command failed.

 	output, err := cmd.Output()
 	if err != nil {
-		return nil, fmt.Errorf("docker exec ps failed for container %s: %w", c.name, err)
+		return nil, fmt.Errorf("docker exec failed to read /proc/1/stat for container %s: %w", c.name, err)
 	}
 
 	metrics, err := metrics.ParseProcFSOutput(string(output))
 	if err != nil {
-		return nil, fmt.Errorf("failed to parse process metrics for container %s: %w", c.name, err)
+		return nil, fmt.Errorf("failed to parse /proc/1/stat output for container %s: %w", c.name, err)
 	}
internal/proc/kubernetes.go (2)

457-457: Capture stderr for better diagnostics.

Use CombinedOutput so errors include pod/container stderr.

- output, err := cmd.Output()
+ output, err := cmd.CombinedOutput()

Also surface it in the error:

- return nil, fmt.Errorf("failed to run %q: %w", strings.Join(cmd.Args, " "), err)
+ return nil, fmt.Errorf("failed to run %q: %w; output: %s", strings.Join(cmd.Args, " "), err, strings.TrimSpace(string(output)))

460-465: Avoid shadowing the imported metrics package.

Rename the local variable to prevent confusion.

- metrics, err := metrics.ParseProcFSOutput(string(output))
+ m, err := metrics.ParseProcFSOutput(string(output))
   if err != nil {
-    return nil, fmt.Errorf("failed to parse process metrics for pid: %w", err)
+    return nil, fmt.Errorf("failed to parse process metrics for pid 1: %w", err)
   }
-  return metrics, nil
+  return m, nil
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9913016 and 88f0083.

📒 Files selected for processing (9)
  • README.md (1 hunks)
  • internal/index.html (1 hunks)
  • internal/metrics/procfs.go (1 hunks)
  • internal/metrics/ps.go (1 hunks)
  • internal/proc/container.go (3 hunks)
  • internal/proc/host.go (2 hunks)
  • internal/proc/kubernetes.go (2 hunks)
  • internal/types/metrics.go (0 hunks)
  • tasks.yaml (1 hunks)
💤 Files with no reviewable changes (1)
  • internal/types/metrics.go
🧰 Additional context used
🧬 Code graph analysis (5)
internal/metrics/procfs.go (1)
internal/types/metrics.go (1)
  • Metrics (3-5)
internal/metrics/ps.go (1)
internal/types/metrics.go (1)
  • Metrics (3-5)
internal/proc/host.go (1)
internal/metrics/ps.go (2)
  • GetPSCommand (11-13)
  • ParsePSOutput (15-40)
internal/proc/container.go (1)
internal/metrics/procfs.go (2)
  • GetProcFSCommand (12-14)
  • ParseProcFSOutput (17-33)
internal/proc/kubernetes.go (2)
internal/types/metrics.go (1)
  • Metrics (3-5)
internal/metrics/procfs.go (2)
  • GetProcFSCommand (12-14)
  • ParseProcFSOutput (17-33)
🔇 Additional comments (7)
README.md (1)

38-39: LGTM! Version bump to v0.1.104

The version update is consistent across all installation methods (binary download and Go install).

Also applies to: 42-42

internal/index.html (1)

506-510: LGTM! CPU metrics removed from UI display

The formatMetrics function now correctly displays only memory metrics, aligning with the broader PR objective to remove CPU metrics from the codebase.

internal/proc/host.go (1)

93-101: LGTM! Clean delegation to metrics package

The refactoring properly delegates PS command construction and output parsing to the metrics package, improving code organization and maintainability.

internal/proc/container.go (1)

315-317: LGTM! Correct procfs-based metrics collection via docker exec

The command construction properly appends the procfs command elements to the docker exec arguments, which will correctly execute cat /proc/1/stat inside the container.

tasks.yaml (1)

45-45: Port mapping change may break existing workflows

run-k8s now maps host 18080→container 80 (tasks.yaml:45) but README still documents 80:8080 and PORT=8080 (README.md:57,139,158); run-container maps host 80→container 10080 (tasks.yaml:39). Confirm 18080:80 is intentional and either update README/docs/examples/tests and any dependent configs or revert to the previous host-port mapping.

internal/proc/kubernetes.go (2)

22-22: Import looks good.

New metrics package import is appropriate.


460-463: Confirm procfs parsing handles spaces in comm (field 2).

metrics.ParseProcFSOutput uses strings.Fields; /proc//stat’s comm can contain spaces inside parentheses. This can shift indexes and break RSS parsing. Consider parsing up to the last ')' before splitting.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/proc/kubernetes.go (1)

436-449: Potential data race on k.pods

k.pods is mutated in informer goroutines and read here without synchronization. Protect with a mutex and iterate over a copy.

 type k8s struct {
   log  *log.Logger
   spec types.Spec
   name string
-  pods []string // namespace/name
+  pods []string // namespace/name
+  podsMu sync.RWMutex
   types.Task
 }

Within processPod (where you append to k.pods):

- if !slices.Contains(k.pods, podKey) {
-   k.pods = append(k.pods, podKey)
- }
+ k.podsMu.Lock()
+ if !slices.Contains(k.pods, podKey) {
+   k.pods = append(k.pods, podKey)
+ }
+ k.podsMu.Unlock()

Here in GetMetrics:

- for _, podKey := range k.pods {
+ k.podsMu.RLock()
+ pods := append([]string(nil), k.pods...)
+ k.podsMu.RUnlock()
+ for _, podKey := range pods {
♻️ Duplicate comments (1)
internal/proc/container.go (1)

315-318: docker exec args are constructed correctly

Appending the command slice after the container name is the right tokenization for exec.Command. No “--” needed for docker exec. Resolves the earlier concern.

🧹 Nitpick comments (7)
internal/proc/container.go (2)

320-321: Fix inaccurate error text and include the invoked command

Message says “ps” but we’re running cat /proc/...; also include cmd for faster triage.

-    return nil, fmt.Errorf("docker exec ps failed for container %s: %w, output: %s", c.name, err, string(output))
+    return nil, fmt.Errorf("docker exec failed for container %s: %w, output: %s, cmd: %s",
+        c.name, err, string(output), cmd.String())

323-328: Avoid shadowing the metrics package identifier

Shadowing harms readability and future edits within this scope.

-    metrics, err := metrics.ParseProcFSOutput(string(output))
+    m, err := metrics.ParseProcFSOutput(string(output))
     if err != nil {
         return nil, fmt.Errorf("failed to parse process metrics for container %s: %w", c.name, err)
     }
-    return metrics, nil
+    return m, nil
internal/proc/kubernetes.go (3)

451-472: Serial per‑container exec can be slow; consider bounded parallelism

For pods with many containers, run execs with a small worker pool to reduce tail latency.


474-488: Include init/ephemeral containers and dedupe names

Current jsonpath returns only .spec.containers. Include init and ephemeral, then dedupe to avoid repeats.

-func (k *k8s) getContainersInPod(ctx context.Context, namespace, podName string) ([]string, error) {
-    cmd := exec.CommandContext(ctx, "kubectl", "get", "pod", "-n", namespace, podName, "-o", "jsonpath={.spec.containers[*].name}")
+func (k *k8s) getContainersInPod(ctx context.Context, namespace, podName string) ([]string, error) {
+    jp := "{.spec.initContainers[*].name} {.spec.containers[*].name} {.spec.ephemeralContainers[*].name}"
+    cmd := exec.CommandContext(ctx, "kubectl", "get", "pod", "-n", namespace, podName, "-o", "jsonpath="+jp)
     output, err := cmd.CombinedOutput()
     if err != nil {
-        return nil, fmt.Errorf("failed to get pod containers: %w, output: %s", err, string(output))
+        return nil, fmt.Errorf("failed to get pod containers: %w, output: %s, cmd: %s", err, string(output), cmd.String())
     }
-    containerNames := strings.Fields(strings.TrimSpace(string(output)))
+    raw := strings.Fields(strings.TrimSpace(string(output)))
+    seen := make(map[string]struct{}, len(raw))
+    containerNames := make([]string, 0, len(raw))
+    for _, name := range raw {
+        if _, ok := seen[name]; ok {
+            continue
+        }
+        seen[name] = struct{}{}
+        containerNames = append(containerNames, name)
+    }
     if len(containerNames) == 0 {
         return nil, fmt.Errorf("no containers found in pod %s/%s", namespace, podName)
     }
     return containerNames, nil
 }

490-504: Avoid shadowing the metrics package identifier

Rename the local variable for clarity.

-    metrics, err := metrics.ParseProcFSOutput(string(output))
+    m, err := metrics.ParseProcFSOutput(string(output))
     if err != nil {
         return nil, fmt.Errorf("failed to parse process metrics for container %s: %w", containerName, err)
     }
-    return metrics, nil
+    return m, nil
internal/metrics/procfs.go (2)

27-66: Simplify and harden page size detection

  • The “cmd != nil” check is redundant.
  • The meminfo/CPU checks don’t influence result; they add noise.
  • Prefer os.Getpagesize() as the local fallback.

This still won’t solve cross‑node discrepancies (see next comment), but it reduces complexity locally.

-func getPageSizeFromSystem() int64 {
-    // Method 1: Try using getconf command
-    if cmd := exec.Command("getconf", "PAGESIZE"); cmd != nil {
-        if output, err := cmd.Output(); err == nil {
-            if size, err := strconv.ParseInt(strings.TrimSpace(string(output)), 10, 64); err == nil && size > 0 {
-                return size
-            }
-        }
-    }
-    // Method 2: Try reading /proc/meminfo ...
-    ...
-    // Fallback: Use standard 4KB page size (most common)
-    return 4096
-}
+func getPageSizeFromSystem() int64 {
+    if output, err := exec.Command("getconf", "PAGESIZE").Output(); err == nil {
+        if size, err := strconv.ParseInt(strings.TrimSpace(string(output)), 10, 64); err == nil && size > 0 {
+            return size
+        }
+    }
+    // Local fallback
+    return int64(os.Getpagesize())
+}

73-91: Major correctness gap: remote page size may differ

ParseProcFSOutput multiplies RSS pages by the page size of the host running this binary. For Kubernetes, statm is from a remote node whose page size can be 16KB/64KB, producing incorrect bytes on non‑4KB nodes.

Prefer computing bytes in the target environment or using cgroup metrics:

  • Option A (recommended): Read cgroup bytes if present; fallback to statm×PAGESIZE inside the container.
  • Option B: Extend the command to emit bytes (rss_pages * getconf PAGESIZE) and parse that single integer here.

Minimal change via Option B:

-// GetProcFSCommand returns the cat command to read /proc/pid/statm
-func GetProcFSCommand(pid int) []string {
-    return []string{"cat", fmt.Sprintf("/proc/%d/statm", pid)}
-}
+// GetProcFSMemBytesCommand returns a shell that prints resident bytes for pid.
+// Tries cgroup v2 and v1 first, then falls back to statm*pagesize.
+func GetProcFSMemBytesCommand(pid int) []string {
+    script := fmt.Sprintf(`set -e
+if [ -r /sys/fs/cgroup/memory.current ]; then
+  cat /sys/fs/cgroup/memory.current
+elif [ -r /sys/fs/cgroup/memory/memory.usage_in_bytes ]; then
+  cat /sys/fs/cgroup/memory/memory.usage_in_bytes
+else
+  ps=$(getconf PAGESIZE 2>/dev/null || echo 4096)
+  set -- $(cat /proc/%d/statm)
+  echo $(($2 * ps))
+fi`, pid)
+    return []string{"sh", "-c", script}
+}

Then change callers (container/kubernetes) to use GetProcFSMemBytesCommand and update this parser to parse a single uint64:

-func ParseProcFSOutput(output string) (*types.Metrics, error) {
-    fields := strings.Fields(strings.TrimSpace(output))
-    if len(fields) < 2 {
-        return nil, fmt.Errorf("unexpected /proc/pid/statm output: insufficient fields")
-    }
-    rssPages, err := strconv.ParseInt(fields[1], 10, 64)
-    ...
-    return &types.Metrics{Mem: memoryBytes}, nil
-}
+func ParseProcFSOutput(output string) (*types.Metrics, error) {
+    s := strings.TrimSpace(output)
+    bytes, err := strconv.ParseUint(s, 10, 64)
+    if err != nil {
+        return nil, fmt.Errorf("failed to parse memory bytes: %w", err)
+    }
+    return &types.Metrics{Mem: bytes}, nil
+}

If distroless images are expected, gate this behind capability detection and return a clear “unsupported” error rather than noisy logs.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88f0083 and b510437.

📒 Files selected for processing (4)
  • internal/metrics/procfs.go (1 hunks)
  • internal/proc/container.go (3 hunks)
  • internal/proc/host.go (2 hunks)
  • internal/proc/kubernetes.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/proc/host.go
🧰 Additional context used
🧬 Code graph analysis (3)
internal/proc/kubernetes.go (2)
internal/types/metrics.go (1)
  • Metrics (3-5)
internal/metrics/procfs.go (2)
  • GetProcFSCommand (69-71)
  • ParseProcFSOutput (74-91)
internal/metrics/procfs.go (1)
internal/types/metrics.go (1)
  • Metrics (3-5)
internal/proc/container.go (1)
internal/metrics/procfs.go (2)
  • GetProcFSCommand (69-71)
  • ParseProcFSOutput (74-91)
🔇 Additional comments (2)
internal/proc/container.go (1)

315-321: Heads‑up: fails on distroless/minimal images

This path requires “cat” in the container. If you expect distroless images, consider a cgroup-based read (memory.current) first, with statm as fallback, or gate metrics per container image policy.

internal/metrics/procfs.go (1)

68-71: Note on container binaries

Both “cat” and “sh” may be absent in minimal images. If you adopt the shell approach, document this requirement or add a preflight check and skip metrics for such containers.

@alexec alexec requested a review from Copilot September 19, 2025 04:21
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@alexec alexec requested a review from Copilot September 19, 2025 04:27
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@alexec alexec merged commit 9d3fdc2 into main Sep 19, 2025
1 of 2 checks passed
@alexec alexec deleted the metrics.1 branch September 19, 2025 04:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants