feat: argus cluster — multi-JVM aggregated view across pods/instances #134

@rlaope

Priority: P2

Perspective: Microservices / Distributed Systems Developer

Why

"I have 20 pods running the same service. I want to see which pod has the worst GC, which is leaking memory, and compare them side-by-side."

Design

# Discover all Argus-enabled pods via K8s API or mDNS
argus cluster scan --namespace=production

# Aggregated health view
argus cluster health
╭─ Cluster Health ────────────────────────────────────────────╮
│  Namespace: production    Pods: 20/20 healthy               │
│                                                              │
│  Pod              Heap%   GC OH   CPU    Leak?   VThreads   │
│  order-svc-abc    72%     2.1%    34%    No      1,234      │
│  order-svc-def    89%     8.3%    67%    ⚠ Yes   2,456      │
│  order-svc-ghi    45%     0.8%    12%    No      890        │
│  ...                                                         │
╰──────────────────────────────────────────────────────────────╯

# Compare two specific pods
argus cluster compare order-svc-abc order-svc-def
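The issue does not specify what `argus cluster compare` prints. As a rough sketch only, a side-by-side comparison could compute a per-metric delta between two pods. The function names (`diff_metrics`, `render`) and the metric keys are hypothetical; the sample values are taken from the health table above.

```python
def diff_metrics(a: dict[str, float], b: dict[str, float]) -> list[tuple[str, float, float, float]]:
    """Return (metric, value_a, value_b, delta) for every metric both pods report."""
    rows = []
    for key in sorted(a.keys() & b.keys()):
        rows.append((key, a[key], b[key], round(b[key] - a[key], 2)))
    return rows


def render(name_a: str, name_b: str, rows: list[tuple[str, float, float, float]]) -> str:
    """Format the rows as a fixed-width side-by-side table."""
    lines = [f"{'Metric':<12}{name_a:>16}{name_b:>16}{'Delta':>10}"]
    for metric, va, vb, delta in rows:
        lines.append(f"{metric:<12}{va:>16.1f}{vb:>16.1f}{delta:>+10.1f}")
    return "\n".join(lines)


# Hypothetical metric keys; values copied from the example health table.
abc = {"heap_pct": 72.0, "gc_overhead_pct": 2.1, "cpu_pct": 34.0}
def_ = {"heap_pct": 89.0, "gc_overhead_pct": 8.3, "cpu_pct": 67.0}

rows = diff_metrics(abc, def_)
print(render("order-svc-abc", "order-svc-def", rows))
```

A signed delta column makes it easy to spot which pod is worse on each metric without reading both values.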

Implementation

  • K8s API: list pods with argus.io/enabled=true label, query each pod's /prometheus endpoint
  • Non-K8s: manual target list via config file or mDNS discovery
  • Aggregate metrics: min/max/avg/p99 across pods
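The three steps above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the helper names, the default port `9202`, and the nearest-rank percentile method are all assumptions. Only the `argus.io/enabled=true` label and the `/prometheus` path come from the issue. The pod-list shape follows the standard Kubernetes `PodList` JSON (`items[].metadata.name`, `items[].status.podIP`).

```python
import math
from statistics import mean
from urllib.parse import quote

LABEL_SELECTOR = "argus.io/enabled=true"  # label named in the issue


def pods_url(api_server: str, namespace: str) -> str:
    """Build the Kubernetes API URL that lists labelled pods in a namespace."""
    selector = quote(LABEL_SELECTOR, safe="")
    return f"{api_server}/api/v1/namespaces/{namespace}/pods?labelSelector={selector}"


def scrape_targets(pod_list: dict, port: int = 9202) -> dict[str, str]:
    """Map pod name -> metrics URL for running pods that have an IP.

    The port is a placeholder assumption; the issue does not state one.
    """
    targets = {}
    for item in pod_list.get("items", []):
        status = item.get("status", {})
        if status.get("phase") == "Running" and status.get("podIP"):
            name = item["metadata"]["name"]
            targets[name] = f"http://{status['podIP']}:{port}/prometheus"
    return targets


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def aggregate(samples: dict[str, float]) -> dict:
    """Summarize one metric across pods: min/max/avg/p99 plus the worst pod."""
    values = list(samples.values())
    return {
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "p99": percentile(values, 99),
        "worst_pod": max(samples, key=samples.get),
    }


# Heap% values copied from the example health table in this issue.
heap = {"order-svc-abc": 72.0, "order-svc-def": 89.0, "order-svc-ghi": 45.0}
summary = aggregate(heap)
print(summary)
```

With only a handful of pods, p99 under nearest-rank collapses to the maximum, so the `worst_pod` field is likely the more useful signal at small scale.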

Impact

Fills a major gap: as far as I know, no JVM CLI tool offers multi-instance aggregated diagnostics. This is the "production-scale" story.

Labels: enhancement (New feature or request)
