[FEATURE] Ship Agent Skills with Pup CLI #148

@bi1yeu

Is your feature request related to a problem?

AI agents (Claude Code, Cursor, Copilot, etc.) can already invoke pup commands, but they have no structured guidance on how to compose commands across domains to solve real Datadog workflows. An agent asked to "investigate why checkout latency spiked" has to independently discover that it should query metrics, then correlate with traces, then search logs, then check for recent deployments — with the right flags and query syntax for each step.

The result is that agents either under-utilize pup (running one command in isolation) or waste tokens fumbling through --help output to figure out multi-step workflows. The CLI has 283+ subcommands across 42 domains. That's a lot of surface area for an agent to navigate without guidance.

Describe the solution you'd like

Pup should ship with agent skills — structured workflow guides that teach AI agents how to compose pup commands for common Datadog use cases. These skills would provide:

  • Workflow sequencing: Which commands to run in what order, and how the output of one feeds into the next
  • Query syntax guidance: The right Datadog query patterns for each domain (log queries, metric queries, trace filters)
  • Cross-domain correlation: How to connect data across monitors, logs, metrics, traces, and incidents
  • Output handling: How to use --output flags and --agent mode effectively
  • Decision logic: When to branch the workflow (e.g., "if no results in indexes, try flex storage")
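The decision-logic bullet can be made concrete with a small sketch. It is written in Python purely for illustration and is not part of pup: `search_with_fallback` and the `search` callable are hypothetical names, and `search` stands in for however an agent actually runs a log search against one storage tier. The tier names are the ones listed elsewhere in this issue; the fallback order is an assumption.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Storage tiers named in this issue. The order (cheapest lookup first) is an assumption.
STORAGE_TIERS = ("indexes", "flex", "online-archives")


def search_with_fallback(
    search: Callable[[str, str], List[str]],
    query: str,
    tiers: Sequence[str] = STORAGE_TIERS,
) -> Tuple[Optional[str], List[str]]:
    """Try each tier in order and return the first tier that yields results.

    `search` is a hypothetical stand-in for running a log search against one
    tier; it takes (query, tier) and returns a list of matching log lines.
    """
    for tier in tiers:
        results = search(query, tier)
        if results:
            return tier, results
    # No tier returned anything: report that explicitly instead of guessing.
    return None, []
```

A skill would express the same branch in prose ("if no results in indexes, try flex storage"); the sketch only shows that the rule is simple enough to state unambiguously.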

Proposed skill areas (7)

  1. Incident Triage: Guide an agent through the full incident response loop: check firing monitors → search correlated logs → query relevant metrics → trace slow requests → create or update an incident with findings. This is the highest-value skill because it crosses the most domains and reflects the most urgent real-world use case.

  2. Log Investigation: Teach agents to search, aggregate, and correlate logs effectively: construct queries with the right syntax (status:error service:api @field:value), choose the right storage tier (indexes vs flex vs online-archives), use aggregation to find patterns (--compute="count" --group-by="service"), and pivot from log results to related traces or metrics.

  3. Performance Debugging: Walk agents through diagnosing latency and errors: query time-series metrics → search APM traces for slow spans → check error-tracking for new issues → correlate with RUM session data → map service dependencies. Includes guidance on time-range alignment across domains.

  4. Monitor & Downtime Management: Guide agents through listing and filtering monitors by tags/status, understanding alert thresholds, creating scheduled downtimes for maintenance windows, and running bulk operations across monitor groups.

  5. Infrastructure Health Check: Teach agents to assess infrastructure state: list hosts with filters → review host tags and metadata → query system metrics (CPU, memory, disk) → check fleet agent status → identify hosts that are offline or under-reporting.

  6. Security Posture Review: Guide agents through a security audit workflow: list active security signals → search findings by severity → review audit logs for suspicious activity → check security rule coverage → identify gaps in detection.

  7. CI/CD Pipeline Health: Help agents analyze pipeline and test health: search recent pipeline runs → identify failing stages → find flaky tests → review DORA metrics (deployment frequency, lead time, failure rate) → correlate failures with code changes.
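The "time-range alignment across domains" guidance in the Performance Debugging skill could amount to computing one window around the event of interest and reusing it for every metrics, trace, and log query. A minimal Python sketch, assuming a hypothetical helper named `aligned_window` and arbitrary default padding; how the window is then passed to each pup command is deliberately left out, since this issue does not specify those flags.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def aligned_window(
    anchor: datetime,
    before: timedelta = timedelta(minutes=30),  # assumed default padding
    after: timedelta = timedelta(minutes=10),   # assumed default padding
) -> Tuple[datetime, datetime]:
    """Return one (start, end) window in UTC to reuse across every domain.

    `anchor` is the moment of interest (for example, when latency spiked).
    Requiring a timezone-aware value avoids silently mixing local time and UTC.
    """
    if anchor.tzinfo is None:
        raise ValueError("anchor must be timezone-aware")
    start = (anchor - before).astimezone(timezone.utc)
    end = (anchor + after).astimezone(timezone.utc)
    return start, end
```

Deriving every query's range from the same pair keeps a spike seen in metrics lined up with the traces and logs inspected afterwards.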

Describe alternatives you've considered

  • Relying on --help and docs alone: Agents can read help text, but it doesn't teach workflow composition. An agent knows pup logs search exists but not that it should follow a pup monitors list --status=alert call during incident triage.

  • A single monolithic SKILL.md: Possible but likely too large to be useful. Grouping skills into focused plugins/packages allows agents to load only the context relevant to the task at hand and keeps each skill concise enough to fit in a context window.

  • Embedding guidance in --agent mode output: The agent output format already includes next_action hints, which is great. Skills would complement this by providing the higher-level "why" and workflow structure rather than just the next immediate step.

Additional context

This is inspired by Playwright CLI, which ships a SKILL.md file that agents can install to learn how to drive the browser CLI effectively. Playwright uses a single skill, but given pup's breadth (42 domains), it would likely make more sense to group skills as separate plugins, perhaps by persona (SRE, security engineer, platform engineer) or by workflow category (observability, incident management, security).

Key design considerations:

  • Skills should be installable via a CLI command so agents can self-provision them
  • Skills could live in a skills/ directory in the repo, making them easy to contribute to
  • The format should be agent-framework-agnostic (works with Claude Code, Cursor, Copilot, etc.)
  • Skills should reference real pup command syntax so they stay in sync with the CLI
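One way to keep skills in sync with the CLI is a check, for instance in CI, that every pup invocation quoted in a skill file names a subcommand that still exists. The Python sketch below is only an illustration: `referenced_commands`, `unknown_commands`, and the `known` set are hypothetical, and it assumes commands appear on their own lines in the form `pup <domain> <subcommand>`. How the set of known subcommands would be obtained from pup is not specified in this issue.

```python
import re
from typing import Iterable, Set, Tuple

# Matches lines that start with "pup <domain> <subcommand>".
# Anchoring at line start skips prose such as "compose pup commands for ...".
PUP_COMMAND = re.compile(
    r"^[ \t]*pup[ \t]+([a-z][a-z0-9-]*)[ \t]+([a-z][a-z0-9-]*)",
    re.MULTILINE,
)


def referenced_commands(skill_text: str) -> Set[Tuple[str, str]]:
    """Collect (domain, subcommand) pairs quoted in a skill file."""
    return {(m.group(1), m.group(2)) for m in PUP_COMMAND.finditer(skill_text)}


def unknown_commands(
    skill_text: str, known: Iterable[Tuple[str, str]]
) -> Set[Tuple[str, str]]:
    """Return quoted commands that are not in the known set.

    `known` is a hypothetical input: the pairs the current CLI build supports.
    """
    return referenced_commands(skill_text) - set(known)
```

A non-empty result from `unknown_commands` would indicate that a skill has drifted from the CLI and needs updating.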

The existing --agent mode (envelope format with status, data, metadata, next_action) already provides the low-level plumbing. Skills would be the high-level layer that tells agents what to do, while agent mode tells them how to parse the results.
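To illustrate the split between the two layers, here is a minimal Python sketch of the "how to parse the results" side. Only the four field names (status, data, metadata, next_action) come from this issue. That the envelope is serialized as JSON, the field types, and the names `AgentEnvelope` and `parse_envelope` are assumptions made for the example.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class AgentEnvelope:
    """Assumed shape of the --agent envelope; only the field names are from the issue."""
    status: str
    data: Any = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    next_action: Optional[str] = None  # assumed to be a free-text hint


def parse_envelope(raw: str) -> AgentEnvelope:
    """Parse one envelope, assuming it arrives as a JSON object."""
    doc = json.loads(raw)
    return AgentEnvelope(
        status=doc["status"],                # treated as required in this sketch
        data=doc.get("data"),
        metadata=doc.get("metadata") or {},  # tolerate a missing or null field
        next_action=doc.get("next_action"),
    )
```

A skill would sit above this: it decides which command to run next and why, while the envelope's next_action hint only suggests the immediate follow-up.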

Proposed command syntax (if applicable)

# Install all available skills
pup skills install

# List available skill plugins
pup skills list

# Install a specific skill plugin
pup skills install incident-triage

# Show where skills are installed
pup skills path
