Description
Is your feature request related to a problem?
AI agents (Claude Code, Cursor, Copilot, etc.) can already invoke pup commands, but they have no structured guidance on how to compose commands across domains to solve real Datadog workflows. An agent asked to "investigate why checkout latency spiked" has to independently discover that it should query metrics, then correlate with traces, then search logs, then check for recent deployments — with the right flags and query syntax for each step.
The result is that agents either under-utilize pup (running one command in isolation) or waste tokens fumbling through `--help` output to figure out multi-step workflows. The CLI has 283+ subcommands across 42 domains. That's a lot of surface area for an agent to navigate without guidance.
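To make the discovery problem concrete, here is a dry-run sketch of the command sequence such an investigation implies. Only `pup monitors list --status=alert` and `pup logs search` (plus the log query syntax) appear elsewhere in this issue; the metrics and traces subcommands below are illustrative assumptions, not confirmed pup syntax:

```shell
# Dry-run sketch: `run` echoes each step instead of executing it, since the
# exact subcommands (beyond monitors/logs, named in this issue) are assumptions.
run() { echo "+ $*"; }

# 1. Find firing monitors related to checkout
run pup monitors list --status=alert

# 2. Correlate error logs with Datadog log query syntax
run pup logs search "status:error service:checkout"

# 3. Query latency metrics for the same window (assumed subcommand)
run pup metrics query "avg:checkout.request.latency{*}"

# 4. Pull slow traces (assumed subcommand)
run pup traces search "service:checkout duration:>2s"
```

An agent has to reconstruct this entire sequence from `--help` output today; a skill would hand it over up front.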
Describe the solution you'd like
Pup should ship with agent skills — structured workflow guides that teach AI agents how to compose pup commands for common Datadog use cases. These skills would provide:
- Workflow sequencing: Which commands to run in what order, and how the output of one feeds into the next
- Query syntax guidance: The right Datadog query patterns for each domain (log queries, metric queries, trace filters)
- Cross-domain correlation: How to connect data across monitors, logs, metrics, traces, and incidents
- Output handling: How to use `--output` flags and `--agent` mode effectively
- Decision logic: When to branch the workflow (e.g., "if no results in indexes, try flex storage")
Proposed skill areas (7)
- Incident Triage — Guide an agent through the full incident response loop: check firing monitors → search correlated logs → query relevant metrics → trace slow requests → create or update an incident with findings. The highest-value skill because it crosses the most domains and reflects the most urgent real-world use case.
- Log Investigation — Teach agents to effectively search, aggregate, and correlate logs: construct queries with the right syntax (`status:error service:api @field:value`), choose the right storage tier (indexes vs flex vs online-archives), use aggregation to find patterns (`--compute="count" --group-by="service"`), and pivot from log results to related traces or metrics.
- Performance Debugging — Walk agents through diagnosing latency and errors: query time-series metrics → search APM traces for slow spans → check error-tracking for new issues → correlate with RUM session data → map service dependencies. Includes guidance on time-range alignment across domains.
- Monitor & Downtime Management — Guide agents through listing and filtering monitors by tags/status, understanding alert thresholds, creating scheduled downtimes for maintenance windows, and bulk operations across monitor groups.
- Infrastructure Health Check — Teach agents to assess infrastructure state: list hosts with filters → review host tags and metadata → query system metrics (CPU, memory, disk) → check fleet agent status → identify hosts that are offline or under-reporting.
- Security Posture Review — Guide agents through a security audit workflow: list active security signals → search findings by severity → review audit logs for suspicious activity → check security rule coverage → identify gaps in detection.
- CI/CD Pipeline Health — Help agents analyze pipeline and test health: search recent pipeline runs → identify failing stages → find flaky tests → review DORA metrics (deployment frequency, lead time, failure rate) → correlate failures with code changes.
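As a sketch of what the first skill's workflow section might contain (the file format and step wording are assumptions; only the `pup monitors list --status=alert` / `pup logs search` syntax and the flex-storage fallback come from this issue):

```shell
# Print a minimal hypothetical SKILL.md body for the incident-triage skill.
skill=$(cat <<'EOF'
# Incident Triage
1. `pup monitors list --status=alert` - find firing monitors.
2. `pup logs search "status:error service:<svc>"` - correlate error logs.
3. If step 2 returns nothing from indexes, retry against flex storage.
4. Create or update an incident with the findings.
EOF
)
echo "$skill"
```

The point is that each step names a concrete command plus the decision logic around it, rather than describing the domain in the abstract.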
Describe alternatives you've considered
- Relying on `--help` and docs alone: Agents can read help text, but it doesn't teach workflow composition. An agent knows `pup logs search` exists but not that it should follow a `pup monitors list --status=alert` call during incident triage.
- A single monolithic SKILL.md: Possible, but likely too large to be useful. Grouping skills into focused plugins/packages allows agents to load only the context relevant to the task at hand and keeps each skill concise enough to fit in a context window.
- Embedding guidance in `--agent` mode output: The agent output format already includes `next_action` hints, which is great. Skills would complement this by providing the higher-level "why" and workflow structure rather than just the next immediate step.
Additional context
This is inspired by Playwright CLI's SKILL.md, which ships a skill file that agents can install to learn how to drive the browser CLI effectively. Playwright uses a single skill, but given pup's breadth (42 domains), it would likely make more sense to group skills as separate plugins — perhaps by persona (SRE, security engineer, platform engineer) or by workflow category (observability, incident management, security).
Key design considerations:
- Skills should be installable via a CLI command so agents can self-provision them
- Skills could live in a `skills/` directory in the repo, making them easy to contribute to
- The format should be agent-framework-agnostic (works with Claude Code, Cursor, Copilot, etc.)
- Skills should reference real pup command syntax so they stay in sync with the CLI
The existing `--agent` mode (envelope format with `status`, `data`, `metadata`, `next_action`) already provides the low-level plumbing. Skills would be the high-level layer that tells agents what to do, while agent mode tells them how to parse the results.
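A sketch of how the two layers interact, using a hypothetical envelope (only the field names `status`, `data`, `metadata`, `next_action` come from this issue; the JSON shape is an assumption):

```shell
# Hypothetical --agent envelope; the exact JSON shape here is an assumption.
envelope='{"status":"ok","data":[],"metadata":{},"next_action":"pup logs search \"status:error\""}'

# Agent mode tells the agent HOW to parse: check status, then read next_action.
# A skill would tell it WHY: which workflow this step belongs to, and what to
# do when the envelope reports an error instead.
case "$envelope" in
  *'"status":"ok"'*) decision="follow next_action" ;;
  *)                 decision="surface the error" ;;
esac
echo "$decision"
```
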
Proposed command syntax (if applicable)
```shell
# Install all available skills
pup skills install

# List available skill plugins
pup skills list

# Install a specific skill plugin
pup skills install incident-triage

# Show where skills are installed
pup skills path
```