Problem
AegisFlow exposes Prometheus metrics for requests, latency, tokens, and errors, but there is no dedicated metric for policy decisions. Operators running this in production cannot easily graph the ratio of allow vs review vs block over time.
Acceptance criteria
- A counter aegisflow_policy_decisions_total{decision="allow|review|block", protocol="...", tool="..."}
- The counter is exposed at /metrics on the admin port
Files to touch
- internal/middleware/metrics.go (or a new file in the same package for policy metrics)
- Wherever the policy engine returns a decision that gets counted (internal/toolpolicy, internal/mcpgw)
How to test locally
# Start AegisFlow, then trigger one of each decision
aegisctl test-action --protocol mcp --tool github.list_repos --target foo
aegisctl test-action --protocol mcp --tool github.create_pr --target foo
aegisctl test-action --protocol mcp --tool github.delete_repo --target foo
# Check metrics
curl http://localhost:8081/metrics | grep policy_decisions_total
Notes
Cardinality matters. If the tool label becomes unbounded (for example, user-supplied tool names), consider dropping it or moving it to a separate metric. A safe default is to use only protocol and decision as labels, omit tool, and emit the tool name in a log line instead.