89 changes: 89 additions & 0 deletions components/egress/METRIC.md
@@ -0,0 +1,89 @@
# Egress Sidecar Metrics

This document describes the Prometheus metrics exposed by the Egress Sidecar: name, type, description, and optional labels.
All metric names start with `opensandbox_egress_` and are served over HTTP at `GET /metrics`, on the same port as the policy server (default `:18080`).

---

## 1. DNS Proxy (Layer 1)

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `opensandbox_egress_dns_queries_total` | Counter | Total DNS queries handled by the proxy, by result. | `result`: `allowed` (policy allowed and forward succeeded), `denied` (policy denied, NXDOMAIN returned), `forward_error` (policy allowed but upstream DNS failed). |
| `opensandbox_egress_dns_forward_duration_seconds` | Histogram / Summary | Latency in seconds of forwarding a DNS query to the upstream resolver. Observed for every forwarded query, including forwards that fail. | For Summary, `quantile`; for Histogram, `le` (default buckets). |
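
The three `result` values correspond to the three possible outcomes of a single query. The following is a minimal standard-library sketch of that mapping. `classifyQuery` and `errUpstream` are hypothetical names used only for illustration; the sidecar records these outcomes through its own metrics package, not through this helper.

```go
package main

import (
	"errors"
	"fmt"
)

// Label values for `result`, as listed in the table above.
const (
	resultAllowed      = "allowed"
	resultDenied       = "denied"
	resultForwardError = "forward_error"
)

// errUpstream is a placeholder error standing in for any upstream DNS failure.
var errUpstream = errors.New("upstream timeout")

// classifyQuery (hypothetical) maps the outcome of one DNS query to the
// `result` label value. A denied query is never forwarded, so the policy
// decision is checked first.
func classifyQuery(policyDenied bool, forwardErr error) string {
	switch {
	case policyDenied:
		return resultDenied // NXDOMAIN returned to the client
	case forwardErr != nil:
		return resultForwardError // policy allowed, upstream DNS failed
	default:
		return resultAllowed // policy allowed and forward succeeded
	}
}

func main() {
	counts := map[string]int{}
	counts[classifyQuery(false, nil)]++
	counts[classifyQuery(true, nil)]++
	counts[classifyQuery(false, errUpstream)]++
	// One query per outcome, so each counter ends at 1.
	fmt.Println(counts[resultAllowed], counts[resultDenied], counts[resultForwardError])
}
```

Each query increments exactly one `result` series, so the sum over all `result` values equals the total number of queries handled.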

---

## 2. Policy and Runtime

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `opensandbox_egress_policy_updates_total` | Counter | Number of successful policy updates via `POST /policy`. | None. |
| `opensandbox_egress_policy_rule_count` | Gauge | Current number of egress rules in the active policy. | Optional: `default_action` (`allow` / `deny`). |
| `opensandbox_egress_enforcement_mode` | Gauge | Current enforcement mode for observability (OSEP R6). Value is 1; label distinguishes mode. | `mode`: `dns` (DNS proxy only) or `dns+nft` (DNS + nftables). |
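
When a gauge carries a label such as `default_action` or `mode`, the series for the previous label value can linger after the value changes unless it is cleared first. The sketch below is a standard-library stand-in for that idea. `labeledGauge`, `setOnly`, and `snapshot` are hypothetical names, and this document does not show whether the sidecar's own setters behave this way.

```go
package main

import (
	"fmt"
	"sync"
)

// labeledGauge is a tiny stand-in for a gauge with a single label.
type labeledGauge struct {
	mu     sync.Mutex
	series map[string]float64 // label value -> gauge value
}

// setOnly drops every existing series and records exactly one, so a stale
// label value (e.g. the previous default_action) is no longer exposed.
func (g *labeledGauge) setOnly(labelValue string, v float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.series = map[string]float64{labelValue: v}
}

// snapshot returns a copy of the current series for exposition.
func (g *labeledGauge) snapshot() map[string]float64 {
	g.mu.Lock()
	defer g.mu.Unlock()
	out := make(map[string]float64, len(g.series))
	for k, v := range g.series {
		out[k] = v
	}
	return out
}

func main() {
	var ruleCount labeledGauge
	ruleCount.setOnly("deny", 3)  // policy with default_action=deny and 3 rules
	ruleCount.setOnly("allow", 5) // policy replaced; the deny series is dropped
	fmt.Println(ruleCount.snapshot()) // prints map[allow:5]
}
```

With `github.com/prometheus/client_golang`, a comparable effect is usually obtained by calling `Reset()` on the `GaugeVec` before setting the new series.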

---

## 3. nftables (Layer 2, dns+nft mode)

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `opensandbox_egress_nft_apply_total` | Counter | Number of nftables ApplyStatic (static rule apply) operations. | `result`: `success` or `failure`. On failure the sidecar falls back to DNS-only mode. |
| `opensandbox_egress_nft_resolved_ips_added_total` | Counter | Number of resolved IPs added to the nftables dynamic set. The counter grows by the number of IPs in each successful add; failed adds are not counted. | Optional: `domain` (use with care to avoid high cardinality). |
| `opensandbox_egress_nft_doh_dot_packets_dropped_total` | Counter | Number of packets dropped due to DoH/DoT blocking. | `reason`: `dot_853` (DoT port 853), `doh_443` (DoH over 443 when enabled). |

---

## 4. Violations and Security (aligned with OSEP R7 / violation logging)

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `opensandbox_egress_violations_total` | Counter | Number of policy denials (e.g. DNS NXDOMAIN). Can be instrumented alongside violation logs. | `type`: `dns_deny`; add e.g. `l2_deny` for L2 denials if implemented. |

---

## 5. Process / Runtime (optional)

| Metric | Type | Description | Labels |
|--------|------|-------------|--------|
| `opensandbox_egress_info` | Gauge | Constant 1; labels identify the instance and environment in Prometheus. | See "Instance identification" below: `instance_id` (recommended), `enforcement_mode`, `version`, etc. |
| `opensandbox_egress_uptime_seconds` | Gauge | Process uptime in seconds. | None. |
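
An info-style metric always has the value 1 and carries its information in labels. As a rough illustration of what a scrape might return for `opensandbox_egress_info`, the sketch below renders one sample line in the Prometheus text format using only the standard library. `infoSample` and the label values are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// infoSample (hypothetical) renders an info-style sample: a constant value
// of 1 whose labels describe the instance. Label names are sorted so the
// output is stable between scrapes.
func infoSample(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names)

	pairs := make([]string, 0, len(names))
	for _, name := range names {
		// %q matches Prometheus label escaping for plain ASCII values
		// such as the ones used here.
		pairs = append(pairs, fmt.Sprintf("%s=%q", name, labels[name]))
	}
	return "opensandbox_egress_info{" + strings.Join(pairs, ",") + "} 1"
}

func main() {
	fmt.Println(infoSample(map[string]string{
		"instance_id":      "sb-42",   // placeholder sandbox id
		"enforcement_mode": "dns+nft", // one of the documented modes
		"version":          "v0.0.0",  // placeholder version string
	}))
}
```

Because the value is constant, queries typically join on the labels of this series (for example on `instance_id`) rather than reading the value itself.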

---

## Instance identification (keeping metrics per container)

Each sidecar container corresponds to a different sandbox; metrics must be distinguishable per instance and must not be mixed in the same time series. How this is achieved depends on how metrics are collected.

### Instance ID source

Instance identification is **provided only via an environment variable**; the sidecar reads the variable and does not distinguish between Kubernetes and Docker:

- **Env var**: `OPENSANDBOX_EGRESS_INSTANCE_ID`
- **Meaning**: Unique ID for this sidecar instance (e.g. sandbox_id, pod name, container_id), **injected by the orchestrator when creating the container**.
- **Examples**:
  - Kubernetes: set via Downward API in the Pod, e.g. `OPENSANDBOX_EGRESS_INSTANCE_ID=$(POD_NAME).$(POD_NAMESPACE)` or `$(POD_UID)`.
  - Docker / OpenSandbox server: pass when creating the container, e.g. `-e OPENSANDBOX_EGRESS_INSTANCE_ID=<sandbox_id>`.

Implementation notes:

- Attach the **same set of instance labels** to all metrics: read `OPENSANDBOX_EGRESS_INSTANCE_ID` and use it as the `instance_id` label, consistent with `opensandbox_egress_info`.
- If the variable is unset, `instance_id` may be empty or fall back to another value (e.g. the hostname). **When metrics are pushed, setting it is strongly recommended**; otherwise multiple instances share the same grouping key and can overwrite each other's pushed metrics.
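
The lookup order described in the notes above can be sketched as follows. This is a minimal illustration, not the sidecar's implementation: `resolveInstanceID` is a hypothetical helper, and the hostname fallback is only one of the options the notes mention. The environment lookup and hostname functions are passed in so the fallback order can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// envInstanceID is the environment variable named in this document.
const envInstanceID = "OPENSANDBOX_EGRESS_INSTANCE_ID"

// resolveInstanceID (hypothetical) returns the value to use for the
// instance_id label: the environment variable if set, otherwise the
// hostname, otherwise an empty string.
func resolveInstanceID(lookup func(string) string, hostname func() (string, error)) string {
	if id := lookup(envInstanceID); id != "" {
		return id
	}
	if h, err := hostname(); err == nil {
		return h // assumed fallback; the notes leave the exact choice open
	}
	return ""
}

func main() {
	// The output depends on the environment the program runs in.
	fmt.Println(resolveInstanceID(os.Getenv, os.Hostname))
}
```

Resolving the ID once at startup and attaching it to every metric keeps `instance_id` consistent with `opensandbox_egress_info`.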

---

## Metric types

- **Counter**: Monotonically increasing value that resets only when the process restarts; use for request counts, error counts, etc. Query it with `rate()` or `increase()` to get a per-second rate or a delta over a window.
- **Gauge**: Current value that can go up or down; use for current rule count, mode, uptime, etc.
- **Histogram**: Observations counted into cumulative buckets (e.g. latency); quantiles are estimated at query time with `histogram_quantile()` and can be aggregated across instances.
- **Summary**: Quantiles computed in the application and exposed directly; use for distribution metrics like latency when aggregation across instances is not needed.
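
To make the Histogram entry concrete: Prometheus histogram buckets are cumulative, so each bucket counts every observation less than or equal to its upper bound, and the final `+Inf` bucket equals the total count. The sketch below shows that counting rule with the standard library only. `cumulativeBuckets` and the bucket bounds are illustrative assumptions, not the sidecar's configuration.

```go
package main

import "fmt"

// cumulativeBuckets (hypothetical) returns cumulative counts for the given
// upper bounds in seconds. The last element is the +Inf bucket, which
// counts every observation.
func cumulativeBuckets(bounds []float64, observations []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, v := range observations {
		for i, upper := range bounds {
			if v <= upper {
				counts[i]++ // an observation lands in every bucket at or above it
			}
		}
		counts[len(bounds)]++ // +Inf bucket
	}
	return counts
}

func main() {
	bounds := []float64{0.005, 0.05, 0.5}           // illustrative bucket bounds
	latencies := []float64{0.002, 0.03, 0.04, 0.7}  // illustrative DNS forward latencies
	fmt.Println(cumulativeBuckets(bounds, latencies)) // prints [1 3 3 4]
}
```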

---

## Exposure

- **Endpoint**: Same port as the policy server, default `GET http://<addr>/metrics` (e.g. `http://127.0.0.1:18080/metrics`).
- **Format**: Prometheus text format (`text/plain; charset=utf-8`).
- **Collection**: Because a sidecar is short-lived, either scrape it at a short interval from within the same Pod, or push metrics periodically and on exit (e.g. via Pushgateway or OTLP). See [README](README.md) and observability notes.
- **Instance separation**: Metrics from different container instances are separated by the labels defined in "Instance identification" (e.g. `instance_id`) or by scrape target identity; see the "Instance identification" section above.
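
As a rough picture of what a scrape of this endpoint involves, the sketch below serves one counter in the Prometheus text format and scrapes it in memory. It deliberately uses only the standard library so that it stays self-contained; it does not use a Prometheus client library. `newMux`, `scrape`, and `policyUpdates` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// policyUpdates stands in for opensandbox_egress_policy_updates_total.
var policyUpdates atomic.Int64

// newMux (hypothetical) registers /metrics on a mux that could also carry
// the policy routes, reflecting the shared port described above.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, _ *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
		fmt.Fprintf(w, "# TYPE opensandbox_egress_policy_updates_total counter\n")
		fmt.Fprintf(w, "opensandbox_egress_policy_updates_total %d\n", policyUpdates.Load())
	})
	return mux
}

// scrape performs one in-memory GET /metrics and returns status and body.
func scrape(h http.Handler) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return rec.Code, rec.Body.String()
}

func main() {
	policyUpdates.Add(3) // pretend three policy updates succeeded
	code, body := scrape(newMux())
	fmt.Println(code)
	fmt.Print(body)
}
```

Against a running sidecar, the same check is a plain HTTP GET to the address in the Endpoint item above.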
16 changes: 13 additions & 3 deletions components/egress/go.mod
@@ -4,12 +4,22 @@ go 1.24.0
 
 require (
 	github.com/miekg/dns v1.1.61
-	golang.org/x/sys v0.31.0
+	github.com/prometheus/client_golang v1.23.2
+	golang.org/x/sys v0.35.0
 )
 
 require (
+	github.com/beorn7/perks v1.0.1 // indirect
+	github.com/cespare/xxhash/v2 v2.3.0 // indirect
+	github.com/kr/text v0.2.0 // indirect
+	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
+	github.com/prometheus/client_model v0.6.2 // indirect
+	github.com/prometheus/common v0.66.1 // indirect
+	github.com/prometheus/procfs v0.16.1 // indirect
+	go.yaml.in/yaml/v2 v2.4.2 // indirect
 	golang.org/x/mod v0.18.0 // indirect
-	golang.org/x/net v0.38.0 // indirect
-	golang.org/x/sync v0.7.0 // indirect
+	golang.org/x/net v0.43.0 // indirect
+	golang.org/x/sync v0.13.0 // indirect
 	golang.org/x/tools v0.22.0 // indirect
+	google.golang.org/protobuf v1.36.8 // indirect
 )
56 changes: 50 additions & 6 deletions components/egress/go.sum
@@ -1,12 +1,56 @@
+github.com/beorn7/perks v1.0.1 h1:VlbKKnNfV8bJzeqoa4cOKqO6bYr3WgKZxO8Z16+hsOM=
+github.com/beorn7/perks v1.0.1/go.mod h1:G2ZrVWU2WbWT9wwq4/hrbKbnv/1ERSJQ0ibhJ6rlkpw=
+github.com/cespare/xxhash/v2 v2.3.0 h1:UL815xU9SqsFlibzuggzjXhog7bL6oX9BbNZnL2UFvs=
+github.com/cespare/xxhash/v2 v2.3.0/go.mod h1:VGX0DQ3Q6kWi7AoAeZDth3/j3BFtOZR5XLFGgcrjCOs=
+github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E=
+github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c=
+github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38=
+github.com/google/go-cmp v0.7.0 h1:wk8382ETsv4JYUZwIsn6YpYiWiBsYLSJiTsyBybVuN8=
+github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX3N/iU=
+github.com/klauspost/compress v1.18.0 h1:c/Cqfb0r+Yi+JtIEq73FWXVkRonBlf0CRNYc8Zttxdo=
+github.com/klauspost/compress v1.18.0/go.mod h1:2Pp+KzxcywXVXMr50+X0Q/Lsb43OQHYWRCY2AiWywWQ=
+github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE=
+github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk=
+github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY=
+github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE=
+github.com/kylelemons/godebug v1.1.0 h1:RPNrshWIDI6G2gRW9EHilWtl7Z6Sb1BR0xunSBf0SNc=
+github.com/kylelemons/godebug v1.1.0/go.mod h1:9/0rRGxNHcop5bhtWyNeEfOS8JIWk580+fNqagV/RAw=
 github.com/miekg/dns v1.1.61 h1:nLxbwF3XxhwVSm8g9Dghm9MHPaUZuqhPiGL+675ZmEs=
 github.com/miekg/dns v1.1.61/go.mod h1:mnAarhS3nWaW+NVP2wTkYVIZyHNJ098SJZUki3eykwQ=
+github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
+github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
+github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM=
+github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4=
+github.com/prometheus/client_golang v1.23.2 h1:Je96obch5RDVy3FDMndoUsjAhG5Edi49h0RJWRi/o0o=
+github.com/prometheus/client_golang v1.23.2/go.mod h1:Tb1a6LWHB3/SPIzCoaDXI4I8UHKeFTEQ1YCr+0Gyqmg=
+github.com/prometheus/client_model v0.6.2 h1:oBsgwpGs7iVziMvrGhE53c/GrLUsZdHnqNwqPLxwZyk=
+github.com/prometheus/client_model v0.6.2/go.mod h1:y3m2F6Gdpfy6Ut/GBsUqTWZqCUvMVzSfMLjcu6wAwpE=
+github.com/prometheus/common v0.66.1 h1:h5E0h5/Y8niHc5DlaLlWLArTQI7tMrsfQjHV+d9ZoGs=
+github.com/prometheus/common v0.66.1/go.mod h1:gcaUsgf3KfRSwHY4dIMXLPV0K/Wg1oZ8+SbZk/HH/dA=
+github.com/prometheus/procfs v0.16.1 h1:hZ15bTNuirocR6u0JZ6BAHHmwS1p8B4P6MRqxtzMyRg=
+github.com/prometheus/procfs v0.16.1/go.mod h1:teAbpZRB1iIAJYREa1LsoWUXykVXA1KlTmWl8x/U+Is=
+github.com/rogpeppe/go-internal v1.10.0 h1:TMyTOH3F/DB16zRVcYyreMH6GnZZrwQVAoYjRBZyWFQ=
+github.com/rogpeppe/go-internal v1.10.0/go.mod h1:UQnix2H7Ngw/k4C5ijL5+65zddjncjaFoBhdsK/akog=
+github.com/stretchr/testify v1.11.1 h1:7s2iGBzp5EwR7/aIZr8ao5+dra3wiQyKjjFuvgVKu7U=
+github.com/stretchr/testify v1.11.1/go.mod h1:wZwfW3scLgRK+23gO65QZefKpKQRnfz6sD981Nm4B6U=
+go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto=
+go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE=
+go.yaml.in/yaml/v2 v2.4.2 h1:DzmwEr2rDGHl7lsFgAHxmNz/1NlQ7xLIrlN2h5d1eGI=
+go.yaml.in/yaml/v2 v2.4.2/go.mod h1:081UH+NErpNdqlCXm3TtEran0rJZGxAYx9hb/ELlsPU=
 golang.org/x/mod v0.18.0 h1:5+9lSbEzPSdWkH32vYPBwEpX8KwDbM52Ud9xBUvNlb0=
 golang.org/x/mod v0.18.0/go.mod h1:hTbmBsO62+eylJbnUtE2MGJUyE7QWk4xUqPFrRgJ+7c=
-golang.org/x/net v0.38.0 h1:vRMAPTMaeGqVhG5QyLJHqNDwecKTomGeqbnfZyKlBI8=
-golang.org/x/net v0.38.0/go.mod h1:ivrbrMbzFq5J41QOQh0siUuly180yBYtLp+CKbEaFx8=
-golang.org/x/sync v0.7.0 h1:YsImfSBoP9QPYL0xyKJPq0gcaJdG3rInoqxTWbfQu9M=
-golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk=
-golang.org/x/sys v0.31.0 h1:ioabZlmFYtWhL+TRYpcnNlLwhyxaM9kWTDEmfnprqik=
-golang.org/x/sys v0.31.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
+golang.org/x/net v0.43.0 h1:lat02VYK2j4aLzMzecihNvTlJNQUq316m2Mr9rnM6YE=
+golang.org/x/net v0.43.0/go.mod h1:vhO1fvI4dGsIjh73sWfUVjj3N7CA9WkKJNQm2svM6Jg=
+golang.org/x/sync v0.13.0 h1:AauUjRAJ9OSnvULf/ARrrVywoJDy0YS2AwQ98I37610=
+golang.org/x/sync v0.13.0/go.mod h1:1dzgHSNfp02xaA81J2MS99Qcpr2w7fw1gpm99rleRqA=
+golang.org/x/sys v0.35.0 h1:vz1N37gP5bs89s7He8XuIYXpyY0+QlsKmzipCbUtyxI=
+golang.org/x/sys v0.35.0/go.mod h1:BJP2sWEmIv4KK5OTEluFJCKSidICx8ciO85XgH3Ak8k=
 golang.org/x/tools v0.22.0 h1:gqSGLZqv+AI9lIQzniJ0nZDRG5GBPsSi+DRNHWNz6yA=
 golang.org/x/tools v0.22.0/go.mod h1:aCwcsjqvq7Yqt6TNyX7QMU2enbQ/Gt0bo6krSeEri+c=
+google.golang.org/protobuf v1.36.8 h1:xHScyCOEuuwZEc6UtSOvPbAT4zRh0xcNRYekJwfqyMc=
+google.golang.org/protobuf v1.36.8/go.mod h1:fuxRtAxBytpl4zzqUh6/eyUujkJdNiuEkXntxiD/uRU=
+gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
+gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk=
+gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q=
+gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
+gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
3 changes: 3 additions & 0 deletions components/egress/main.go
@@ -25,6 +25,7 @@ import (
 	"github.com/alibaba/opensandbox/egress/pkg/constants"
 	"github.com/alibaba/opensandbox/egress/pkg/dnsproxy"
 	"github.com/alibaba/opensandbox/egress/pkg/iptables"
+	"github.com/alibaba/opensandbox/egress/pkg/metrics"
 )
 
 func main() {
@@ -39,11 +40,13 @@ func main() {
 	allowIPs := AllowIPsForNft("/etc/resolv.conf")
 
 	mode := parseMode()
+	metrics.SetEnforcementMode(mode)
 	nftMgr := createNftManager(mode)
 	proxy, err := dnsproxy.New(initialRules, "")
 	if err != nil {
 		log.Fatalf("failed to init dns proxy: %v", err)
 	}
+	metrics.SetPolicyRuleCount(initialRules.DefaultAction, len(initialRules.Egress))
 	if err := proxy.Start(ctx); err != nil {
 		log.Fatalf("failed to start dns proxy: %v", err)
 	}
5 changes: 5 additions & 0 deletions components/egress/nft.go
@@ -23,6 +23,7 @@ import (
 
 	"github.com/alibaba/opensandbox/egress/pkg/constants"
 	"github.com/alibaba/opensandbox/egress/pkg/dnsproxy"
+	"github.com/alibaba/opensandbox/egress/pkg/metrics"
 	"github.com/alibaba/opensandbox/egress/pkg/nftables"
 	"github.com/alibaba/opensandbox/egress/pkg/policy"
 )
@@ -43,12 +44,16 @@ func setupNft(ctx context.Context, nftMgr nftApplier, initialPolicy *policy.Netw
 	}
 	policyWithNS := initialPolicy.WithExtraAllowIPs(nameserverIPs)
 	if err := nftMgr.ApplyStatic(ctx, policyWithNS); err != nil {
+		metrics.NftApplyTotal.WithLabelValues(metrics.ResultFailure).Inc()
 		log.Fatalf("nftables static apply failed: %v", err)
 	}
+	metrics.NftApplyTotal.WithLabelValues(metrics.ResultSuccess).Inc()
 	log.Printf("nftables static policy applied (table inet opensandbox)")
 	proxy.SetOnResolved(func(domain string, ips []nftables.ResolvedIP) {
 		if err := nftMgr.AddResolvedIPs(ctx, ips); err != nil {
 			log.Printf("[dns] add resolved IPs to nft failed: %v", err)
+		} else {
+			metrics.NftResolvedIPsAddedTotal.Add(float64(len(ips)))
 		}
 	})
 }
15 changes: 8 additions & 7 deletions components/egress/pkg/constants/configuration.go
@@ -15,13 +15,14 @@
 package constants
 
 const (
-	EnvBlockDoH443    = "OPENSANDBOX_EGRESS_BLOCK_DOH_443"
-	EnvDoHBlocklist   = "OPENSANDBOX_EGRESS_DOH_BLOCKLIST" // comma-separated IP/CIDR
-	EnvEgressMode     = "OPENSANDBOX_EGRESS_MODE"          // dns | dns+nft
-	EnvEgressHTTPAddr = "OPENSANDBOX_EGRESS_HTTP_ADDR"
-	EnvEgressToken    = "OPENSANDBOX_EGRESS_TOKEN"
-	EnvEgressRules    = "OPENSANDBOX_EGRESS_RULES"
-	EnvMaxNameservers = "OPENSANDBOX_EGRESS_MAX_NS"
+	EnvBlockDoH443      = "OPENSANDBOX_EGRESS_BLOCK_DOH_443"
+	EnvDoHBlocklist     = "OPENSANDBOX_EGRESS_DOH_BLOCKLIST" // comma-separated IP/CIDR
+	EnvEgressMode       = "OPENSANDBOX_EGRESS_MODE"          // dns | dns+nft
+	EnvEgressHTTPAddr   = "OPENSANDBOX_EGRESS_HTTP_ADDR"
+	EnvEgressToken      = "OPENSANDBOX_EGRESS_TOKEN"
+	EnvEgressRules      = "OPENSANDBOX_EGRESS_RULES"
+	EnvEgressInstanceID = "OPENSANDBOX_EGRESS_INSTANCE_ID" // unique instance id for metrics instance_id label
+	EnvMaxNameservers   = "OPENSANDBOX_EGRESS_MAX_NS"
 )
 
 const (
7 changes: 7 additions & 0 deletions components/egress/pkg/dnsproxy/proxy.go
@@ -26,6 +26,7 @@ import (
 
 	"github.com/miekg/dns"
 
+	"github.com/alibaba/opensandbox/egress/pkg/metrics"
 	"github.com/alibaba/opensandbox/egress/pkg/nftables"
 	"github.com/alibaba/opensandbox/egress/pkg/policy"
 )
@@ -109,20 +110,26 @@ func (p *Proxy) serveDNS(w dns.ResponseWriter, r *dns.Msg) {
 	currentPolicy := p.policy
 	p.policyMu.RUnlock()
 	if currentPolicy != nil && currentPolicy.Evaluate(domain) == policy.ActionDeny {
+		metrics.DNSQueriesTotal.WithLabelValues(metrics.ResultDenied).Inc()
+		metrics.ViolationsTotal.WithLabelValues(metrics.ViolationTypeDNSDeny).Inc()
 		resp := new(dns.Msg)
 		resp.SetRcode(r, dns.RcodeNameError)
 		_ = w.WriteMsg(resp)
 		return
 	}
 
+	start := time.Now()
 	resp, err := p.forward(r)
+	metrics.DNSForwardDurationSeconds.Observe(time.Since(start).Seconds())
 	if err != nil {
+		metrics.DNSQueriesTotal.WithLabelValues(metrics.ResultForwardError).Inc()
 		log.Printf("[dns] forward error for %s: %v", domain, err)
 		fail := new(dns.Msg)
 		fail.SetRcode(r, dns.RcodeServerFailure)
 		_ = w.WriteMsg(fail)
 		return
 	}
+	metrics.DNSQueriesTotal.WithLabelValues(metrics.ResultAllowed).Inc()
 	p.maybeNotifyResolved(domain, resp)
 	_ = w.WriteMsg(resp)
 }