feat(loki/podlogs): add pipelineStages processing to PodLogs CRD#6000

Draft
QuentinBisson wants to merge 3 commits into grafana:main from QuentinBisson:feat/podlogs-pipeline-stages

Conversation

@QuentinBisson (Contributor)

Summary

Merge order: this PR depends on #5999 (JSON tags) and should be merged after it.

Embeds a loki.process-style pipeline directly in the PodLogs CRD spec so teams can drop or transform their own logs at collection time, without coupling to a shared loki.process component in the global Alloy config.

What's added

  • PodLogsStageConfig — a JSON-only subset of StageConfig that excludes stages incompatible with a shared per-CRD pipeline:
    • multiline: lines from different pods interleave, causing incorrect merging across pod boundaries.
    • windowsevent / eventlogmessage: not applicable to Linux pod logs.
  • PodLogsDropConfig — mirrors DropConfig but with OlderThan string (e.g. "5m") instead of time.Duration (which serialises as a nanosecond int64 in JSON).
  • PodLogsMatchConfig — mirrors MatchConfig but uses []PodLogsStageConfig for nested stages, so the exclusions apply recursively.
  • PipelineStages []PodLogsStageConfig field in PodLogsSpec.
  • Per-PodLogs pipeline lifecycle in the reconciler: pipelines are created/replaced when stage config changes, stopped after SyncTargets (so no tailer goroutine writes to a dead channel), and torn down on component shutdown.
  • Per-target loki.EntryHandler in kubetail.Target so tailers route entries to the pipeline rather than the global handler.
  • OpenAPIV3 schema for pipelineStages in the CRD YAML.

Design notes

  • One pipeline per PodLogs resource (shared across all matched pods/containers). Multiline is explicitly excluded for correctness.
  • Metrics are namespaced per PodLogs via prometheus.WrapRegistererWith.
  • Backoff reset works correctly on both the default handler path and the pipeline path.

Test plan

  • PodLogs with pipelineStages drops/transforms entries correctly
  • PodLogs without pipelineStages behaves identically to today
  • Invalid olderThan duration surfaces as a reconcile error, not a silent no-op
  • Changing pipelineStages on a live CRD restarts the pipeline without leaking goroutines
  • node_filter works correctly with pipelines

Refs: #4738

🤖 Generated with Claude Code

Commit messages

Adds `json:"..."` tags to every stage configuration struct in
loki/process/stages and loki/process/metric. These tags are required to serialize stage configs to/from JSON for use in the PodLogs CRD (see grafana#4738).

Notable serialization choices:
- `time.Duration` fields (OlderThan, MaxIdle) are tagged `json:"-"` because Go serialises them as nanosecond int64 values, which are not human-readable in Kubernetes YAML. Users can rely on the default value.
- `units.Base2Bytes` already implements TextMarshaler/TextUnmarshaler and serialises as "5MiB", so it is exposed normally.

No behaviour change — the alloy:"..." tags that drive existing parsing are untouched; json tags are purely additive.

Refs: grafana#4738

fix(loki/stages): mark GeoIPConfig.Source as required in JSON

Remove omitempty from the Source field JSON tag so the field is never
silently dropped when serialising a PodLogs pipeline stage config.
Source is a required field (alloy:"source,attr") and omitting it in
JSON would leave a nil pointer that causes the stage to do nothing.

Refs: grafana#4738
Signed-off-by: QuentinBisson <quentin@giantswarm.io>
Implements grafana#4738: PodLogs resources can now declare
`pipelineStages` in their spec to apply log processing stages to every log line collected by that resource, before forwarding to the fanout.

Design summary:
- `stages.PodLogsStageConfig`: JSON-tagged subset of StageConfig that excludes multiline (lines from different pods interleave), windowsevent, and eventlogmessage (Linux-only context).
- One `stages.Pipeline` is created per PodLogs resource with stages. Per-PodLogs Prometheus metrics are namespaced via WrapRegistererWith.
- `kubetail.Target` gains an optional `handler loki.EntryHandler`; when set, the tailer routes entries through that handler instead of the global one. `tailerTask.Equals` compares handler pointers so pipeline changes trigger a tailer restart automatically.
- Pipeline lifecycle is managed in the reconciler: new pipelines are created before SyncTargets; old/replaced pipelines are stopped only after SyncTargets returns (which waits for stopped tailer goroutines to exit), eliminating any window where a tailer goroutine could write to a dead pipeline channel.
- CRD YAML updated with a full OpenAPIV3 schema for all 25 supported stage types.

Known limitation (TODO in reconciler.go): Pipeline.Start spawns N+3
goroutines per PodLogs resource. At large scale a synchronous
Pipeline.Process path would reduce this to zero extra goroutines.

Refs: grafana#4738
Two bugs found in review:

activePipelineKeys goroutine leak: the key was pre-populated before
reconcilePodLogs was called, so if reconcilePodLogs returned early
(invalid relabeling, etc.) the old pipeline was never stopped, even
after SyncTargets removed all its tailers. Fix: check r.pipelines
after reconcilePodLogs returns, and only mark the key active when
ensurePipeline actually succeeded.

Backoff not reset on pipeline path: when a per-target pipeline handler
is set, processLogStream bypasses the mutator handler that calls
bo.Reset() on each received entry, so the backoff was never reset.
Fix: thread an onEntrySent callback through tail→processLogStream;
bo.Reset is passed on the pipeline path, a no-op on the default path.

Refs: grafana#4738