perf(loki/stages): add SyncStage to reduce pipeline goroutines (N+3 → 1 per PodLogs) by QuentinBisson · Pull Request #6001 · grafana/alloy

QuentinBisson · 2026-04-07T13:06:27Z

Summary

Merge order: this PR depends on #5999 and #6000 and should be merged last.

Adds a SyncStage optional interface to the loki.process stages package so that pipelines whose stages all support synchronous processing can run in a single goroutine instead of the N+3 goroutine chain produced by Pipeline.Start.

This is particularly important for loki.source.podlogs (#6000), where each PodLogs CRD with pipelineStages creates one pipeline. At scale (100–500 PodLogs CRDs), the goroutine savings are significant.

Changes

SyncStage interface + Pipeline.StartSync

New optional SyncStage interface: ProcessEntry(e Entry) []Entry
Pipeline.CanProcessSync() returns true when all stages implement SyncStage (deep-checks nested match pipelines via syncChecker)
Pipeline.StartSync() runs the whole pipeline in 1 goroutine instead of N+3
Pipeline.processEntryFull() private method applies stages to an already-initialised Entry (used by matcherStage)

Stages that now implement SyncStage

All Processor-based stages (regex, json, logfmt, labels, template, tenant, timestamp, output, replace, pattern, luhn, label_drop, label_keep, static_labels, docker, …) get ProcessEntry for free via stageProcessor
cri: processLine() helper extracted — Run and ProcessEntry share the same implementation
truncate: applyRules() helper extracted — Run simplified to RunWith(in, m.applyRules)
drop: ProcessEntry added
geoip: ProcessEntry added
sampling: ProcessEntry added
pack: ProcessEntry added
limit: ProcessEntry added
match: ProcessEntry + canProcessSync() — keep actions are sync-safe when the nested pipeline is also sync-capable; drop actions are always sync-safe

Goroutine comparison (N = number of stages)

Path	Goroutines
`Pipeline.Start`	N + 3
`Pipeline.StartSync`	1

Test plan

All existing stage tests pass
Pipeline.CanProcessSync() returns false for pipelines containing multiline
Pipeline.CanProcessSync() returns true for all-sync pipelines including nested match
A match stage with a non-sync inner pipeline does not incorrectly report sync support

Refs: #4738

🤖 Generated with Claude Code

Adds `json:"..."` tags to every stage configuration struct in loki/process/stages and loki/process/metric. These tags are required to serialize stage configs to/from JSON for use in the PodLogs CRD (see grafana#4738). Notable serialization choices: - `time.Duration` fields (OlderThan, MaxIdle) are tagged `json:"-"` because Go serialises them as nanosecond int64 values, which are not human-readable in Kubernetes YAML. Users can rely on the default value. - `units.Base2Bytes` already implements TextMarshaler/TextUnmarshaler and serialises as "5MiB", so it is exposed normally. No behaviour change — the alloy:"..." tags that drive existing parsing are untouched; json tags are purely additive. Refs: grafana#4738 fix(loki/stages): mark GeoIPConfig.Source as required in JSON Remove omitempty from the Source field JSON tag so the field is never silently dropped when serialising a PodLogs pipeline stage config. Source is a required field (alloy:"source,attr") and omitting it in JSON would leave a nil pointer that causes the stage to do nothing. Refs: grafana#4738 Signed-off-by: QuentinBisson <quentin@giantswarm.io>

Implements grafana#4738: PodLogs resources can now declare `pipelineStages` in their spec to apply log processing stages to every log line collected by that resource, before forwarding to the fanout. Design summary: - `stages.PodLogsStageConfig`: JSON-tagged subset of StageConfig that excludes multiline (lines from different pods interleave), windowsevent, and eventlogmessage (Linux-only context). - One `stages.Pipeline` is created per PodLogs resource with stages. Per-PodLogs Prometheus metrics are namespaced via WrapRegistererWith. - `kubetail.Target` gains an optional `handler loki.EntryHandler`; when set, the tailer routes entries through that handler instead of the global one. `tailerTask.Equals` compares handler pointers so pipeline changes trigger a tailer restart automatically. - Pipeline lifecycle is managed in the reconciler: new pipelines are created before SyncTargets; old/replaced pipelines are stopped only after SyncTargets returns (which waits for stopped tailer goroutines to exit), eliminating any window where a tailer goroutine could write to a dead pipeline channel. - CRD YAML updated with a full OpenAPIV3 schema for all 25 supported stage types. Known limitation (TODO in reconciler.go): Pipeline.Start spawns N+3 goroutines per PodLogs resource. At large scale a synchronous Pipeline.Process path would reduce this to zero extra goroutines. Refs: grafana#4738

Two bugs found in review: activePipelineKeys goroutine leak: the key was pre-populated before reconcilePodLogs was called, so if reconcilePodLogs returned early (invalid relabeling, etc.) the old pipeline was never stopped even after SyncTargets removed all its tailers. Fix: check r.pipelines after reconcilePodLogs returns - only mark the key active when ensurePipeline actually succeeded. Backoff not reset on pipeline path: when a per-target pipeline handler is set, processLogStream bypasses the mutator handler that calls bo.Reset() on each received entry, so the backoff was never reset. Fix: thread an onEntrySent callback through tail→processLogStream; bo.Reset is passed on the pipeline path, a no-op on the default path. Refs: grafana#4738

Pipeline.Start spawns N+3 goroutines per pipeline: one RunWith goroutine per stage, one for the label-extraction init step, and two adapter goroutines. At large scale (hundreds of PodLogs with stages) this adds noticeable overhead. This commit introduces SyncStage, an optional interface for stages that can process a single entry synchronously without goroutines: ``` ProcessEntry(e Entry) []Entry ``` All stages supported in PodLogsStageConfig implement SyncStage except match (which uses nested sub-pipelines — tracked as a follow-up). stageProcessor gains ProcessEntry for free, covering the 13+ stages that are already Processor-based (regex, logfmt, template, etc.). Custom implementations are added to: cri, decolorize, drop, geoip, json, labels, limit, metric, pack, sampling, structured_metadata, structured_metadata_drop, and truncate. Pipeline gains: ``` CanProcessSync() bool — true when all stages are SyncStage ProcessEntry(e loki.Entry) []loki.Entry — synchronous application StartSync(in, out) — 1-goroutine handler using ProcessEntry ``` ensurePipeline in the PodLogs reconciler now calls StartSync when the pipeline supports it: ``` Before (3 stages): 6 goroutines per PodLogs resource with stages After (3 stages): 1 goroutine per PodLogs resource with stages ``` For 500 PodLogs with 3-stage pipelines that's 3000 → 500 goroutines. Pipelines containing match fall back to Start (unchanged behaviour). Refs: grafana#4738

Several issues found in review: PodLogsMatchConfig: use PodLogsStageConfig for nested match stages so that incompatible stage types (multiline, windowsevent, eventlogmessage) are excluded recursively. PodLogsStageConfig.MatchConfig is now *PodLogsMatchConfig instead of *MatchConfig. matcherStage SyncStage: implement ProcessEntry and canProcessSync on matcherStage. Drop actions are always sync-safe; keep actions are sync-safe only when the nested pipeline is also sync-capable. Pipeline. CanProcessSync checks the new syncChecker interface so a match stage with a non-sync inner pipeline does not falsely advertise sync support. cri: extract processLine() shared helper used by both Run and ProcessEntry, eliminating ~60 lines of duplication. truncate: extract applyRules() shared helper; Run is now a one-liner via RunWith; debug logs are present in both paths. structured_metadata: add missing debug log calls to ProcessEntry to match the Run implementation. processEntryFull: add private pipeline method that applies stages to an already-initialized Entry without reinitializing Extracted from labels. Used by ProcessEntry and by matcherStage for nested pipelines. Refs: grafana#4738

… CRD Using json:"-" on DropConfig.OlderThan silently discarded the field during JSON unmarshaling, so a PodLogs user writing older_than: 5m would get no error and no filtering — a silent no-op. Fix: add PodLogsDropConfig with OlderThan string (human-readable, e.g. "5m") and a toDropConfig() conversion that calls time.ParseDuration and returns an error on invalid input. PodLogsStageConfig.DropConfig is now *PodLogsDropConfig instead of *DropConfig, matching the same pattern used for PodLogsMatchConfig. ToStageConfig and ConvertPodLogsStages now return errors so that any invalid field value (malformed duration, etc.) surfaces immediately when the pipeline is built rather than being silently ignored. Also revert the json:"-" workaround on DropConfig.OlderThan now that the CRD path uses its own type. Refs: grafana#4738

QuentinBisson added 6 commits April 7, 2026 14:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(loki/stages): add SyncStage to reduce pipeline goroutines (N+3 → 1 per PodLogs)#6001

perf(loki/stages): add SyncStage to reduce pipeline goroutines (N+3 → 1 per PodLogs)#6001
QuentinBisson wants to merge 6 commits intografana:mainfrom
QuentinBisson:feat/podlogs-pipeline-sync-process

QuentinBisson commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

QuentinBisson commented Apr 7, 2026

Summary

Changes

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant