Implement proposed `k8s-stream-file` format #265

portante · 2021-05-31T03:40:04Z

Instead of parsing the contents of the data read from the `stdout` and `stderr` pipes, this commit adds support for a "stream" format, named `k8s-stream-file`, which just records what is read from a pipe to disk.

It significantly reduces the CPU spent processing each buffer read, uses only 2 I/O vectors, and never touches the memory read from the pipe.
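As an illustration only, the writer side might look like the sketch below. It assumes each record is a text header, `<timestamp> <pipe> <offset> <length> `, followed by the raw bytes returned by a single `read(2)`, and writes both with one `writev(2)` call using two I/O vectors, so the pipe data is never inspected or copied. The field order is loosely modeled on the `pipename`, `offset`, and `buflen` parameters of `set_k8s_stream_timestamp`; the helper names `write_stream_record` and `copy_pipe`, the UTC `Z` timestamp (instead of the local-time offset in the patch), and treating `offset` as a running byte count per pipe are all assumptions, not the PR's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical record: "<UTC timestamp> <pipe> <offset> <length> " followed
 * by <length> raw bytes. Only the header is built; the data is passed as-is.
 */
static ssize_t write_stream_record(int out_fd, const char *pipename,
				   uint64_t offset, const char *data, size_t len)
{
	char hdr[128];
	struct timespec ts;
	struct tm tm;

	assert(pipename != NULL && (data != NULL || len == 0));
	if (clock_gettime(CLOCK_REALTIME, &ts) != 0 || gmtime_r(&ts.tv_sec, &tm) == NULL)
		return -1;

	int hlen = snprintf(hdr, sizeof(hdr),
			    "%04d-%02d-%02dT%02d:%02d:%02d.%09ldZ %s %" PRIu64 " %zu ",
			    tm.tm_year + 1900, tm.tm_mon + 1, tm.tm_mday, tm.tm_hour,
			    tm.tm_min, tm.tm_sec, (long)ts.tv_nsec, pipename, offset, len);
	if (hlen < 0 || (size_t)hlen >= sizeof(hdr)) {
		errno = EOVERFLOW;
		return -1;
	}

	/* Two I/O vectors: the header, then the bytes exactly as read. */
	struct iovec iov[2] = {
		{ .iov_base = hdr, .iov_len = (size_t)hlen },
		{ .iov_base = (void *)data, .iov_len = len },
	};
	/* Sketch only: a short write is not retried here. */
	return writev(out_fd, iov, 2);
}

/* Copy one pipe to the log until EOF, emitting one record per read(2). */
static int copy_pipe(int in_fd, int out_fd, const char *pipename)
{
	char buf[8192];
	uint64_t offset = 0; /* assumed: running byte offset within this pipe */

	for (;;) {
		ssize_t n = read(in_fd, buf, sizeof(buf));
		if (n == 0)
			return 0; /* EOF */
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (write_stream_record(out_fd, pipename, offset, buf, (size_t)n) < 0)
			return -1;
		offset += (uint64_t)n;
	}
}
```

A container that writes `hello\n` to stdout would produce one record of the form `2021-05-31T03:40:04.123456789Z stdout 0 6 hello\n`. A production writer would also need to retry short writes from `writev(2)`.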

This is a conceptual PR, meant to illustrate a proposed stream log file format that removes the byte-level interpretation of `stdout`/`stderr` in favor of simply recording the data returned by each system call. It defers interpretation of the byte stream to the consumer, allowing this writer to operate with as little overhead as possible (so badly behaved containers that write only newlines, or only a few bytes between newlines, no longer drive up the writer's cost). The reader of the byte stream is then responsible for reassembling the stream according to whatever interpretation it sees fit.

The goal of this work is to provide a simple format that will stream well into an object store, such that, given enough metadata stored with the stream, the consumer can reconstruct the I/O stream at the time it is read.
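On the consumer side, reassembly could be sketched as follows, assuming a hypothetical layout in which each record is a header `<timestamp> <pipe> <offset> <length> ` followed by exactly `<length>` raw bytes. The names `parse_record` and `reassemble` are hypothetical. Because the length is explicit, data containing spaces or newlines needs no escaping, and a gap in a pipe's offsets tells the reader that data was lost.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* One decoded record; `data` points into the caller's buffer (no copy). */
struct stream_record {
	char timestamp[40];
	char pipe[16];
	uint64_t offset;
	size_t len;
	const char *data;
};

/*
 * Parse one hypothetical record "<timestamp> <pipe> <offset> <length> <bytes>"
 * from buf[0..avail). Returns bytes consumed, 0 if incomplete, -1 if malformed.
 */
static long parse_record(const char *buf, size_t avail, struct stream_record *rec)
{
	char hdr[128];
	size_t spaces = 0, i;

	assert(buf != NULL && rec != NULL);
	/* The header ends at the fourth space; the data may contain anything. */
	for (i = 0; i < avail && spaces < 4; i++)
		if (buf[i] == ' ')
			spaces++;
	if (spaces < 4)
		return 0;
	if (i >= sizeof(hdr))
		return -1;
	memcpy(hdr, buf, i);
	hdr[i] = '\0';
	if (sscanf(hdr, "%39s %15s %" SCNu64 " %zu", rec->timestamp, rec->pipe,
		   &rec->offset, &rec->len) != 4)
		return -1;
	if (avail - i < rec->len)
		return 0; /* data not fully buffered yet */
	rec->data = buf + i;
	return (long)(i + rec->len);
}

/*
 * Concatenate the data of every record for `want_pipe` into out, requiring
 * contiguous offsets. Returns the byte count, or -1 on a gap, truncation,
 * malformed record, or a too-small output buffer.
 */
static long reassemble(const char *buf, size_t n, const char *want_pipe,
		       char *out, size_t outsz)
{
	struct stream_record rec;
	size_t pos = 0, used = 0;
	uint64_t expect = 0;

	while (pos < n) {
		long c = parse_record(buf + pos, n - pos, &rec);
		if (c <= 0)
			return -1;
		if (strcmp(rec.pipe, want_pipe) == 0) {
			if (rec.offset != expect || rec.len > outsz - used)
				return -1;
			memcpy(out + used, rec.data, rec.len);
			used += rec.len;
			expect += rec.len;
		}
		pos += (size_t)c;
	}
	return (long)used;
}
```

Splitting the reassembled bytes into lines, or into CRI-style partial and full entries, is then entirely the consumer's choice.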

This is an updated implementation of cri-o/cri-o#1605.

portante · 2021-05-31T03:40:27Z

See also #262.

lgtm-com · 2021-05-31T03:42:58Z

This pull request introduces 1 alert when merging c0bc805 into 3161452 - view on LGTM.com

new alerts:

1 for Use of potentially dangerous function

haircommander

Thanks for the implementation @portante! I have some first-pass review comments, but if this is just a POC, feel free to ignore them until the concept is proven.

You'll need to run `make fmt` and commit the changes to have the majority of CI checks run.

I am betting this will perpetually fail Kubernetes e2e tests, but I am curious to see what actually ends up happening.


haircommander · 2021-06-01T13:32:46Z

src/ctr_logging.c

```diff
+/*
+ * PROPOSED: CRI Stream Format, variable length file format
+ */
+static int set_k8s_stream_timestamp(char *buf, ssize_t bufsiz, ssize_t *tsbuflen, const char *pipename, uint64_t offset, ssize_t buflen, ssize_t *btbw)
```


There seem to be some similarities between this and `set_k8s_timestamp`. Can we either factor the shared functionality out into a function or document why that's not possible?

I did not want to disturb any existing code with this, to avoid introducing unintended bugs.

If we decide that it is worth merging this, then we should consider combining those two functions into one.

portante · 2021-06-01T15:23:17Z

> You'll need to run `make fmt` and commit the changes to have the majority of CI checks run.

Fixed.

Thanks for the review!

rhatdan · 2021-06-01T19:00:29Z

@giuseppe PTAL

mtrmac · 2021-06-17T15:29:26Z

src/ctr_logging.c

```diff
+		off = -off;
+	}
+
+	len = snprintf(buf, bufsiz, "%d-%02d-%02dT%02d:%02d:%02d.%09ld%c%02d:%02d %s %lud %ld ", current_tm.tm_year + 1900,
```


(A drive-by comment with little context:)

If the goal is to offload the CPU processing to consumers, shouldn’t this just be a timestamp instead of all the timezone lookups and formatting?

Either way, UTC would be better than ambiguous local time — does this one need custom parser code to get a struct timespec back?
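If the timestamp is written in UTC with a fixed width, getting a `struct timespec` back needs only a fixed-width `sscanf` and a date-to-days conversion, with no timezone lookup. A minimal sketch, assuming the hypothetical format `YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ` (always nine fractional digits); `parse_utc_timestamp` is a hypothetical name, and `days_from_civil` follows Howard Hinnant's civil-date algorithm rather than the non-standard `timegm`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Days since 1970-01-01 for a proleptic Gregorian date (after H. Hinnant). */
static long long days_from_civil(long long y, unsigned m, unsigned d)
{
	y -= m <= 2;
	const long long era = (y >= 0 ? y : y - 399) / 400;
	const unsigned yoe = (unsigned)(y - era * 400);                       /* [0, 399] */
	const unsigned doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1; /* [0, 365] */
	const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;           /* [0, 146096] */
	return era * 146097 + (long long)doe - 719468;
}

/*
 * Parse a hypothetical fixed-width UTC stamp "YYYY-MM-DDTHH:MM:SS.nnnnnnnnnZ"
 * into a timespec. No timezone database is consulted. Returns 0 or -1.
 */
static int parse_utc_timestamp(const char *s, struct timespec *ts)
{
	int y, mo, d, h, mi, sec, n = 0;
	long nsec;

	assert(s != NULL && ts != NULL);
	if (sscanf(s, "%4d-%2d-%2dT%2d:%2d:%2d.%9ldZ%n", &y, &mo, &d, &h, &mi,
		   &sec, &nsec, &n) != 7 || n != 30)
		return -1; /* wrong shape, or not a UTC ("Z") stamp */
	if (mo < 1 || mo > 12 || d < 1 || d > 31 || h < 0 || h > 23 ||
	    mi < 0 || mi > 59 || sec < 0 || sec > 60 || nsec < 0)
		return -1;
	ts->tv_sec = (time_t)(days_from_civil(y, (unsigned)mo, (unsigned)d) * 86400LL +
			      h * 3600LL + mi * 60LL + sec);
	ts->tv_nsec = nsec;
	return 0;
}
```

A stamp carrying a local offset such as `+02:00` is rejected here; accepting it would mean parsing and applying the offset as well, which is the extra work UTC avoids.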

The problem here is establishing a reference point for what the timestamp "means".

First, for efficiency, a simple monotonically increasing timestamp is all that is needed, along with a periodic mapping of that timestamp to a real clock. This format could be modified to periodically emit (once a second? once every 5 seconds?) an entry for that mapping: `<monotonic stamp, realtime stamp>`.

That realtime stamp would be emitted in UTC instead of the local timezone. All subsequent entries would calculate their real timestamp from that offset.

If that makes sense, we can implement this instead.
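The monotonic-plus-sync-entry scheme could look roughly like this sketch: records carry only `CLOCK_MONOTONIC` nanoseconds, and the reader maps them to UTC using the most recent `<monotonic stamp, realtime stamp>` pair. The struct and function names, and nanosecond integers as the stored representation, are assumptions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical sync entry: a <monotonic stamp, realtime stamp> pair in ns. */
struct clock_sync {
	int64_t mono_ns;
	int64_t real_ns; /* UTC nanoseconds since the Unix epoch */
};

static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Writer side: read both clocks back to back to form a sync entry. */
static struct clock_sync take_sync(void)
{
	struct timespec m, r;

	clock_gettime(CLOCK_MONOTONIC, &m);
	clock_gettime(CLOCK_REALTIME, &r);
	return (struct clock_sync){ ts_to_ns(&m), ts_to_ns(&r) };
}

/* Writer side: a new sync entry is due once `interval_ns` has elapsed. */
static int sync_due(const struct clock_sync *last, int64_t now_mono_ns,
		    int64_t interval_ns)
{
	return now_mono_ns - last->mono_ns >= interval_ns;
}

/* Reader side: map a record's monotonic stamp to real time. */
static int64_t mono_to_real_ns(const struct clock_sync *sync, int64_t mono_ns)
{
	assert(mono_ns >= sync->mono_ns); /* use the latest sync at or before the record */
	return sync->real_ns + (mono_ns - sync->mono_ns);
}
```

Re-emitting the pair periodically limits how long a step or slew of the realtime clock goes unreflected, since each record is mapped through the nearest preceding sync entry.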

I should add that the goal is to push the details of interpreting the log stream onto the reader, not onto conmon, the writer.

jnovy · 2025-09-24T10:44:47Z

@portante regarding this PR - maybe UTC format would be better as @mtrmac mentions?

packit-as-a-service · 2025-10-07T03:16:19Z

Ephemeral COPR build failed. @containers/packit-build please check.

portante · 2025-10-07T03:16:31Z

> @portante regarding this PR - maybe UTC format would be better as @mtrmac mentions?

I agree that UTC would be better. Let me see about that.

Instead of parsing the contents of the data read from the `stdout` and `stderr` pipes, this commit adds support for a "stream" format, named `k8s-stream-file`, which just records what is read from a pipe to disk.

It significantly saves on CPU spend processing the buffer read, uses only 2 I/O vectors, and never touches the memory read from the pipe.

This is an updated implementation of cri-o/cri-o#1605.

Signed-off-by: Peter Portante <peter.portante@redhat.com>

portante mentioned this pull request May 31, 2021

WIP - Proposed Stream Log Writer cri-o/cri-o#1605

Closed

portante force-pushed the stream_log_writer branch from c0bc805 to 0bd2c0a May 31, 2021 14:27

haircommander reviewed Jun 1, 2021


portante force-pushed the stream_log_writer branch from 0bd2c0a to 3257f6a June 1, 2021 15:20

mtrmac reviewed Jun 17, 2021


portante mentioned this pull request Jul 9, 2021

Defining log-driver and log-opt when specifying pod in RC and Pod kubernetes/kubernetes#15478

Closed

This was referenced Jan 5, 2023

Track container log throttling features as log files are inherently unstable to scrape vectordotdev/vector#5302

Open

Consider working with CRI-O to deal with byte streams from containers rather than scraping logs from files grafana/loki#231

Open

portante force-pushed the stream_log_writer branch 2 times, most recently from bb9b2f3 to 5fb760d October 7, 2025 03:15

This was referenced Oct 7, 2025

[logging] allow custom message format #271

Open

adding support to forward containers output to splunk via hec connector #340

Open

portante force-pushed the stream_log_writer branch 2 times, most recently from 4e4c88c to 5e306e7 October 8, 2025 17:22

portante mentioned this pull request Oct 8, 2025

An Analysis of Conmon Container Logging Behaviors with Recommendations #262

Open

portante force-pushed the stream_log_writer branch from 5e306e7 to bfc55e3 November 5, 2025 19:04

5 participants
