
Conversation

@jnovy jnovy commented Sep 26, 2025

This commit replaces the pthread mutex-based synchronization in the async logging buffer with signal masking, eliminating the pthread dependency while maintaining thread safety. This should reduce conmon latency overhead by ~15%.

Fixes: #38

@packit-as-a-service

Ephemeral COPR build failed. @containers/packit-build please check.

@jnovy jnovy force-pushed the 38 branch 2 times, most recently from e95d0cc to abf6da5 on September 26, 2025 12:00
return timer_fd;
}

static bool add_entry_to_buffer_locked(stdpipe_t pipe, char *buf, ssize_t size)
Member

Why can size be negative?

Collaborator Author

I'd still vote for ssize_t here: it keeps the same type down the call chain, and there is no possibility of a (size_t)((ssize_t)-1) == SIZE_MAX overflow. The size actually comes from the read() syscall, which can return a negative value. I think that also justifies the possibly redundant size < 0 checks, as we never know whether the call path will change in the future.


/* Allocate temporary buffer to avoid race conditions */
size_t flush_count = g_log_buffer->count;
log_entry_t *temp_entries = malloc(flush_count * sizeof(log_entry_t));
Member

You could use _cleanup_free_ and not have to call free() at the end.

Collaborator Author

Nice catch, let me use _cleanup_free_ here.

jnovy commented Oct 1, 2025

FYI: removed malloc() from the flush path. The atomic operations are a bit of overkill, but they prepare for the case where conmon might become multithreaded.

Contributor

@portante portante left a comment

This implementation seems to delay the inevitable: a full buffer will still stall when a write to a file, journald, or a remote console blocks.

It might be simpler to create a process for handling all the remote consoles and just write what is read from the stdout/stderr pipes to that process using a nonblocking socket so that the remote consoles become "best effort" and don't slow down conmon.

Then use nonblocking I/O to read from the pipes and write to the log file to overlap the file I/O.


/* Signal-safe async I/O buffering for log writes to prevent fsync() blocking */

#define BUFFER_SIZE (STDIO_BUF_SIZE * 4) /* 32KB buffer */
Contributor

Ideally this would be determined at runtime using the actual size of the pipe.

/* Enter critical section by blocking signals */
static int enter_critical_section(sigset_t *old_mask)
{
init_signal_mask();
Contributor

Why would this call be necessary every time we enter a critical section if we just called it during init_async_logging()?

bool init_async_logging(void)
{
sigset_t old_mask;
if (enter_critical_section(&old_mask) != 0) {
Contributor

Isn't there a point where init_async_logging() could be called such that it runs only once? This logic seems to guard against a behavior that could be avoided.

}

/* Allocate buffer */
async_log_buffer_t *new_buffer = calloc(1, sizeof(async_log_buffer_t));
Contributor

We use g_malloc0() in src/ctr_logging.c; could that be used here?


/* Handle empty writes explicitly - these are valid drain operations */
if (num_read == 0) {
return write_to_logs(pipe, buf, num_read);
Contributor

There are some partial buffers that need to be written, why wouldn't this also be in a critical section?

jnovy commented Dec 1, 2025

@portante updated the PR - please PTAL

…afety

This commit replaces the pthread mutex-based synchronization in the
async logging buffer with signal masking, eliminating the pthread
dependency while maintaining thread safety.

Fixes: containers#38

Signed-off-by: Jindrich Novy <jnovy@redhat.com>
Contributor

@portante portante left a comment

I don't personally feel like I am in a position to hold up a commit here or sign off on a commit, so please consider the following comment in that light.

This buffering mechanism does not seem to add enough value for its level of complexity. It appears to implement a system where reads from the stdout and stderr pipes are buffered in memory and then synchronously flushed when that memory buffer is full, or at a later time when a timer fires.

It might be a much better return on investment, given the complexity, to use non-blocking I/O for the pipes, remote consoles, and log file. The data read would be queued to each output FD up to a maximum amount before conmon waits for that FD to drain. Such a scheme gives you the best ability to overlap the various I/O operations required while reducing the memory footprint and memory copies.

static int enter_critical_section(sigset_t *old_mask)
{
/* Safety check: signal mask must be initialized before use */
if (!signal_mask_initialized) {
Contributor

This seems wasteful. There should be an explicit point during conmon initialization where the signal mask is initialized before any critical section code is referenced.

];
prePatch = ''
export CFLAGS='-static -pthread'
export CFLAGS='-static'
Contributor

For the last 4 years this reference to -pthread has been there without the need for it?

If so, perhaps this is worthy of pulling out into a separate commit.

success = add_entry_to_buffer_locked(pipe, buf, num_read);
}

if (!success) {
Contributor

This check appears to only be required to check the return value from line 204, as the value of success will still be true if it is true at line 201.

Is it here to save indentation?

Can we consider moving this to follow line 204 in that same block?


static void cleanup_flush_timer(void)
{
guint current_id = atomic_load_explicit(&timer_source_id, memory_order_acquire);
Contributor

Why not use atomic_compare_exchange_strong_explicit() and then if the value is not NULL call g_source_remove()?

Contributor

This looks like we are complicating set up and tear down actions.

The set up tasks should be callable from a single point before the need for atomics (no callbacks in place).

The cleanup for the timer should also be called in one place where a single variable indicating cleanup has started has already been set.

That would avoid the need for these atomics during the cleanup process.

}

/* Check if we should flush immediately */
bool should_flush = should_flush_buffer_locked();
Contributor

Why not pass buffer already fetched at line 189 to should_flush_buffer_locked() and avoid having it reload the buffer pointer?

jnovy commented Dec 2, 2025

I agree - let me drop this one and come up with something less complicated.

@jnovy jnovy closed this Dec 2, 2025


Development

Successfully merging this pull request may close these issues.

implement double buffering scheme for logs
