-
Notifications
You must be signed in to change notification settings - Fork 139
Replace pthread-based logging buffer with signal masking #608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Ephemeral COPR build failed. @containers/packit-build please check. |
e95d0cc to
abf6da5
Compare
| return timer_fd; | ||
| } | ||
|
|
||
| static bool add_entry_to_buffer_locked(stdpipe_t pipe, char *buf, ssize_t size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why size can be negative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still vote for ssize_t here as it keeps the same type down the call chain + there is not a possibility of (size_t)((ssize_t)-1) == SIZE_MAX overflows - the size actually comes from read() syscall which can be negative. I think it would also justify possibly redundant size < 0 checks as we never know if the call path won't change in the future.
src/ctr_logging_buffer.c
Outdated
|
|
||
| /* Allocate temporary buffer to avoid race conditions */ | ||
| size_t flush_count = g_log_buffer->count; | ||
| log_entry_t *temp_entries = malloc(flush_count * sizeof(log_entry_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could use _cleanup_free_ and not have to call free in the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch, let me _cleanup_free this
|
FYI: Removed malloc() from the flush path. The atomic operations are a bit of an overkill but it's ready for case conmon might be multithreaded. |
portante
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These implementation seems to delay the inevitable: a buffer will block writing to a file, journald, or a remote console and still be stalled.
It might be simpler to create a process for handling all the remote consoles and just write what is read from the stdout/stderr pipes to that process using a nonblocking socket so that the remote consoles become "best effort" and don't slow down conmon.
Then use nonblocking I/O to read from the pipes and write to the log file to overlap the file I/O.
src/ctr_logging_buffer.h
Outdated
|
|
||
| /* Signal-safe async I/O buffering for log writes to prevent fsync() blocking */ | ||
|
|
||
| #define BUFFER_SIZE (STDIO_BUF_SIZE * 4) /* 32KB buffer */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this would be determined at runtime using the actual size of the pipe.
src/ctr_logging_buffer.c
Outdated
| /* Enter critical section by blocking signals */ | ||
| static int enter_critical_section(sigset_t *old_mask) | ||
| { | ||
| init_signal_mask(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would this call be necessary every time we enter a critical section if we just called it during init_async_logging()?
| bool init_async_logging(void) | ||
| { | ||
| sigset_t old_mask; | ||
| if (enter_critical_section(&old_mask) != 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a point where init_async_logging() could be called ensuring it is only called once? This logic seems like it is guarding against a behavior that could be avoided.
src/ctr_logging_buffer.c
Outdated
| } | ||
|
|
||
| /* Allocate buffer */ | ||
| async_log_buffer_t *new_buffer = calloc(1, sizeof(async_log_buffer_t)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use g_malloc0() in src/ctr_logging.c, could the be used here?
src/ctr_logging_buffer.c
Outdated
|
|
||
| /* Handle empty writes explicitly - these are valid drain operations */ | ||
| if (num_read == 0) { | ||
| return write_to_logs(pipe, buf, num_read); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are some partial buffers that need to be written, why wouldn't this also be in a critical section?
|
@portante updated the PR - please PTAL |
…afety This commit replaces the pthread mutex-based synchronization in the async logging buffer with signal masking, eliminating the pthread dependency while maintaining thread safety. Fixes: containers#38 Signed-off-by: Jindrich Novy <jnovy@redhat.com>
portante
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't personally feel like I am in a position to hold up a commit here or sign off on a commit, so please consider the following comment in that light.
This buffering mechanism does not seem to add enough value for its level of complexity. It appears to implement a system where reads from the stdout and stderr pipes are buffered in memory and then synchronously flushed when that memory buffer is full, or at a later time when a timer fires.
It might be a much better return on the investment and complexity to use non-blocking I/O for the pipes, remote consoles, and log file. The data read would be queued to each output FD up to a maximum amount before conmon waits for that FD to drain. Such a scheme gives you the best ability to overlap the various I/O operations required while reducing the memory footprint and memory copies.
| static int enter_critical_section(sigset_t *old_mask) | ||
| { | ||
| /* Safety check: signal mask must be initialized before use */ | ||
| if (!signal_mask_initialized) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems wasteful. There should be an explicit point during conmon initialization where the signal mask is initialized before any critical section code is referenced.
| ]; | ||
| prePatch = '' | ||
| export CFLAGS='-static -pthread' | ||
| export CFLAGS='-static' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the last 4 years this reference to -pthread has been there without the need for it?
If so, perhaps this is worthy of pulling out into a separate commit.
| success = add_entry_to_buffer_locked(pipe, buf, num_read); | ||
| } | ||
|
|
||
| if (!success) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check appears to only be required to check the return value from line 204, as the value of success will still be true if it is true at line 201.
Is it here to save indentation?
Can we consider moving this to follow line 204 in that same block?
|
|
||
| static void cleanup_flush_timer(void) | ||
| { | ||
| guint current_id = atomic_load_explicit(&timer_source_id, memory_order_acquire); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use atomic_compare_exchange_strong_explicit() and then if the value is not NULL call g_source_remove()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like we are complicating set up and tear down actions.
The set up tasks should be callable from a single point before the need for atomics (no callbacks in place).
The cleanup for the timer should also be called in one place where a single variable indicating cleanup has started has already been set.
That would avoid the need for these atomics during the cleanup process.
| } | ||
|
|
||
| /* Check if we should flush immediately */ | ||
| bool should_flush = should_flush_buffer_locked(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not pass buffer already fetched at line 189 to should_flush_buffer_locked() and avoid having it reload the buffer pointer?
|
I agree - let me drop this one and come up with something less complicated. |
This commit replaces the pthread mutex-based synchronization in the async logging buffer with signal masking, eliminating the pthread dependency while maintaining thread safety. This should reduce conmon latency overhead by ~15%.
Fixes: #38