Optimize query logging performance (#5243) #5364
Detailed Changes:
- Introduced LogBuffer and LogBufferThreadContext to implement per-thread buffering, reducing lock contention.
- Replaced localtime() with localtime_r() for improved performance and thread safety during timestamp generation.
- Implemented configurable sampling for event logs to reduce I/O overhead.
- Added global configuration variables to control flush size, flush interval, and sampling rate for event and audit logs.
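The `localtime_r()` plus bounds-checked formatting pattern described above can be sketched as follows; `format_timestamp` is a hypothetical helper for illustration, not code from this PR:

```cpp
#include <cstdio>
#include <ctime>
#include <string>

// Hypothetical helper (not from the PR) combining the two changes:
// localtime_r() instead of localtime(), and snprintf() instead of sprintf().
std::string format_timestamp(time_t secs, unsigned long usecs) {
    struct tm tm_info;
    char buffer1[32];
    char buffer2[64];
    // localtime_r fills the caller-provided tm_info instead of a shared
    // static buffer, so concurrent callers cannot clobber each other.
    if (localtime_r(&secs, &tm_info) == nullptr) {
        return "";
    }
    strftime(buffer1, sizeof(buffer1), "%Y-%m-%d %H:%M:%S", &tm_info);
    // snprintf bounds-checks the destination, unlike sprintf.
    snprintf(buffer2, sizeof(buffer2), "%s.%03u", buffer1,
             (unsigned)(usecs % 1000000) / 1000);
    return buffer2;
}
```

Unlike `localtime`, which returns a pointer to shared static storage, `localtime_r` cannot be clobbered by another thread formatting a timestamp at the same time.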
📝 Walkthrough

Introduces a LogBuffer abstraction and per-thread LogBufferThreadContext for buffered, sampled logging; replaces direct std::fstream I/O with LogBuffer* in MySQL/PostgreSQL event writers; adds flush/rotate utilities, per-thread runtime config, and thread-safe logfile state tracking.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Worker as WorkerThread
    participant Logger as MySQL_Logger / PgSQL_Logger
    participant CTX as LogBufferThreadContext
    participant Buffer as LogBuffer
    participant File as LogFile
    Worker->>Logger: log_request(event)
    Logger->>CTX: get_log_thread_context()
    CTX-->>Logger: thread_context (events/audit buffers)
    Logger->>Buffer: append formatted message
    alt buffer size/time threshold reached
        Logger->>Logger: flush_and_rotate(buffer)
        Logger->>File: write buffer contents -> flush/rotate
        File-->>Logger: flushed / rotated
    end
    Logger-->>Worker: return
```
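The size/time flush decision in the diagram can be sketched as a minimal buffer class; `DemoLogBuffer` and its method names are illustrative stand-ins, not the PR's actual `LogBuffer` API:

```cpp
#include <cstdint>
#include <string>

// Minimal sketch of the buffering flow shown in the diagram: entries
// accumulate per thread, and a flush is triggered once either the
// configured size threshold or the flush timeout is exceeded.
class DemoLogBuffer {
public:
    DemoLogBuffer(size_t flush_size, uint64_t flush_timeout_us)
        : flush_size_(flush_size), flush_timeout_us_(flush_timeout_us) {}

    // Returns true when the caller should flush after this append.
    bool append(const std::string& msg, uint64_t now_us) {
        data_ += msg;
        return data_.size() >= flush_size_ ||
               now_us - last_flush_us_ >= flush_timeout_us_;
    }

    // Hand the buffered bytes to the writer and reset the buffer.
    std::string take_and_reset(uint64_t now_us) {
        std::string out = std::move(data_);
        data_.clear();
        last_flush_us_ = now_us;
        return out;
    }

    bool empty() const { return data_.empty(); }

private:
    std::string data_;
    size_t flush_size_;
    uint64_t flush_timeout_us_;
    uint64_t last_flush_us_ = 0;
};
```

The real implementation additionally hands the buffer to `flush_and_rotate`, which also handles file rotation; this sketch only models the threshold logic.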
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 2 failed (2 warnings)
Summary of Changes

Hello @mevishalr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request dramatically enhances the query logging performance of ProxySQL by introducing a buffered, per-thread logging mechanism. These changes aim to alleviate bottlenecks caused by frequent disk I/O and lock contention, leading to a substantial increase in the number of queries per second that can be logged. Additionally, it provides more control over logging behavior through new configuration options and improves the robustness of log file handling.

Highlights
Code Review
This pull request introduces significant performance improvements to query logging by implementing per-thread buffered logging, which is an excellent approach to reduce lock contention. The new configuration variables for controlling flushing behavior and the addition of log sampling provide valuable flexibility. The replacement of localtime() with the thread-safe localtime_r() is also a commendable improvement.
I have identified a critical race condition in the creation of thread-local log contexts that could lead to incorrect behavior. Additionally, there's a high-severity bug where a log buffer isn't cleared after use, potentially corrupting log files. I've also included some medium-severity suggestions to enhance code safety by replacing sprintf with snprintf.
```cpp
		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		log_thread_contexts[tid] = std::move(new_context);
	}
```
There is a race condition here. If two threads call GetLogBufferThreadContext for the first time, both can pass the initial check, create a new_context, and then attempt to insert it into the map. The second thread to acquire the lock will overwrite the context created by the first, leading to incorrect behavior. To fix this, you should re-check if the context exists after acquiring the lock for the second time, before inserting the new one. This is a classic double-checked locking pattern.
```cpp
	std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
	auto it = log_thread_contexts.find(tid);
	if (it != log_thread_contexts.end()) {
		// Another thread created it in the meantime.
		// The `new_context` we created will be safely destroyed when it goes out of scope.
		return it->second.get();
	}
	log_thread_contexts[tid] = std::move(new_context);
```

```cpp
	log_ctx->events.flush_to_file(events.logfile);
	events.current_log_size += log_ctx->events.size();
```
The log buffer log_ctx->events is not reset after writing the metadata event and flushing it to the file. This will cause the metadata to remain in the buffer and be prepended to the next log entry written by this thread, leading to duplicated data and incorrect log files. The buffer should be reset after it's flushed.
```cpp
	log_ctx->events.flush_to_file(events.logfile);
	events.current_log_size += log_ctx->events.size();
	log_ctx->events.reset(monotonic_time());
```

```cpp
	sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
	if (localtime_r(&timer, &tm_info)) {
		strftime(buffer1, 32, "%Y-%m-%d %H:%M:%S", &tm_info);
		sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
```
sprintf is used here for string formatting. This function is unsafe as it doesn't perform bounds checking and can lead to buffer overflows. It's better to use snprintf for safer string formatting.
This issue is also present on lines 634, 1087, and 1101. Please apply similar changes there.
```cpp
	snprintf(buffer2, sizeof(buffer2),"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
```

```cpp
	sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
	if (localtime_r(&timer, &tm_info)) {
		strftime(buffer1, 32, "%Y-%m-%d %H:%M:%S", &tm_info);
		sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
```
sprintf is used here for string formatting. This function is unsafe as it doesn't perform bounds checking and can lead to buffer overflows. It's better to use snprintf for safer string formatting.
This issue is also present on lines 232, 425, and 439. Please apply similar changes there.
```cpp
	snprintf(buffer2, sizeof(buffer2),"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
```
Actionable comments posted: 6
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
lib/PgSQL_Logger.cpp (2)
564-580: ⚠️ Potential issue | 🟠 Major

`events_close_log_unlocked`/`audit_close_log_unlocked` don't set `logfile_open` to `false`.

When a log file is closed (e.g., during disable or rotation), `logfile_open` remains `true`. Worker threads that pass the lock-free `is_events_logfile_open()`/`is_audit_logfile_open()` check will continue writing into their per-thread buffers and attempt flushes. While `flush_and_rotate` guards against a `NULL` logfile under the lock, data accumulates silently in per-thread buffers during the window where the file is closed but `logfile_open` hasn't been toggled. This is particularly relevant if logging is disabled via an empty filename — `events.enabled` catches it, but the atomic flag is left stale.

Proposed fix

```diff
 void PgSQL_Logger::events_close_log_unlocked() {
 	if (events.logfile) {
 		events.logfile->flush();
 		events.logfile->close();
 		delete events.logfile;
 		events.logfile=NULL;
+		set_events_logfile_open(false);
 	}
 }
 void PgSQL_Logger::audit_close_log_unlocked() {
 	if (audit.logfile) {
 		audit.logfile->flush();
 		audit.logfile->close();
 		delete audit.logfile;
 		audit.logfile=NULL;
+		set_audit_logfile_open(false);
 	}
 }
```
102-134: ⚠️ Potential issue | 🟡 Minor

`write_query_format_2_json` always returns `total_bytes = 0`.

`total_bytes` is initialized to 0 at line 371 and never updated before the return at line 462. The caller `PgSQL_Event::write` (line 103) returns this value. If the return value is used for size tracking (as `write_query_format_1` does), JSON-format logging will always report 0 bytes written. This appears to be pre-existing behavior carried forward, but worth noting as it could affect any metrics relying on this return value.

Also applies to: 369-463
lib/MySQL_Logger.cpp (2)
1284-1300: ⚠️ Potential issue | 🟠 Major

Same `logfile_open` stale-flag issue as in `PgSQL_Logger.cpp`.

`events_close_log_unlocked()` and `audit_close_log_unlocked()` set the logfile pointer to `NULL` but never call `set_events_logfile_open(false)`/`set_audit_logfile_open(false)`. See the corresponding comment on `PgSQL_Logger.cpp` for details and proposed fix — the same change is needed here.
1338-1358: ⚠️ Potential issue | 🔴 Critical

Add buffer reset after direct flush_to_file to prevent metadata event duplication in logs

The metadata event is flushed directly to file but the buffer is not cleared. When the thread subsequently logs another event and `flush_and_rotate` is called, the same metadata bytes will be written to the file again. This deviates from the standard pattern used in `flush_and_rotate`, which calls `buffer.reset()` after flushing.

Fix

```diff
 	metaEvent.write(&log_ctx->events, nullptr);
 	log_ctx->events.flush_to_file(events.logfile);
 	events.current_log_size += log_ctx->events.size();
+	log_ctx->events.reset(monotonic_time());
```
🤖 Fix all issues with AI agents
In `@include/log_utils.h`:
- Around line 1-11: The header is not self-contained: GetLogBufferThreadContext
uses pthread_t as a map key but <pthread.h> is not included; add `#include`
<pthread.h> near the other includes in include/log_utils.h so pthread_t is
defined (ensuring the header no longer relies on transitive includes), and
confirm any declarations or typedefs referencing pthread_t (e.g., the map key in
GetLogBufferThreadContext) compile without additional includes.
- Around line 121-128: The docstring for flush_to_file is incorrect about
resetting the buffer; update the comment for flush_to_file (function
flush_to_file in log_utils.h) to state that it writes the contents of the buffer
to the provided std::fstream but does not reset or clear the buffer (the
rotation/reset is done by flush_and_rotate), so future maintainers are not
misled; keep the function behavior unchanged unless you intentionally want to
move reset logic into flush_to_file, in which case update the implementation in
log_utils.cpp (and tests) accordingly.
In `@lib/log_utils.cpp`:
- Around line 81-108: In flush_and_rotate, logfile->flush() is called after
rotate_fn() which may delete the fstream (use-after-free); move the flush so it
runs immediately after buffer.flush_to_file(logfile) and before the rotation
check (i.e., call logfile->flush() right after current_log_size +=
buffer.size()), or alternatively check logfile validity after rotate_fn() before
calling flush; update the function flush_and_rotate and any uses of
rotate_fn/events_close_log_unlocked/events_flush_log_unlocked to ensure
rotate_fn() cannot invalidate logfile before flush.
In `@lib/MySQL_Logger.cpp`:
- Around line 1212-1232: Destructor is calling flush_and_rotate(log_ctx->events,
events.logfile, events.current_log_size, events.max_log_file_size, [this]() {
wrlock(); }, [this]() { wrunlock(); }, nullptr) (and similarly for audit) with
seven arguments but the flush_and_rotate overload used elsewhere (see
PgSQL_Logger) expects a reset_time callback as an additional parameter; update
these calls in MySQL_Logger::~MySQL_Logger (where LogBufferThreadContext*
log_ctx is handled) to supply the correct reset_time callable (or the same
sentinel used in PgSQL_Logger) instead of the current nullptr, or adjust to call
the proper overload so events and audit flushes provide the reset_time callback
along with the existing wrlock/wrunlock lambdas and file/max-size args.
In `@lib/MySQL_Thread.cpp`:
- Around line 1826-1882: The setters currently call atoi() (e.g. in the blocks
that assign variables.eventslog_flush_timeout, variables.eventslog_flush_size,
variables.auditlog_flush_timeout, variables.auditlog_flush_size and
eventslog_rate_limit) which silently converts invalid strings to 0; replace
atoi() with strtol/strtoul and validate the end-pointer and range so non-numeric
input is rejected: call strtol(value, &endptr, 10), ensure endptr != value and
*endptr == '\0', check the parsed value fits the expected range (>=0 or >=1
where required) before assigning to the variables and returning true, otherwise
return false. Ensure you apply the same pattern to all flush_* and rate_limit
setters and use unsigned function (strtoul/strtoull) where sizes must be
non-negative.
In `@lib/PgSQL_Logger.cpp`:
- Around line 1026-1065: Remove the unused local booleans flush_eventslog and
flush_auditlog from PgSQL_Logger::flush; they are declared but never read, so
delete the lines that define these variables (flush_eventslog and
flush_auditlog) and keep the existing conditional flush logic for
log_ctx->events and log_ctx->audit intact (no other behavior changes to
flush_and_rotate calls or time checks).
🧹 Nitpick comments (1)

lib/log_utils.cpp (1)

121-123: `should_log` does not guard against `rate_limit <= 0`.

If `rate_limit` is 0, the expression `dist(rng) * 0.0 <= 1.0` always evaluates to true, meaning all events are logged — which is the safe fallback. However, if `rate_limit` is negative (e.g., due to misconfiguration), the product can be negative, also always satisfying `<= 1.0`, effectively logging everything. This is a safe but perhaps unintended behavior.

Consider adding a fast-path for common cases and a guard:

Proposed improvement

```diff
 bool LogBufferThreadContext::should_log(int rate_limit) {
+	if (rate_limit <= 1) return true;
 	return dist(rng) * static_cast<double>(rate_limit) <= 1.0;
 }
```
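To see why this predicate samples roughly one event in `rate_limit`: for `u` uniform in [0, 1), `u * N <= 1.0` holds with probability about 1/N. A small self-contained sketch, assuming `dist` is a `std::uniform_real_distribution<double>` over [0, 1) (the member definitions are not shown in the diff):

```cpp
#include <random>

// Sketch of the sampling predicate, including the guard proposed above.
// With rate_limit N > 1, u * N <= 1.0 holds with probability ~1/N,
// so roughly one event in N is logged.
bool should_log_sample(std::mt19937& rng, int rate_limit) {
    if (rate_limit <= 1) return true;  // log everything
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    return dist(rng) * static_cast<double>(rate_limit) <= 1.0;
}

// Count sampled events out of `trials` at the given rate limit.
int count_sampled(unsigned seed, int rate_limit, int trials) {
    std::mt19937 rng(seed);
    int logged = 0;
    for (int i = 0; i < trials; ++i) {
        if (should_log_sample(rng, rate_limit)) ++logged;
    }
    return logged;
}
```

With `rate_limit = 10` over 100000 trials, the logged count lands near 10000, i.e. about a tenth of the events.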
```cpp
	// Flush all per-thread buffers before destroying the logger
	{
		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		for (const auto& kv : log_thread_contexts) {
			LogBufferThreadContext* log_ctx = kv.second.get();
			if (!log_ctx->events.empty()) {
				flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
			if (!log_ctx->audit.empty()) {
				flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
		}
	}
```
Destructor calls flush_and_rotate with 7 arguments — same concern as PgSQL_Logger.
See the corresponding comment on PgSQL_Logger.cpp. The reset_time parameter may be missing.
```cpp
	if (!strcasecmp(name,"eventslog_flush_timeout")) {
		int intv=atoi(value);
		if (intv >= 0) {
			variables.eventslog_flush_timeout=intv;
			if (intv > 5 * 60 * 1000) {
				proxy_warning("mysql-eventslog_flush_timeout is set to a high value: %dms\n", intv);
			}
			return true;
		} else {
			return false;
		}
	}
	if (!strcasecmp(name,"eventslog_flush_size")) {
		int intv=atoi(value);
		if (intv >= 0) {
			variables.eventslog_flush_size=intv;
			if (intv > 10 * 1024 * 1024) {
				proxy_warning("mysql-eventslog_flush_size is set to a high value: %d\n", intv);
			}
			return true;
		} else {
			return false;
		}
	}
	if (!strcasecmp(name,"eventslog_rate_limit")) {
		int intv=atoi(value);
		if (intv >= 1) {
			variables.eventslog_rate_limit=intv;
			return true;
		} else {
			return false;
		}
	}
	if (!strcasecmp(name,"auditlog_flush_timeout")) {
		int intv=atoi(value);
		if (intv >= 0) {
			variables.auditlog_flush_timeout=intv;
			if (intv > 5 * 60 * 1000) {
				proxy_warning("mysql-auditlog_flush_timeout is set to a high value: %dms\n", intv);
			}
			return true;
		} else {
			return false;
		}
	}
	if (!strcasecmp(name,"auditlog_flush_size")) {
		int intv=atoi(value);
		if (intv >= 0) {
			variables.auditlog_flush_size=intv;
			if (intv > 10 * 1024 * 1024) {
				proxy_warning("mysql-auditlog_flush_size is set to a high value: %d\n", intv);
			}
			return true;
		} else {
			return false;
		}
	}
```
Reject non-numeric values for the new `flush_*` settings.
atoi() turns invalid strings into 0, and 0 is currently accepted for *_flush_timeout/*_flush_size, so typos silently disable flushing. Use strtol (with end‑pointer validation) so non-numeric input is rejected.
💡 Suggested fix (apply same pattern to other `flush_*` setters)

```diff
-	if (!strcasecmp(name,"eventslog_flush_timeout")) {
-		int intv=atoi(value);
-		if (intv >= 0) {
-			variables.eventslog_flush_timeout=intv;
+	if (!strcasecmp(name,"eventslog_flush_timeout")) {
+		char *end = nullptr;
+		long intv = strtol(value, &end, 10);
+		if (end == value || *end != '\0' || intv < 0) {
+			return false;
+		}
+		variables.eventslog_flush_timeout=static_cast<int>(intv);
 		if (intv > 5 * 60 * 1000) {
 			proxy_warning("mysql-eventslog_flush_timeout is set to a high value: %dms\n", intv);
 		}
 		return true;
-	} else {
-		return false;
-	}
 	}
```
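A standalone sketch of the `strtol` end-pointer pattern described above; `parse_int_setting` is a hypothetical helper for illustration, not a function in the codebase:

```cpp
#include <cstdlib>

// Hypothetical helper illustrating strict integer parsing: reject empty
// strings, non-numeric input, and trailing garbage instead of letting
// atoi() silently turn them into 0. Returns true and stores the value
// only for a fully numeric string >= min_value.
bool parse_int_setting(const char* value, long min_value, long* out) {
    char* end = nullptr;
    long v = strtol(value, &end, 10);
    if (end == value || *end != '\0') {
        return false;  // no digits consumed, or trailing garbage like "100x"
    }
    if (v < min_value) {
        return false;  // below the allowed range for this setting
    }
    *out = v;
    return true;
}
```

A production version should also check `errno == ERANGE` after the call, since `strtol` reports out-of-range input via `errno`.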
Actionable comments posted: 2
🤖 Fix all issues with AI agents
In `@lib/MySQL_Logger.cpp`:
- Around line 1211-1232: Destructor holds log_thread_contexts_lock while calling
flush_and_rotate which can call wrlock -> deadlock; fix
MySQL_Logger::~MySQL_Logger (and mirror in PgSQL_Logger::~PgSQL_Logger) by first
acquiring log_thread_contexts_lock, iterating log_thread_contexts and moving or
copying each LogBufferThreadContext's events and audit buffers into local
temporary containers (e.g., vectors of buffers or pointers), then release
log_thread_contexts_lock and only after unlocking call flush_and_rotate on those
collected buffers (using the same events.logfile/audit.logfile and sizes) so
flush_and_rotate/wrlock executes without holding log_thread_contexts_lock.
In `@lib/PgSQL_Logger.cpp`:
- Around line 491-512: PgSQL_Logger::~PgSQL_Logger currently holds
log_thread_contexts_lock while iterating log_thread_contexts and calling
flush_and_rotate (which invokes wrlock/wrunlock), creating a lock-order
inversion; fix it by copying the relevant LogBufferThreadContext pointers/refs
(for both events and audit) into a local vector while holding
log_thread_contexts_lock, then release the lock and call flush_and_rotate for
each saved LogBufferThreadContext without holding log_thread_contexts_lock;
update the destructor code paths that reference log_thread_contexts,
log_thread_contexts_lock, LogBufferThreadContext, events, audit,
flush_and_rotate, wrlock, and wrunlock accordingly.
🧹 Nitpick comments (3)

lib/log_utils.cpp (2)

125-147: TOCTOU in `GetLogBufferThreadContext` is benign but the double-lock pattern is wasteful.

Since only the owning thread (identified by `pthread_self()`) creates its own entry, there's no real race between the lookup and insert. However, the two separate lock acquisitions could be simplified into one, avoiding the unnecessary unlock/relock overhead on first access.

Suggested simplification

```diff
 LogBufferThreadContext* GetLogBufferThreadContext(std::unordered_map<pthread_t, std::unique_ptr<LogBufferThreadContext>>& log_thread_contexts, std::mutex& log_thread_contexts_lock, uint64_t current_time) {
 	pthread_t tid = pthread_self();
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		auto it = log_thread_contexts.find(tid);
-		if (it != log_thread_contexts.end()) {
-			return it->second.get();
-		}
-	}
-
-	// Context doesn't exist for this thread, create it with proper initialization
-	auto new_context = std::make_unique<LogBufferThreadContext>();
-	LogBufferThreadContext* ptr = new_context.get();
-	// init() is already called in the constructor, which initializes both events and audit buffers
-	ptr->events.set_last_flush_time(current_time);
-	ptr->audit.set_last_flush_time(current_time);
-
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		log_thread_contexts[tid] = std::move(new_context);
-	}
-	return ptr;
+	std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
+	auto it = log_thread_contexts.find(tid);
+	if (it != log_thread_contexts.end()) {
+		return it->second.get();
+	}
+	auto new_context = std::make_unique<LogBufferThreadContext>();
+	LogBufferThreadContext* ptr = new_context.get();
+	ptr->events.set_last_flush_time(current_time);
+	ptr->audit.set_last_flush_time(current_time);
+	log_thread_contexts[tid] = std::move(new_context);
+	return ptr;
 }
```

This also eliminates a subtle risk: if the `LogBufferThreadContext()` constructor throws, the current code would leave no entry in the map but would have already released the lock — a single lock scope makes the intent clearer.
110-118: `pthread_self()` cast to `unsigned` may truncate on platforms where `pthread_t` is a pointer.

On Linux with glibc, `pthread_t` is an `unsigned long` (8 bytes on 64-bit), while `unsigned` is typically 4 bytes. The truncation only affects RNG seed entropy here, so it's not a correctness issue, but using `std::hash<pthread_t>{}(pthread_self())` would be more portable.

lib/MySQL_Logger.cpp (1)
1333-1358: `events_open_log_unlocked` writes metadata via a per-thread buffer, mixing per-thread and shared state.

Line 1338 calls `get_log_thread_context()` during log file opening (which runs under `wrlock`). This introduces the lock-ordering concern noted above. Additionally, the metadata write uses the calling thread's buffer, meaning if `events_open_log_unlocked` is called from a different context (e.g., admin thread via `flush_log`), it uses that thread's buffer — mixing per-thread buffer ownership with a shared operation.

Consider writing the metadata event directly to the logfile here (bypassing the per-thread buffer) since this is already under `wrlock` and the file is freshly opened.
```cpp
MySQL_Logger::~MySQL_Logger() {
	// Flush all per-thread buffers before destroying the logger
	{
		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		for (const auto& kv : log_thread_contexts) {
			LogBufferThreadContext* log_ctx = kv.second.get();
			if (!log_ctx->events.empty()) {
				flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
			if (!log_ctx->audit.empty()) {
				flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
		}
	}
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
echo "=== Destructor implementation (1211-1232 and context) ==="
sed -n '1205,1240p' lib/MySQL_Logger.cpp
echo ""
echo "=== flush_and_rotate function signature ==="
rg -n "flush_and_rotate" lib/MySQL_Logger.cpp | head -5
echo ""
echo "=== events_open_log_unlocked implementation ==="
sed -n '1333,1370p' lib/MySQL_Logger.cpp
echo ""
echo "=== events_set_base_filename and wrlock context ==="
sed -n '1400,1430p' lib/MySQL_Logger.cpp
echo ""
echo "=== flush_log implementation ==="
sed -n '1250,1290p' lib/MySQL_Logger.cpp
```

Repository: sysown/proxysql
Length of output: 4927

🏁 Script executed:

```shell
#!/bin/bash
echo "=== get_log_thread_context implementation ==="
rg -n "get_log_thread_context()" lib/MySQL_Logger.cpp -A15 | head -50
echo ""
echo "=== events_flush_log_unlocked implementation ==="
sed -n '1290,1330p' lib/MySQL_Logger.cpp
echo ""
echo "=== audit_flush_log_unlocked implementation ==="
rg -n "audit_flush_log_unlocked" lib/MySQL_Logger.cpp -A10 | head -30
echo ""
echo "=== All lock patterns: wrlock followed by mutex operations ==="
rg -n "wrlock\(\)" lib/MySQL_Logger.cpp -A5 | head -100
```

Length of output: 7313

🏁 Script executed:

```shell
#!/bin/bash
echo "=== get_log_thread_context function definition ==="
rg -n "get_log_thread_context\(\)" lib/MySQL_Logger.cpp -B2 -A20 | head -60
echo ""
echo "=== Check if get_log_thread_context acquires log_thread_contexts_lock ==="
rg -n "LogBufferThreadContext.*get_log_thread_context" lib/MySQL_Logger.h -A5
echo ""
echo "=== Search for get_log_thread_context implementation in .h ==="
rg -n "get_log_thread_context" lib/MySQL_Logger.h -A10
```

Length of output: 2843

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Find MySQL_Logger header file ==="
find . -name "*.h" -type f | grep -i logger | head -10
echo ""
echo "=== Search for get_log_thread_context in all header files ==="
rg -n "get_log_thread_context" --type=h -A10 | head -80
echo ""
echo "=== Search for LogBufferThreadContext and log_thread_contexts_lock in header ==="
rg -n "log_thread_contexts_lock|LogBufferThreadContext" --type=h -A3 -B1 | head -100
```

Length of output: 5347

🏁 Script executed:

```shell
#!/bin/bash
echo "=== get_log_thread_context implementation in MySQL_Logger.cpp ==="
rg -n "MySQL_Logger::get_log_thread_context\(\)" lib/MySQL_Logger.cpp -A30
echo ""
echo "=== GetLogBufferThreadContext helper implementation in log_utils.cpp ==="
rg -n "GetLogBufferThreadContext" lib/ -A25 | head -80
echo ""
echo "=== Check entire destructor to see lock pattern ==="
sed -n '1212,1235p' lib/MySQL_Logger.cpp
```

Length of output: 2952

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Verify PgSQL_Logger has same pattern ==="
sed -n '1050,1080p' lib/PgSQL_Logger.cpp
echo ""
echo "=== Check if events_flush_log_unlocked calls events_open_log_unlocked ==="
sed -n '1303,1312p' lib/MySQL_Logger.cpp
echo ""
echo "=== Verify full flush_and_rotate signature ==="
rg -n "void flush_and_rotate" lib/ -A5 | head -30
echo ""
echo "=== Check PgSQL_Logger destructor ==="
rg -n "PgSQL_Logger::~PgSQL_Logger" lib/PgSQL_Logger.cpp -A30
```

Length of output: 2419

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Verify flush_and_rotate signature and implementation ==="
rg -n "template.*flush_and_rotate|void flush_and_rotate" lib/ -A2 | head -40
echo ""
echo "=== Check audit_open_log_unlocked to confirm it also calls get_log_thread_context ==="
rg -n "audit_open_log_unlocked\(\)" lib/MySQL_Logger.cpp -A30 | head -50
```

Length of output: 2404

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Search for flush_and_rotate in header files ==="
rg -n "flush_and_rotate" include/ -B2 -A3
echo ""
echo "=== Check if audit_open_log_unlocked calls get_log_thread_context ==="
sed -n '1371,1420p' lib/MySQL_Logger.cpp
echo ""
echo "=== Check if there's a rotate_fn call inside flush_and_rotate ==="
rg -n "rotate_fn" lib/ -B2 -A2 | head -40
```

Length of output: 2754

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Full flush_and_rotate signature in header ==="
sed -n '199,215p' include/log_utils.h
echo ""
echo "=== Full flush_and_rotate implementation ==="
sed -n '81,110p' lib/log_utils.cpp
echo ""
echo "=== Verify destructor code path: does it immediately call lock_fn? ==="
rg -n "flush_and_rotate" lib/log_utils.cpp -B5 -A15 | head -50
```

Length of output: 1714

🏁 Script executed:

```shell
#!/bin/bash
echo "=== Verify audit_open_log_unlocked complete implementation ==="
sed -n '1371,1405p' lib/MySQL_Logger.cpp
echo ""
echo "=== Confirm PgSQL_Logger destructor has same issue ==="
sed -n '491,512p' lib/PgSQL_Logger.cpp
```

Length of output: 2117
Lock-ordering inversion in destructors and related code paths can cause deadlock.

The destructor holds `log_thread_contexts_lock` (line 1214) and immediately calls `flush_and_rotate()`, which invokes `wrlock()` at the start (log_utils.cpp line 92). However, `events_open_log_unlocked()` calls `get_log_thread_context()` which acquires `log_thread_contexts_lock`, and this path is reached from code that already holds `wrlock`:

- `flush_log()` (line 1262): `wrlock()` → `events_flush_log_unlocked()` → `events_open_log_unlocked()` → `get_log_thread_context()`
- `events_set_base_filename()` (line 1406): `wrlock()` → `events_open_log_unlocked()` → `get_log_thread_context()`

This creates a classic lock-ordering inversion:

- Thread A (destructor): `log_thread_contexts_lock` → `wrlock`
- Thread B (flush_log/events_set_base_filename): `wrlock` → `log_thread_contexts_lock`

Restructure the destructor to collect buffers while holding `log_thread_contexts_lock`, then release it before calling `flush_and_rotate()`. Apply the same fix to `PgSQL_Logger::~PgSQL_Logger()`.
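The collect-then-flush restructuring can be sketched as follows; `DemoLogger` and its members are illustrative stand-ins, not the actual ProxySQL types:

```cpp
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Sketch: snapshot the per-thread buffers while holding the contexts
// lock, release it, and only then take the logger lock to flush. This
// keeps the lock order consistent with threads that take the logger
// lock first and the contexts lock second, removing the inversion.
struct DemoLogger {
    std::mutex contexts_lock;               // stands in for log_thread_contexts_lock
    std::mutex logger_lock;                 // stands in for wrlock()/wrunlock()
    std::unordered_map<int, std::string> contexts;  // per-thread buffers
    std::vector<std::string> flushed;       // stands in for the log file

    void flush_all() {
        std::vector<std::string> pending;
        {
            std::lock_guard<std::mutex> lk(contexts_lock);
            for (auto& kv : contexts) {
                if (!kv.second.empty()) {
                    pending.push_back(std::move(kv.second));
                    kv.second.clear();
                }
            }
        }  // contexts_lock released before taking logger_lock
        std::lock_guard<std::mutex> lk(logger_lock);
        for (auto& buf : pending) {
            flushed.push_back(buf);
        }
    }
};
```

Because the two mutexes are never held at the same time, neither `flush_all` nor a writer that takes `logger_lock` first can participate in a circular wait.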
Not an issue. In the destructor, the rotate function is a nullptr, hence events_flush_log_unlocked() will not be called.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify events_open_log_unlocked calls get_log_thread_context while wrlock is held
echo "=== events_open_log_unlocked implementation showing get_log_thread_context call ==="
sed -n '1333,1370p' lib/MySQL_Logger.cpp
echo ""
echo "=== Verify flush_log holds wrlock before calling events_flush_log_unlocked ==="
sed -n '1260,1270p' lib/MySQL_Logger.cpp
echo ""
echo "=== Check events_flush_log_unlocked which calls events_open_log_unlocked ==="
sed -n '1303,1312p' lib/MySQL_Logger.cpp
echo ""
echo "=== Verify GetLogBufferThreadContext acquires log_thread_contexts_lock ==="
rg -n "GetLogBufferThreadContext" lib/log_utils.cpp -A 15
Length of output: 3219
@mevishalr Thank you for the clarification. I want to make sure I understand the threading model correctly.
The rotate_fn=nullptr does prevent events_flush_log_unlocked() from being called within the destructor's flush_and_rotate() call. However, the potential deadlock I'm concerned about is between two different threads:
Thread A (destructor):
- Line 1214: Acquires `log_thread_contexts_lock` (via `std::lock_guard`)
- Line 1217: Calls `flush_and_rotate()`
- Inside `flush_and_rotate()` (log_utils.cpp:92): Tries to acquire `wrlock()` via `lock_fn()`
Thread B (e.g., worker thread calling flush_log() or events_set_base_filename()):
- Line 1262: Acquires `wrlock()`
- Line 1263: Calls `events_flush_log_unlocked()`
- Inside `events_flush_log_unlocked()`: Calls `events_open_log_unlocked()`
- Line 1338: Calls `get_log_thread_context()`
- Inside `GetLogBufferThreadContext()` (log_utils.cpp:128): Tries to acquire `log_thread_contexts_lock`
This creates opposite lock ordering that can deadlock if both threads execute concurrently.
Question: Is there a guarantee in ProxySQL's architecture that the MySQL_Logger destructor only runs after all worker threads have been stopped? If so, there would be no concurrent access and no deadlock risk. Could you confirm this is the case?
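As general background on opposite lock ordering (illustration only, not ProxySQL code): C++17's `std::scoped_lock` acquires multiple mutexes using a deadlock-avoidance algorithm, so two threads may name the same mutexes in different orders safely. A minimal demonstration:

```cpp
#include <mutex>
#include <thread>

// Two mutexes standing in for log_thread_contexts_lock and the logger's wrlock().
std::mutex contexts_lock;
std::mutex wr_lock;

int shared_counter = 0;

void thread_a() {
    for (int i = 0; i < 1000; ++i) {
        std::scoped_lock g(contexts_lock, wr_lock);  // order A
        ++shared_counter;
    }
}

void thread_b() {
    for (int i = 0; i < 1000; ++i) {
        std::scoped_lock g(wr_lock, contexts_lock);  // order B (reversed) -- still safe
        ++shared_counter;
    }
}

// Runs both threads to completion; with plain nested lock_guard calls in
// opposite orders this pattern could deadlock, but scoped_lock cannot.
int run_demo() {
    std::thread ta(thread_a), tb(thread_b);
    ta.join();
    tb.join();
    return shared_counter;
}
```

This is only relevant when both locks genuinely must be held together; the fix discussed in this thread (releasing one lock before taking the other) avoids the problem entirely.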
🧬 Code Graph Analysis Results
[
{
"filename": "lib/log_utils.cpp",
"range": "81-89",
"content": "bool flush_and_rotate(\n\tLogBuffer& buffer,\n\tstd::fstream* logfile,\n\tunsigned int& current_log_size,\n\tunsigned int max_log_file_size,\n\tstd::function<void()> lock_fn,\n\tstd::function<void()> unlock_fn,\n\tstd::function<void()> rotate_fn,\n\tuint64_t reset_time)\n{\n\tbool flushed = false;\n\tlock_fn();\n\tif (logfile) {\n\t\tbuffer.flush_to_file(logfile);\n\t\tcurrent_log_size += buffer.size();\n\t\tflushed = true;\n\t\tlogfile->flush();\n\t\tif (current_log_size > max_log_file_size && rotate_fn) {\n\t\t\trotate_fn();\n\t\t\tcurrent_log_size = 0;\n\t\t}\n\t}\n\tunlock_fn();\n\tif (flushed) {\n\t\tbuffer.reset(reset_time);\n\t}\n\treturn flushed;\n}\n"
},
{
"filename": "lib/MySQL_Logger.cpp",
"range": "destructor block",
    "content": "MySQL_Logger::~MySQL_Logger() {\n\t// Flush all per-thread buffers before destroying the logger\n\t{\n\t\tstd::lock_guard<std::mutex> lock(log_thread_contexts_lock);\n\t\tfor (const auto& kv : log_thread_contexts) {\n\t\t\tLogBufferThreadContext* log_ctx = kv.second.get();\n\t\t\tif (!log_ctx->events.empty()) {\n\t\t\t\tflush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,\n\t\t\t\t\t[this]() { wrlock(); },\n\t\t\t\t\t[this]() { wrunlock(); },\n\t\t\t\t\tnullptr\n\t\t\t\t);\n\t\t\t}\n\t\t\tif (!log_ctx->audit.empty()) {\n\t\t\t\tflush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,\n\t\t\t\t\t[this]() { wrlock(); },\n\t\t\t\t\t[this]() { wrunlock(); },\n\t\t\t\t\tnullptr\n\t\t\t\t);\n\t\t\t}\n\t\t}\n\t}\n\tif (events.datadir) {\n\t\tfree(events.datadir);\n\t}\n\tfree(events.base_filename);\n\tif (audit.datadir) {\n\t\tfree(audit.datadir);\n\t}\n\tfree(audit.base_filename);\n\tdelete MyLogCB;\n};"
}
]
Yes, the destructor runs when all worker threads join. The destructor is invoked in src/main.cpp by ProxySQL_Main_shutdown_all_modules().
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the shutdown sequence showing destructor runs after thread join
echo "=== ProxySQL_Main_shutdown_all_modules implementation ==="
rg -n "ProxySQL_Main_shutdown_all_modules" src/main.cpp -A 30 | head -80
echo ""
echo "=== Check where MySQL_Logger destructor is called ==="
rg -n "delete.*GloMyLogger|GloMyLogger.*delete" src/main.cpp -B 5 -A 5
echo ""
echo "=== Check for thread join operations before logger destruction ==="
rg -n "join\(\)|pthread_join" src/main.cpp -B 2 -A 2 | head -50
echo ""
echo "=== Check MySQL_Threads_Handler shutdown ==="
rg -n "MySQL_Threads_Handler.*shutdown|shutdown.*MySQL_Threads" src/main.cpp -B 3 -A 3 | head -40
Length of output: 3491
@mevishalr Thank you for the clarification! I've verified the shutdown sequence in src/main.cpp and can confirm:
- Lines 1139-1180: All worker threads are joined (`MyMon_thread->join()`, `pthread_join(GloMyQC->purge_thread_id, NULL)`, etc.)
- Line 1710: `ProxySQL_Main_shutdown_all_modules()` is called after thread joins complete
- Line 1328: `delete GloMyLogger` executes inside `ProxySQL_Main_shutdown_all_modules()`
This confirms that by the time MySQL_Logger::~MySQL_Logger() runs, all worker threads have already been stopped and joined. Therefore, there is no possibility of concurrent access and no deadlock risk from the lock ordering in the destructor.
My concern was based on an incorrect assumption about concurrent execution during destruction. Your explanation of the shutdown sequence resolves this completely. Thank you for the patience in clarifying ProxySQL's architecture! ✅
🧬 Code Graph Analysis Results
[Relevant snippets from lib/MySQL_Logger.cpp]
-
File: lib/MySQL_Logger.cpp
- Snippet (destructor of MySQL_Logger)
- Location: lib/MySQL_Logger.cpp, approx lines 470-490
- Content:
MySQL_Logger::~MySQL_Logger() {
// Flush all per-thread buffers before destroying the logger
{
std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
for (const auto& kv : log_thread_contexts) {
LogBufferThreadContext* log_ctx = kv.second.get();
if (!log_ctx->events.empty()) {
flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
[this]() { wrlock(); },
[this]() { wrunlock(); },
nullptr
);
}
if (!log_ctx->audit.empty()) {
flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
[this]() { wrlock(); },
[this]() { wrunlock(); },
nullptr
);
}
}
}
if (events.datadir) {
free(events.datadir);
}
free(events.base_filename);
if (audit.datadir) {
free(audit.datadir);
}
free(audit.base_filename);
delete MyLogCB;
};
- Snippet (getter for per-thread log context)
- Location: lib/MySQL_Logger.cpp, approx lines 1110-1125
- Content:
LogBufferThreadContext* MySQL_Logger::get_log_thread_context() {
return GetLogBufferThreadContext(log_thread_contexts, log_thread_contexts_lock, monotonic_time());
}
✏️ Learnings added
Learnt from: mevishalr
Repo: sysown/proxysql PR: 5364
File: lib/MySQL_Logger.cpp:1211-1232
Timestamp: 2026-02-13T05:55:36.022Z
Learning: In ProxySQL, the MySQL_Logger and PgSQL_Logger destructors run after all worker threads have been joined during shutdown. The sequence in src/main.cpp is: (1) join all worker threads, (2) call ProxySQL_Main_shutdown_all_modules() which deletes the loggers. Therefore, there is no concurrent thread access during logger destruction, and lock ordering in the destructors cannot cause deadlocks.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
```cpp
PgSQL_Logger::~PgSQL_Logger() {
	// Flush all per-thread buffers before destroying the logger
	{
		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		for (const auto& kv : log_thread_contexts) {
			LogBufferThreadContext* log_ctx = kv.second.get();
			if (!log_ctx->events.empty()) {
				flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
			if (!log_ctx->audit.empty()) {
				flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
		}
	}
```
Same lock-ordering inversion as MySQL_Logger destructor.
This destructor holds log_thread_contexts_lock while calling flush_and_rotate → wrlock(), creating the same deadlock risk described in the MySQL_Logger review. Apply the same fix here.
🤖 Prompt for AI Agents
In `@lib/PgSQL_Logger.cpp` around lines 491 - 512, PgSQL_Logger::~PgSQL_Logger
currently holds log_thread_contexts_lock while iterating log_thread_contexts and
calling flush_and_rotate (which invokes wrlock/wrunlock), creating a lock-order
inversion; fix it by copying the relevant LogBufferThreadContext pointers/refs
(for both events and audit) into a local vector while holding
log_thread_contexts_lock, then release the lock and call flush_and_rotate for
each saved LogBufferThreadContext without holding log_thread_contexts_lock;
update the destructor code paths that reference log_thread_contexts,
log_thread_contexts_lock, LogBufferThreadContext, events, audit,
flush_and_rotate, wrlock, and wrunlock accordingly.
Not an issue. In the destructor, the rotate function is a nullptr, hence events_flush_log_unlocked() will not be called.
@mevishalr, understood. You're right that nullptr for the rotate function means no additional lock acquisition during rotation. If the destructor is only called during controlled shutdown when no other threads are accessing the logger, the lock ordering pattern wouldn't cause issues in practice.
Thank you for the clarification!
- Added <pthread.h> to log_utils.h and removed the header from log_utils.cpp
- Pass 0 to reset_time argument in flush_and_rotate inside logger's destructor
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
lib/MySQL_Logger.cpp (1)
1335-1361: ⚠️ Potential issue | 🟡 Minor: Buffer containing old entries written to new log file during rotation.
When the calling thread triggers log rotation through `flush_and_rotate()`, the sequence is: (1) flush the thread's buffer to the old file, (2) call `rotate_fn()` to open the new file, (3) reset the buffer. During step (2), `events_open_log_unlocked()` gets the calling thread's buffer context and writes a metadata event to it. Since the buffer was flushed but not reset, any leftover data from the previous request is written along with the metadata to the new file.

This only affects the thread triggering rotation; other threads' buffers are unaffected. Consider resetting the buffer immediately after flushing (before rotation) rather than after the unlock, to prevent old entries from appearing in the new log file.
🧹 Nitpick comments (7)
lib/log_utils.cpp (3)
80-107: `current_log_size` accounting uses buffer size before reset; verify no double-count.

The flow is: `flush_to_file` (writes), `current_log_size += buffer.size()` (accounts), then later `buffer.reset()` (clears). This is correct: the buffer isn't cleared until after the lock is released, so `size()` still reflects what was written. Good sequencing.

One minor note: `current_log_size` is `unsigned int` while `buffer.size()` returns `size_t`. On 64-bit systems with very large buffers, this could theoretically overflow. In practice, flush thresholds prevent this, but a `static_cast` or type alignment would make the intent clearer.
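The suggested type alignment can be sketched as follows (`add_to_log_size` is a hypothetical helper; the saturation policy shown is one possible choice, assuming a 64-bit `size_t`):

```cpp
#include <cstddef>
#include <limits>

// Accumulating a size_t buffer size into an unsigned int counter narrows
// silently on 64-bit platforms. An explicit static_cast plus a saturation
// check documents the intent and avoids wraparound.
unsigned int add_to_log_size(unsigned int current, std::size_t written) {
    std::size_t total = static_cast<std::size_t>(current) + written;
    const std::size_t cap = std::numeric_limits<unsigned int>::max();
    // Saturate at the counter's maximum instead of wrapping to a small value.
    return static_cast<unsigned int>(total > cap ? cap : total);
}
```

In the actual code path the flush thresholds keep buffers far below this limit, so the cast is about clarity rather than a live bug.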
124-145: TOCTOU window in `GetLogBufferThreadContext`; benign but could be tightened.

The function checks the map under a lock, releases it, constructs a new context, then re-acquires the lock to insert. Between the two critical sections, another path could theoretically insert for the same key. This is safe in practice because `pthread_self()` is unique per live thread and a single thread executes sequentially. However, holding the lock across the entire operation (or using `try_emplace`) would eliminate the gap and simplify reasoning.
pthread_self()is unique per live thread and a single thread executes sequentially. However, holding the lock across the entire operation (or usingtry_emplace) would eliminate the gap and simplify reasoning.Simplified single-lock approach
```diff
 LogBufferThreadContext* GetLogBufferThreadContext(std::unordered_map<pthread_t, std::unique_ptr<LogBufferThreadContext>>& log_thread_contexts, std::mutex& log_thread_contexts_lock, uint64_t current_time) {
 	pthread_t tid = pthread_self();
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		auto it = log_thread_contexts.find(tid);
-		if (it != log_thread_contexts.end()) {
-			return it->second.get();
-		}
-	}
-
-	// Context doesn't exist for this thread, create it with proper initialization
-	auto new_context = std::make_unique<LogBufferThreadContext>();
-	LogBufferThreadContext* ptr = new_context.get();
-	// init() is already called in the constructor, which initializes both events and audit buffers
-	ptr->events.set_last_flush_time(current_time);
-	ptr->audit.set_last_flush_time(current_time);
-
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		log_thread_contexts[tid] = std::move(new_context);
-	}
-	return ptr;
+	std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
+	auto [it, inserted] = log_thread_contexts.try_emplace(tid, nullptr);
+	if (inserted) {
+		it->second = std::make_unique<LogBufferThreadContext>();
+		it->second->events.set_last_flush_time(current_time);
+		it->second->audit.set_last_flush_time(current_time);
+	}
+	return it->second.get();
 }
```
109-118: `pthread_self()` cast to `unsigned` may truncate on platforms where `pthread_t` is 64-bit.

On Linux, `pthread_t` is typically `unsigned long` (8 bytes on 64-bit), while `static_cast<unsigned>` truncates to 4 bytes. For RNG seeding purposes this isn't a correctness issue; it just reduces entropy slightly. Fine to leave as-is.

include/log_utils.h (2)
1-2: Reserved identifier used for include guard.
`__CLASS_LOG_UTILS_H` uses a double-underscore prefix, which is reserved by the C++ standard for implementation use. Consider a non-reserved form like `CLASS_LOG_UTILS_H` or `LOG_UTILS_H_`.
199-207: `std::function` parameters by value incur allocation overhead on every call.

`flush_and_rotate` takes three `std::function<void()>` parameters by value. Each call site creates these from lambdas, which may involve heap allocation. Since this is on the hot logging path, consider passing by `const` reference or using templates to avoid `std::function` overhead entirely.

Template approach to avoid std::function overhead

```cpp
template<typename LockFn, typename UnlockFn, typename RotateFn>
bool flush_and_rotate(
	LogBuffer& buffer,
	std::fstream* logfile,
	unsigned int& current_log_size,
	unsigned int max_log_file_size,
	LockFn&& lock_fn,
	UnlockFn&& unlock_fn,
	RotateFn&& rotate_fn,
	uint64_t reset_time = 0);
```

Alternatively, at minimum pass by `const std::function<void()>&` to avoid copies.

lib/PgSQL_Logger.cpp (1)
const std::function<void()>&to avoid copies.lib/PgSQL_Logger.cpp (1)
566-573:events_close_log_unlockeddoesn't updatelogfile_openatomic flag.When the logfile is closed here,
set_events_logfile_open(false)is not called. This is currently safe because:
- Rotation (via
events_flush_log_unlocked) immediately reopens, which sets the flag.- Disabling logging sets
events.enabled=falsefirst, so the stalelogfile_open=trueis masked by theenabledearly-return check.However, this coupling is fragile. If a future code path closes the log without immediately reopening or setting
enabled=false, worker threads would pass theis_events_logfile_open()check and buffer data that never gets flushed.Consider adding
set_events_logfile_open(false)insideevents_close_log_unlocked(and the audit equivalent) for defense-in-depth.Proposed fix
void PgSQL_Logger::events_close_log_unlocked() { if (events.logfile) { events.logfile->flush(); events.logfile->close(); delete events.logfile; events.logfile=NULL; + set_events_logfile_open(false); } } void PgSQL_Logger::audit_close_log_unlocked() { if (audit.logfile) { audit.logfile->flush(); audit.logfile->close(); delete audit.logfile; audit.logfile=NULL; + set_audit_logfile_open(false); } }lib/MySQL_Logger.cpp (1)
1286-1301: Sameclose_log_unlockedissue: atomic flag not updated on close.Same concern as raised for
PgSQL_Logger:events_close_log_unlockedandaudit_close_log_unlockedsetlogfile=NULLbut don't callset_events_logfile_open(false)/set_audit_logfile_open(false). Currently safe due to theenabledguard but fragile.Proposed fix
void MySQL_Logger::events_close_log_unlocked() { if (events.logfile) { events.logfile->flush(); events.logfile->close(); delete events.logfile; events.logfile=NULL; + set_events_logfile_open(false); } } void MySQL_Logger::audit_close_log_unlocked() { if (audit.logfile) { audit.logfile->flush(); audit.logfile->close(); delete audit.logfile; audit.logfile=NULL; + set_audit_logfile_open(false); } }
|
Hi @mevishalr . Thank you very much for the PR . I am merging it into a new branch |
This commit addresses three correctness regressions introduced by thread-local log buffering in PR #5364 (query logging performance): 1) stale logfile pointer/UAF race during concurrent rotate/close 2) stale logfile-open state after close 3) non-global flush behavior in admin/format-switch paths What was fixed - Make `flush_and_rotate()` consume the current logfile pointer under lock - Signature changed from `std::fstream*` to `std::fstream*&` - Prevents dereferencing a stale stream pointer captured before lock acquisition while another thread rotates/closes the file - Updated declaration/definition and all call sites - Add explicit synchronization for cross-thread buffer draining - Added `LogBufferThreadContext::buffer_lock` - Any path that appends or flushes a thread buffer now locks this mutex - Guarantees force-flush from admin/config paths cannot race with worker-thread appends on the same context - Restore global forced flush semantics where required - Extended `MySQL_Logger::flush` and `PgSQL_Logger::flush` to `flush(bool force = false)` - `force=false`: preserves existing low-overhead worker-loop behavior (per-thread timeout-based flush) - `force=true`: snapshots all known thread contexts and drains both events/audit buffers regardless of timeout - `flush_log()` now calls `flush(true)` before file rotation, so admin flush and format-switch operations no longer miss pending thread buffers - Avoid unintended rotation during forced draining - In `force=true` path, flush uses `rotate_fn=nullptr` - Drains buffered payload into the current file first - `flush_log()` then performs one controlled rotate/open step - Fix logfile state tracking after close - `events_close_log_unlocked()` and `audit_close_log_unlocked()` now set `logfile_open=false` when the stream is closed - Prevents write paths from treating a closed stream as open - Remove per-thread-context dependency during metadata header write - `events_open_log_unlocked()` now uses a local `LogBuffer` for 
metadata emission in format=1 instead of reusing a thread context buffer - Keeps open/rotate path independent from worker context lifecycle - Keep callers consistent and non-duplicative - Since `flush_log()` now force-drains internally, removed redundant explicit `flush()` calls from: - MySQL/PgSQL `eventslog_format` switch handlers - `ProxySQL_Admin::flush_logs` Behavioral outcome - No stale stream pointer use when close/rotate interleaves with flush - No false-positive logfile-open state after close - `FLUSH LOGS` and eventslog format switch now drain all known thread buffers before rotating, preventing dropped/misplaced buffered records Validation - Built modified objects directly - Ran full debug build with GENAI enabled: make clean && export PROXYSQLGENAI=1 && make debug -j24 Build completed successfully.


1. Description
This PR improves the query logging performance of ProxySQL by using buffered logging and reducing lock contention. These changes improve logging throughput by more than an order of magnitude.
2. Link to tracking issue
Fixes #5243
3. Detailed Changes
New Features
Log Sampling
`mysql-eventslog_rate_limit` & `pgsql-eventslog_rate_limit`.

Logging Performance Improvements

- Replaced `localtime()` with `localtime_r()` to improve JSON logging performance during timestamp generation.
- `mysql-eventslog_flush_timeout`: Timeout in milliseconds to flush data stored in log buffers to the log file.
- `mysql-eventslog_flush_size`: Buffer size threshold in bytes for flushing.
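The interplay of these settings can be illustrated with a minimal per-thread buffer sketch (names and structure are hypothetical; this is not the actual ProxySQL implementation, and the real variables govern a richer code path):

```cpp
#include <cstddef>
#include <ctime>
#include <string>

// Minimal per-thread buffered logger sketch. A record passes a sampling
// check, gets a timestamp via the reentrant localtime_r, and is flushed
// once the buffer exceeds flush_size bytes.
struct ThreadLogBuffer {
    std::string buf;
    std::size_t flush_size = 16 * 1024;  // cf. mysql-eventslog_flush_size
    std::size_t flushed_bytes = 0;       // stands in for the real file write

    // rate_limit = N keeps roughly 1 out of every N events (0 = keep all),
    // mirroring the idea behind mysql-eventslog_rate_limit.
    bool sampled_out(unsigned rate_limit, unsigned& counter) const {
        if (rate_limit == 0) return false;
        return (counter++ % rate_limit) != 0;
    }

    void append(const std::string& record) {
        // localtime_r writes into a caller-owned struct tm instead of a
        // shared static buffer, so it is thread-safe (unlike localtime).
        std::time_t now = std::time(nullptr);
        std::tm tm_buf{};
        localtime_r(&now, &tm_buf);
        char ts[32];
        std::strftime(ts, sizeof(ts), "%Y-%m-%d %H:%M:%S", &tm_buf);
        buf += std::string(ts) + " " + record + "\n";
        if (buf.size() >= flush_size) flush();  // size-triggered flush
    }

    void flush() { flushed_bytes += buf.size(); buf.clear(); }
};
```

In the real design a timeout-based flush (`mysql-eventslog_flush_timeout`) complements the size trigger so sparse traffic still reaches the file promptly.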
4. Performance Benefit
Sysbench OLTP R/W benchmarks were conducted to compare ProxySQL’s performance before and after the logging improvements. With default ProxySQL v3.0.5 and JSON logging enabled, throughput was limited to approximately 24K QPS. After applying the proposed changes and enhancements, ProxySQL now achieves ~295K QPS logging throughput, while client-side latencies also drop considerably.
4.1. Benchmarking Logging Subsystem Performance
The benchmarks shown below focus purely on the performance of ProxySQL's logging subsystem. The benchmarks were done by connecting ProxySQL to a simple Golang server that speaks the MySQL protocol. The server returns a response immediately and does not execute any queries; hence the sysbench results show 0 for Reads and Writes.
4.1.1. Sysbench R/W Test
Global Variables:
- `mysql-threads = 32`
- `mysql-eventslog_format = 2`
- `mysql-eventslog_flush_timeout = 5000` (5s) [variable added in this PR]
- `mysql-eventslog_flush_size = 16 * 1024` (16KiB) [variable added in this PR]

Sysbench Command
sysbench --time=60 --threads=256 --rate=0 --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-user=test --mysql-db=sysbench --report-interval=1 --db-ps-mode=disable oltp_read_write run

Current ProxySQL
Number of logs written to log file:
ProxySQL with the logging improvements
Number of logs written to file:
4.1.2. Sysbench R/W test with different log sizes
Note: Log sizes reported in tables are approximate values. Actual log sizes may be slightly higher or lower.
Current ProxySQL
ProxySQL with logging improvements
4.1.3. Sysbench R/W test with different Flush sizes
Log Size: 4096 (approx)
4.2 Benchmarking with a real MySQL Server
Global Variables
- `mysql-threads = 32`
- `mysql-eventslog_format = 2`
- `mysql-eventslog_flush_timeout = 5000` (5s) [variable added in this PR]
- `mysql-eventslog_flush_size = 16 * 1024` (16KiB) [variable added in this PR]

Sysbench Command
sysbench --time=60 --threads=256 --rate=0 --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-user=test --mysql-db=sysbench --report-interval=1 --db-ps-mode=disable oltp_read_write run

Current ProxySQL
Number of logs written to file:
ProxySQL with logging improvements
Number of logs written to file:
5. Related PRs
Summary by CodeRabbit
New Features
Refactor
Reliability