
v3.0: harden query logging buffering follow-up (#5364 / #5243) #5389

Merged
renecannao merged 13 commits into v3.0 from v3.0-5243
Feb 20, 2026

Conversation


@renecannao renecannao commented Feb 18, 2026

Summary

Follow-up to PR #5364 for issue #5243.

This hardens the buffered logging changes from #5364, focusing on concurrency safety and flush/rotation correctness.

What this changes

  • Fixes stale logfile pointer usage risk during concurrent rotate/close.
  • Fixes stale logfile-open state after close.
  • Restores forced global drain semantics before rotation (flush_log() now force-drains thread buffers).
  • Synchronizes cross-thread buffer draining with a per-thread-context mutex.
  • Keeps metadata header writes independent from worker thread contexts.

Validation

  • Built modified objects directly.

  • Ran full debug build with GENAI enabled:

    make clean && export PROXYSQLGENAI=1 && make debug -j24

    Build completed successfully.

References

Summary by CodeRabbit

  • New Features

    • Runtime-configurable flush timeout/size and rate-limit settings for event and audit logs.
    • Per-thread buffered logging with probabilistic sampling and per-protocol logging metrics.
  • Refactor

    • Logging migrated to a buffered per-thread system with improved flush/rotate behavior, safer shutdown flushing, and reduced IO contention.
  • Tests

    • Added regression test ensuring FLUSH LOGS does not drop in-flight query logs.

mevishalr and others added 5 commits February 11, 2026 11:23
Detailed Changes:
- Introduced LogBuffer and LogBufferThreadContext to implement per-thread buffering, reducing lock contention.
- Replaced localtime() with localtime_r() for improved performance and thread safety during timestamp generation.
- Implemented configurable sampling for event logs to reduce I/O overhead.
- Added global configuration variables to control flush size, flush interval, and sampling rate for event and audit logs.
- Added <pthread.h> to log_utils.h and removed the header from log_utils.cpp
- Pass 0 to reset_time argument in flush_and_rotate inside logger's destructor
This commit addresses three correctness regressions introduced by
thread-local log buffering in PR #5364 (query logging performance):

1) stale logfile pointer/UAF race during concurrent rotate/close
2) stale logfile-open state after close
3) non-global flush behavior in admin/format-switch paths

What was fixed

- Make `flush_and_rotate()` consume the current logfile pointer under lock
  - Signature changed from `std::fstream*` to `std::fstream*&`
  - Prevents dereferencing a stale stream pointer captured before lock
    acquisition while another thread rotates/closes the file
  - Updated declaration/definition and all call sites

- Add explicit synchronization for cross-thread buffer draining
  - Added `LogBufferThreadContext::buffer_lock`
  - Any path that appends or flushes a thread buffer now locks this mutex
  - Guarantees force-flush from admin/config paths cannot race with
    worker-thread appends on the same context

- Restore global forced flush semantics where required
  - Extended `MySQL_Logger::flush` and `PgSQL_Logger::flush` to
    `flush(bool force = false)`
  - `force=false`: preserves existing low-overhead worker-loop behavior
    (per-thread timeout-based flush)
  - `force=true`: snapshots all known thread contexts and drains both
    events/audit buffers regardless of timeout
  - `flush_log()` now calls `flush(true)` before file rotation, so admin
    flush and format-switch operations no longer miss pending thread buffers

- Avoid unintended rotation during forced draining
  - In `force=true` path, flush uses `rotate_fn=nullptr`
  - Drains buffered payload into the current file first
  - `flush_log()` then performs one controlled rotate/open step

- Fix logfile state tracking after close
  - `events_close_log_unlocked()` and `audit_close_log_unlocked()` now set
    `logfile_open=false` when the stream is closed
  - Prevents write paths from treating a closed stream as open

- Remove per-thread-context dependency during metadata header write
  - `events_open_log_unlocked()` now uses a local `LogBuffer` for metadata
    emission in format=1 instead of reusing a thread context buffer
  - Keeps open/rotate path independent from worker context lifecycle

- Keep callers consistent and non-duplicative
  - Since `flush_log()` now force-drains internally, removed redundant
    explicit `flush()` calls from:
    - MySQL/PgSQL `eventslog_format` switch handlers
    - `ProxySQL_Admin::flush_logs`

Behavioral outcome

- No stale stream pointer use when close/rotate interleaves with flush
- No false-positive logfile-open state after close
- `FLUSH LOGS` and eventslog format switch now drain all known thread
  buffers before rotating, preventing dropped/misplaced buffered records

Validation

- Built modified objects directly
- Ran full debug build with GENAI enabled:

  make clean && export PROXYSQLGENAI=1 && make debug -j24

  Build completed successfully.

coderabbitai bot commented Feb 18, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Replaces direct std::fstream-based logging with per-thread in-memory LogBuffer and LogBufferThreadContext; converts Event write APIs to accept LogBuffer*; adds per-logfile open/size state, per-thread buffering, flush/rotate helpers, and runtime flush/size/rate configuration for MySQL and PgSQL.

Changes

Cohort / File(s) Summary
Logging primitives
include/log_utils.h, lib/log_utils.cpp
Add LogBuffer and LogBufferThreadContext, stream-like append/write APIs, flush_and_rotate() helper, and GetLogBufferThreadContext() for per-thread buffering, sampling, and flush-to-file logic.
MySQL logger surface & impl
include/MySQL_Logger.hpp, lib/MySQL_Logger.cpp
Change MySQL_Event write/auth/query signatures from std::fstream* to LogBuffer*; add per-thread context accessor, current_log_size and logfile-open flags, per-thread buffer flush/rotate usage, Prometheus counter integration, and flush(bool force = false) API.
PgSQL logger surface & impl
include/PgSQL_Logger.hpp, lib/PgSQL_Logger.cpp
Mirror MySQL changes: convert PgSQL_Event write APIs to LogBuffer*, add per-thread contexts and logfile state (current_log_size, logfile_open), get_log_thread_context() and is_/set_ accessors, and flush(bool force = false).
Thread config & runtime wiring
include/MySQL_Thread.h, lib/MySQL_Thread.cpp, include/PgSQL_Thread.h, lib/PgSQL_Thread.cpp
Add runtime variables: eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, auditlog_flush_size; expose, validate, initialize, and refresh these variables.
Thread-local state
include/proxysql_structs.h
Add new __thread variables for per-thread audit/events flush_timeout, flush_size, and eventslog_rate_limit for both PostgreSQL and MySQL scopes.
Build
lib/Makefile
Add log_utils object to library build so the new utilities compile and link.
Event signatures & docs
include/MySQL_Logger.hpp, include/PgSQL_Logger.hpp, lib/MySQL_Logger.cpp, lib/PgSQL_Logger.cpp
Forward-declare LogBuffer/LogBufferThreadContext; update event write/query/auth signatures to LogBuffer*; update comments/docblocks to reflect LogBuffer usage and newline handling.
Metrics & tests
lib/ProxySQL_Admin.cpp, test/tap/tests/reg_test_5389-flush_logs_no_drop-t.cpp, test/tap/groups/groups.json
Add PgSQL metrics update call; add regression test exercising concurrent FLUSH LOGS against in-flight buffered logging and Prometheus query-logger metric consistency; register new test group.
Small fixes
test/tap/tests/reg_test_3690-admin_large_pkts-t.cpp
Add NULL-check after mysql_store_result() to fail-fast on store_result failure.

Sequence Diagram(s)

sequenceDiagram
    participant Thread as Logging Thread
    participant Logger as Global Logger
    participant LogCtx as LogBufferThreadContext
    participant LogBuf as LogBuffer
    participant File as std::fstream

    Thread->>Logger: get_log_thread_context()
    Logger->>LogCtx: return per-thread context
    Thread->>LogBuf: append(record)
    LogBuf->>LogBuf: buffer in-memory
    Thread->>LogBuf: check size/timeout/should_log()
    alt flush/rotate condition met
        LogBuf->>Logger: request flush_and_rotate()
        Logger->>Logger: acquire global locks
        Logger->>File: flush_to_file(LogBuf)
        File-->>Logger: write bytes
        Logger->>Logger: rotate() if size > max
        Logger->>LogBuf: reset() & set_last_flush_time()
    end
sequenceDiagram
    participant OldThread as Old Flow: Thread
    participant OldFile as Old Flow: std::fstream
    participant NewThread as New Flow: Thread
    participant NewBuf as New Flow: LogBuffer
    participant NewFile as New Flow: std::fstream

    rect rgba(200,100,100,0.5)
        OldThread->>OldFile: write(record)
        OldFile->>OldFile: immediate disk I/O
    end

    rect rgba(100,200,100,0.5)
        NewThread->>NewBuf: write(record)
        NewBuf->>NewBuf: buffer in memory
        NewBuf->>NewFile: flush_to_file() (on threshold/timeout)
        NewFile->>NewFile: batched disk I/O
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Poem

🐰
I stash each line inside my thread-bound nest,
Hop, nibble, sample — only log the best,
When timeout ticks or buffers swell and sigh,
I flush, rotate, and send the bytes to sky,
A jittery rabbit keeping records blessed.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check: ✅ Passed. The title clearly summarizes the main change: hardening query logging buffering with a focus on concurrency safety and flush/rotation behavior, referencing the related issues.



@gemini-code-assist

Summary of Changes

Hello @renecannao, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and performance of query logging by introducing a buffered, thread-safe logging mechanism. It addresses potential concurrency issues, ensures correct log file rotation and flushing, and provides more granular control over logging behavior through new configuration parameters. The changes aim to make the logging system more reliable and efficient under heavy load.

Highlights

  • Concurrency and Thread Safety: Introduced per-thread logging contexts (LogBufferThreadContext) with dedicated buffers and mutexes to ensure thread-safe logging operations and prevent race conditions during log writes and rotations. Replaced localtime with thread-safe localtime_r for time conversions in log entries.
  • Log Buffer Management: Implemented a LogBuffer class to accumulate log data in memory before flushing to disk, improving write performance. Log entries are now written to these buffers first, then flushed to files.
  • Improved Log File Handling and Rotation: Added current_log_size and std::atomic<bool> logfile_open flags to track log file status and size, preventing stale file pointer usage. The flush mechanism was refactored to support both forced global draining of all thread buffers and automatic flushing based on configurable timeouts and buffer sizes.
  • Configurable Flush Behavior and Rate Limiting: New thread variables (eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, auditlog_flush_size) were added to control when and how frequently buffered logs are flushed, and to enable event sampling.
  • Code Structure and Modularity: Extracted common logging utility functions and classes into a new log_utils.h and log_utils.cpp to promote code reuse and maintainability across MySQL and PostgreSQL logging modules.


Changelog
  • include/MySQL_Logger.hpp
    • Added forward declarations for LogBuffer and LogBufferThreadContext.
    • Updated MySQL_Event::write methods to accept LogBuffer* instead of std::fstream*.
    • Introduced current_log_size and std::atomic<bool> logfile_open to events and audit structs.
    • Added log_thread_contexts map and mutex, along with accessor methods for thread-local log contexts and logfile open status.
    • Modified flush method signature to accept a force parameter.
  • include/MySQL_Thread.h
    • Added new integer variables for eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, and auditlog_flush_size.
  • include/PgSQL_Logger.hpp
    • Included <atomic> header.
    • Added forward declarations for LogBuffer and LogBufferThreadContext.
    • Updated PgSQL_Event::write methods to accept LogBuffer* instead of std::fstream*.
    • Introduced current_log_size and std::atomic<bool> logfile_open to events and audit structs.
    • Added log_thread_contexts map and mutex, along with accessor methods for thread-local log contexts and logfile open status.
    • Modified flush method signature to accept a force parameter.
  • include/PgSQL_Thread.h
    • Added new integer variables for eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, and auditlog_flush_size.
  • include/log_utils.h
    • Added new file defining LogBuffer class for buffered string accumulation, LogBufferThreadContext for per-thread logging state, and helper functions GetLogBufferThreadContext and flush_and_rotate.
  • include/proxysql_structs.h
    • Declared new __thread integer variables for MySQL and PostgreSQL events and audit log flush timeouts, flush sizes, and event rate limits.
  • lib/Makefile
    • Added log_utils.oo to the list of C++ object files to be compiled.
  • lib/MySQL_Logger.cpp
    • Included log_utils.h.
    • Updated MySQL_Event::write, write_auth, write_query_format_1, and write_query_format_2_json to use LogBuffer* and localtime_r for thread safety.
    • Initialized events.current_log_size and audit.current_log_size in the constructor.
    • Implemented destructor logic to flush all per-thread buffers.
    • Implemented is_events_logfile_open, set_events_logfile_open, is_audit_logfile_open, and set_audit_logfile_open methods.
    • Updated events_close_log_unlocked and audit_close_log_unlocked to set logfile_open flag to false.
    • Updated events_open_log_unlocked and audit_open_log_unlocked to initialize current_log_size, set logfile_open flag to true, and use LogBuffer for metadata writes.
    • Modified log_request and log_audit_entry to retrieve thread-local log contexts, apply rate limiting, and use flush_and_rotate for buffered writes.
    • Refactored flush method to handle forced global flushes and timed per-thread flushes.
    • Implemented get_log_thread_context to retrieve or create thread-local log contexts.
  • lib/MySQL_Thread.cpp
    • Added new events and audit log flush/rate limit variables to mysql_thread_variables_names.
    • Initialized new log flush/rate limit variables in the constructor.
    • Added logic to set_variable to handle the new log flush/rate limit variables.
    • Included new log flush/rate limit variables in get_variables_list.
    • Refreshed new log flush/rate limit variables in refresh_variables.
  • lib/PgSQL_Logger.cpp
    • Included log_utils.h and <vector>.
    • Updated PgSQL_Event::write, write_auth, write_query_format_1, and write_query_format_2_json to use LogBuffer* and localtime_r.
    • Initialized events.current_log_size and audit.current_log_size in the constructor.
    • Implemented destructor logic to flush all per-thread buffers.
    • Implemented is_events_logfile_open, set_events_logfile_open, is_audit_logfile_open, and set_audit_logfile_open methods.
    • Updated events_close_log_unlocked and audit_close_log_unlocked to set logfile_open flag to false.
    • Updated events_open_log_unlocked and audit_open_log_unlocked to initialize current_log_size and set logfile_open flag to true.
    • Modified log_request and log_audit_entry to retrieve thread-local log contexts, apply rate limiting, and use flush_and_rotate for buffered writes.
    • Refactored flush method to handle forced global flushes and timed per-thread flushes.
    • Implemented get_log_thread_context to retrieve or create thread-local log contexts.
  • lib/PgSQL_Thread.cpp
    • Added new events and audit log flush/rate limit variables to pgsql_thread_variables_names.
    • Initialized new log flush/rate limit variables in the constructor.
    • Added logic to set_variable to handle the new log flush/rate limit variables.
    • Included new log flush/rate limit variables in get_variables_list.
    • Refreshed new log flush/rate limit variables in refresh_variables.
  • lib/log_utils.cpp
    • Added new file implementing the LogBuffer class, LogBufferThreadContext class, and the flush_and_rotate and GetLogBufferThreadContext helper functions.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces significant improvements to the query logging mechanism, focusing on concurrency safety and correctness. The changes, including the introduction of per-thread logging buffers, atomic flags for log file state, and thread-safe time functions (localtime_r), effectively harden the logging system against race conditions and data loss. The refactoring to use a LogBuffer class and helper utilities for flushing and rotation is a clean design.

I've found one critical race condition in the creation of per-thread logging contexts, which could lead to memory corruption. I've provided a suggestion to resolve this.

Overall, this is a very good and important set of changes that significantly improves the robustness of the logging subsystem.

Comment on lines +124 to +146
LogBufferThreadContext* GetLogBufferThreadContext(std::unordered_map<pthread_t, std::unique_ptr<LogBufferThreadContext>>& log_thread_contexts, std::mutex& log_thread_contexts_lock, uint64_t current_time) {
	pthread_t tid = pthread_self();
	{
		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		auto it = log_thread_contexts.find(tid);
		if (it != log_thread_contexts.end()) {
			return it->second.get();
		}
	}

	// Context doesn't exist for this thread, create it with proper initialization
	auto new_context = std::make_unique<LogBufferThreadContext>();
	LogBufferThreadContext* ptr = new_context.get();
	// init() is already called in the constructor, which initializes both events and audit buffers
	ptr->events.set_last_flush_time(current_time);
	ptr->audit.set_last_flush_time(current_time);

	{
		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		log_thread_contexts[tid] = std::move(new_context);
	}
	return ptr;
}


critical

This function contains a race condition. If two threads call it for the first time simultaneously, both can pass the initial check for an existing context (lines 126-132). Both threads will then proceed to create a new context. The second thread to acquire the lock for insertion (line 142) will overwrite the context created by the first thread. This will cause the unique_ptr from the first thread to be destructed, deleting the object it was managing and leading to a use-after-free if the first thread has already returned the pointer.

To fix this, the lock should be held for the entire duration of checking for and creating the new context. Since this operation only happens once per thread, the lock contention will be minimal.

LogBufferThreadContext* GetLogBufferThreadContext(std::unordered_map<pthread_t, std::unique_ptr<LogBufferThreadContext>>& log_thread_contexts, std::mutex& log_thread_contexts_lock, uint64_t current_time) {
	pthread_t tid = pthread_self();
	std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
	auto it = log_thread_contexts.find(tid);
	if (it != log_thread_contexts.end()) {
		return it->second.get();
	}
	
	// Context doesn't exist for this thread, create it with proper initialization
	auto new_context = std::make_unique<LogBufferThreadContext>();
	LogBufferThreadContext* ptr = new_context.get();
	// init() is already called in the constructor, which initializes both events and audit buffers
	ptr->events.set_last_flush_time(current_time);
	ptr->audit.set_last_flush_time(current_time);
	
	log_thread_contexts[tid] = std::move(new_context);
	return ptr;
}


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (3)
include/MySQL_Thread.h (1)

553-559: Consider adding unit comments for the new *_flush_timeout, *_flush_size, and *_rate_limit fields.

Mirrors the same suggestion for PgSQL_Thread.h: existing fields like monitor_ping_interval carry //! Unit: 'ms'. annotations. Adding equivalent comments here prevents incorrect values being set through the admin interface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/MySQL_Thread.h` around lines 553 - 559, Add unit annotations for the
new fields so admin UI won't accept wrong units: annotate
eventslog_flush_timeout and auditlog_flush_timeout with "//! Unit: 'ms'";
annotate eventslog_flush_size, auditlog_flush_size, and auditlog_filesize with
"//! Unit: 'bytes'"; and annotate eventslog_rate_limit with the appropriate rate
unit used elsewhere (e.g., "//! Unit: 'events/s'"). Place these comments
immediately after the declarations (same style as monitor_ping_interval) for
eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit,
auditlog_filename, auditlog_filesize, and auditlog_flush_timeout to match
PgSQL_Thread.h conventions.
include/PgSQL_Thread.h (1)

1031-1037: Consider adding unit comments for *_flush_timeout and *_flush_size.

Several existing neighbouring fields carry inline unit annotations (e.g., //! Monitor ping interval. Unit: 'ms'.). Adding similar comments here (e.g., "Unit: ms" for eventslog_flush_timeout, "Unit: bytes" for eventslog_flush_size, "Unit: queries/sec" for eventslog_rate_limit) prevents misconfiguration, especially since these variables are surfaced through the admin interface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/PgSQL_Thread.h` around lines 1031 - 1037, Add inline unit comments
for the new configuration fields to match surrounding annotations: annotate
eventslog_flush_timeout with "Unit: ms", eventslog_flush_size with "Unit:
bytes", eventslog_rate_limit with "Unit: queries/sec" (or appropriate rate
unit), and likewise add unit comments for auditlog_flush_timeout ("Unit: ms")
and auditlog_flush_size ("Unit: bytes") next to the declarations of
eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit,
auditlog_flush_timeout, and auditlog_flush_size in PgSQL_Thread.h following the
existing comment style used by nearby fields (e.g., the //! ... Unit: 'ms'
pattern).
lib/PgSQL_Thread.cpp (1)

1801-1808: eventslog_rate_limit lacks a high-value guard, unlike all other new variables.

eventslog_flush_timeout warns above 300 s and eventslog_flush_size warns above 10 MB, but eventslog_rate_limit has no upper-bound warning. A value of, say, 1000000 would silently suppress virtually all event-log entries with no operational signal.

Consider adding a proxy_warning for unusually high sampling intervals, consistent with the other four variables:

♻️ Proposed fix
 	if (!strcasecmp(name,"eventslog_rate_limit")) {
 		int intv=atoi(value);
 		if (intv >= 1) {
 			variables.eventslog_rate_limit=intv;
+			if (intv > 10000) {
+				proxy_warning("pgsql-eventslog_rate_limit is set to a high value: %d (1 in %d events logged)\n", intv, intv);
+			}
 			return true;
 		} else {
 			return false;
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/PgSQL_Thread.cpp` around lines 1801 - 1808, The eventslog_rate_limit
handling lacks the high-value guard and should warn when an unusually large
value is set; in the block that checks strcasecmp(name,"eventslog_rate_limit")
and parses intv with atoi, add a proxy_warning call for values above the same
sensible threshold used elsewhere (e.g., >300) before assigning
variables.eventslog_rate_limit=intv, and still return true for valid ranges
while returning false for non-positive values—use the existing proxy_warning
signature as used for eventslog_flush_timeout/eventslog_flush_size so the
warning text clearly references eventslog_rate_limit and the provided intv.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@include/MySQL_Logger.hpp`:
- Around line 402-435: The header now uses std::unordered_map, std::unique_ptr
and std::mutex (e.g., members log_thread_contexts and log_thread_contexts_lock
and functions like get_log_thread_context / LogBufferThreadContext), but does
not include the corresponding standard headers; add explicit includes for
<unordered_map>, <memory> and <mutex> at the top of the file to remove reliance
on transitive includes and follow the project's include pattern.

In `@include/PgSQL_Logger.hpp`:
- Around line 102-109: The header added an unnecessary <unordered_map> include;
remove that include and instead add only <memory> and <mutex> so symbols used in
this file are satisfied: keep the member declaration log_thread_contexts (which
continues to rely on unordered_map from cpp.h), ensure std::unique_ptr is
available for LogBufferThreadContext pointers and std::mutex is available for
log_thread_contexts_lock, and verify get_log_thread_context(),
is_events_logfile_open(), set_events_logfile_open(), is_audit_logfile_open(),
and set_audit_logfile_open() compile with those includes.


Comment on lines +102 to +109
	std::unordered_map<pthread_t, std::unique_ptr<LogBufferThreadContext>> log_thread_contexts;
	std::mutex log_thread_contexts_lock;

	LogBufferThreadContext* get_log_thread_context();
	bool is_events_logfile_open() const;
	void set_events_logfile_open(bool open);
	bool is_audit_logfile_open() const;
	void set_audit_logfile_open(bool open);

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check whether proxysql.h/cpp.h already include the needed headers
echo "=== Checking proxysql.h and cpp.h for transitive includes ==="
for f in $(fd 'proxysql\.h|cpp\.h' --type f); do
  echo "== $f =="
  rg -n "#include <(unordered_map|memory|mutex)>" "$f" || echo "  (not found)"
done

Repository: sysown/proxysql

Length of output: 207


🏁 Script executed:

# Check current state of PgSQL_Logger.hpp
echo "=== Current includes in PgSQL_Logger.hpp ==="
head -40 include/PgSQL_Logger.hpp | cat -n

Repository: sysown/proxysql

Length of output: 1072


🏁 Script executed:

# Check what cpp.h includes (focused on memory and mutex)
echo "=== Checking cpp.h for memory and mutex includes ==="
rg -n "#include <(memory|mutex)>" include/cpp.h

Repository: sysown/proxysql

Length of output: 111


🏁 Script executed:

# Also check if proxysql.h includes cpp.h or other headers that might provide these
echo "=== Checking what proxysql.h includes ==="
head -50 include/proxysql.h | grep "#include"

Repository: sysown/proxysql

Length of output: 869


Remove <unordered_map> from the proposed includes; only <memory> and <mutex> are needed.

The header already depends on std::unordered_map via cpp.h (line 73), so adding an explicit include for it is redundant. However, std::unique_ptr and std::mutex are not provided by either proxysql.h or cpp.h, so those two includes should be added.

♻️ Corrected include additions
 #include "proxysql.h"
 #include "cpp.h"
 #include <atomic>
+#include <memory>
+#include <mutex>
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/PgSQL_Logger.hpp` around lines 102 - 109, The header added an
unnecessary <unordered_map> include; remove that include and instead add only
<memory> and <mutex> so symbols used in this file are satisfied: keep the member
declaration log_thread_contexts (which continues to rely on unordered_map from
cpp.h), ensure std::unique_ptr is available for LogBufferThreadContext pointers
and std::mutex is available for log_thread_contexts_lock, and verify
get_log_thread_context(), is_events_logfile_open(), set_events_logfile_open(),
is_audit_logfile_open(), and set_audit_logfile_open() compile with those
includes.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (4)
lib/log_utils.cpp (3)

73-78: flush_to_file silently swallows write failures.

After logfile->write(...) the stream's error state is never inspected. A disk-full or I/O error would put the fstream into a failed state, and all subsequent writes would silently no-op, dropping log entries. Consider returning a boolean or checking logfile->good() after the write so callers can react.

♻️ Proposed change
-void LogBuffer::flush_to_file(std::fstream* logfile) {
+bool LogBuffer::flush_to_file(std::fstream* logfile) {
 	if (!logfile || buffer.empty()) {
-		return;
+		return false;
 	}
 	logfile->write(buffer.data(), buffer.size());
+	return logfile->good();
 }

Update the return type in include/log_utils.h accordingly and have flush_and_rotate propagate the failure (e.g., skip the size update or set a flag).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/log_utils.cpp` around lines 73 - 78, LogBuffer::flush_to_file currently
ignores write errors; change its signature (and header include/log_utils.h) to
return a bool indicating success, perform logfile->write(buffer.data(),
buffer.size()) then check logfile->good() (or !logfile->fail()) and return false
on failure, and update callers such as flush_and_rotate to propagate the failure
(e.g., do not update current size, set an error flag, or abort rotation) so disk
I/O errors are handled instead of silently dropped.

38-46: write overloads are exact duplicates of append — consider removing.

LogBuffer::write(const std::string&) and LogBuffer::write(const char*, size_t) have identical bodies to their append counterparts. Keeping both is fine for API compatibility (mimicking std::ostream), but if no caller relies on the write spelling, removing them reduces surface area.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/log_utils.cpp` around lines 38 - 46, The two LogBuffer::write overloads
are exact duplicates of LogBuffer::append; either remove the write(const
std::string&) and write(const char*, size_t) overloads if no callers rely on
them, or replace their bodies to delegate to the existing append methods (e.g.
have write(...) call append(...)) to avoid duplicated logic; update/remove any
tests/usages accordingly and add a brief comment indicating the
deprecation/delegation of write if you keep it for API compatibility.

114-116: pthread_self() truncated to 32-bit unsigned — loses half the thread-ID entropy on x86-64.

On Linux/x86-64 pthread_t is unsigned long (8 bytes), but static_cast<unsigned> produces a 32-bit value. The high 32 bits of the TID are discarded, reducing the uniqueness of the seed. While the random_device entropy source mitigates this, prefer uintptr_t or a uint64_t split to preserve all bits.

♻️ Proposed fix
-	std::seed_seq seed{
-		rd(),
-		static_cast<unsigned>(std::chrono::high_resolution_clock::now().time_since_epoch().count()),
-		static_cast<unsigned>(pthread_self())
-	};
+	auto tid = static_cast<uintptr_t>(pthread_self());
+	auto ts  = static_cast<uint64_t>(
+		std::chrono::high_resolution_clock::now().time_since_epoch().count());
+	std::seed_seq seed{
+		rd(),
+		static_cast<uint32_t>(ts),
+		static_cast<uint32_t>(ts >> 32),
+		static_cast<uint32_t>(tid),
+		static_cast<uint32_t>(tid >> 32)
+	};
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/log_utils.cpp` around lines 114 - 116, The code currently casts
pthread_self() to 32-bit unsigned, truncating the 64-bit pthread_t on x86-64;
replace the truncating static_cast<unsigned>(pthread_self()) with a
preserved-width conversion using uintptr_t/uint64_t: obtain the thread id as
reinterpret_cast<uintptr_t>(pthread_self()) (or static_cast<uint64_t>), then
either insert that full 64-bit value into your seed or split it into two 32-bit
halves (low/high) so no entropy is lost; update the initializer where
pthread_self() is used so the seed uses the uintptr_t/uint64_t representation
instead of static_cast<unsigned>.
include/MySQL_Logger.hpp (1)

405-437: Per-thread context infrastructure is well-structured.

log_thread_contexts, log_thread_contexts_lock, and the four accessor methods mirror the PgSQL_Logger pattern exactly. Forward declarations on lines 17–18 satisfy the incomplete-type requirements here.

Minor nit: lines 406–437 use a leading-space-then-tab indentation style ( \t) while the surrounding private block uses tab-only. Both header files share this inconsistency; a whitespace pass would clean it up.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@include/MySQL_Logger.hpp` around lines 405 - 437, Fix the inconsistent
indentation in the private block containing log_thread_contexts,
log_thread_contexts_lock and the four accessor declarations
(get_log_thread_context, is_events_logfile_open, set_events_logfile_open,
is_audit_logfile_open, set_audit_logfile_open) by removing the leading space
before tabs so the lines use tab-only indentation to match the surrounding
private block and the corresponding header; run a whitespace pass to ensure both
headers share the same tab-only style.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/log_utils.cpp`:
- Around line 133-136: The inline comment referencing a non-existent init()
method is stale: update or remove it in the LogBufferThreadContext creation
code; specifically, delete or replace the comment on the LogBufferThreadContext
allocation that says "// init() is already called in the constructor..." and
instead note that the constructor seeds the RNG and that LogBuffer members
events and audit are default-constructed (or simply remove the comment entirely)
so the code around new_context, events.set_last_flush_time and
audit.set_last_flush_time accurately reflects the class behavior.
- Around line 92-105: Document the invariant that buffer.reset() must be
executed while the caller holds LogBufferThreadContext::buffer_lock for the
entire duration of flush_and_rotate: add a clear comment in the flush_and_rotate
implementation (near the unlock_fn() / buffer.reset() lines) stating that
callers performing cross-thread drains (e.g., force-drain paths) are required to
hold LogBufferThreadContext::buffer_lock for the full lifetime of
flush_and_rotate so that appends or concurrent drains cannot race with
buffer.reset(); reference the symbols flush_and_rotate, buffer.reset, unlock_fn,
LogBufferThreadContext::buffer_lock, and force-drain in the comment to make the
precondition explicit.
- Around line 120-122: The function LogBufferThreadContext::should_log currently
evaluates dist(rng) * rate_limit and thus returns true unconditionally when
rate_limit ≤ 0; add an explicit guard at the start of should_log (in the
LogBufferThreadContext class) to handle non-positive rate limits (e.g., if
rate_limit <= 0 return false) and then perform the existing randomized check
using dist and rng for positive rate_limit values so that negative or zero
rate_limit do not silently behave like unrestricted logging.

---

Duplicate comments:
In `@include/PgSQL_Logger.hpp`:
- Around line 5-8: The <unordered_map> include in include/PgSQL_Logger.hpp is
redundant because std::unordered_map is already provided transitively (via
cpp.h) but it's harmless and kept intentionally to match MySQL_Logger.hpp's
pattern; no code change required—if you prefer to remove the redundancy, delete
the `#include` <unordered_map> line in PgSQL_Logger.hpp (or leave it for
explicitness) and ensure any code referencing std::unordered_map still compiles
(e.g., code in classes or functions declared in PgSQL_Logger.hpp).

---

Nitpick comments:
In `@include/MySQL_Logger.hpp`:
- Around line 405-437: Fix the inconsistent indentation in the private block
containing log_thread_contexts, log_thread_contexts_lock and the four accessor
declarations (get_log_thread_context, is_events_logfile_open,
set_events_logfile_open, is_audit_logfile_open, set_audit_logfile_open) by
removing the leading space before tabs so the lines use tab-only indentation to
match the surrounding private block and the corresponding header; run a
whitespace pass to ensure both headers share the same tab-only style.

In `@lib/log_utils.cpp`:
- Around line 73-78: LogBuffer::flush_to_file currently ignores write errors;
change its signature (and header include/log_utils.h) to return a bool
indicating success, perform logfile->write(buffer.data(), buffer.size()) then
check logfile->good() (or !logfile->fail()) and return false on failure, and
update callers such as flush_and_rotate to propagate the failure (e.g., do not
update current size, set an error flag, or abort rotation) so disk I/O errors
are handled instead of silently dropped.
- Around line 38-46: The two LogBuffer::write overloads are exact duplicates of
LogBuffer::append; either remove the write(const std::string&) and write(const
char*, size_t) overloads if no callers rely on them, or replace their bodies to
delegate to the existing append methods (e.g. have write(...) call append(...))
to avoid duplicated logic; update/remove any tests/usages accordingly and add a
brief comment indicating the deprecation/delegation of write if you keep it for
API compatibility.
- Around line 114-116: The code currently casts pthread_self() to 32-bit
unsigned, truncating the 64-bit pthread_t on x86-64; replace the truncating
static_cast<unsigned>(pthread_self()) with a preserved-width conversion using
uintptr_t/uint64_t: obtain the thread id as
reinterpret_cast<uintptr_t>(pthread_self()) (or static_cast<uint64_t>), then
either insert that full 64-bit value into your seed or split it into two 32-bit
halves (low/high) so no entropy is lost; update the initializer where
pthread_self() is used so the seed uses the uintptr_t/uint64_t representation
instead of static_cast<unsigned>.

Comment on lines +92 to +105
	if (logfile) {
		buffer.flush_to_file(logfile);
		current_log_size += buffer.size();
		flushed = true;
		logfile->flush();
		if (current_log_size > max_log_file_size && rotate_fn) {
			rotate_fn();
			current_log_size = 0;
		}
	}
	unlock_fn();
	if (flushed) {
		buffer.reset(reset_time);
	}

⚠️ Potential issue | 🟡 Minor

buffer.reset() is called outside the file-lock boundary — undocumented but required precondition.

unlock_fn() (line 102) releases the file I/O lock, but buffer.reset() (line 104) runs afterwards, still modifying the buffer. Between those two points:

  • A concurrent force-drain from another thread could append stale data or double-flush if it doesn't hold LogBufferThreadContext::buffer_lock for the entire duration.
  • Even in the normal path, any new event appended by the owning thread before reset() completes would be silently discarded.

The PR description notes that cross-thread drains are protected by buffer_lock, so the design intent is sound, but the function carries an undocumented, unenforced invariant: callers performing a cross-thread drain must hold LogBufferThreadContext::buffer_lock for the full lifetime of the flush_and_rotate call (not just up to lock_fn). Add a comment making this explicit.

📝 Suggested documentation addition
+/**
+ * Flushes @p buffer to @p logfile under @p lock_fn / @p unlock_fn and
+ * optionally rotates.
+ *
+ * @pre  When called from a thread other than the buffer's owner, the caller
+ *       MUST hold LogBufferThreadContext::buffer_lock for the *entire* call,
+ *       including the buffer.reset() that happens after unlock_fn().  Failing
+ *       to do so creates a data race between the reset and any concurrent
+ *       append by the owning thread.
+ */
 bool flush_and_rotate(
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/log_utils.cpp` around lines 92 - 105, Document the invariant that
buffer.reset() must be executed while the caller holds
LogBufferThreadContext::buffer_lock for the entire duration of flush_and_rotate:
add a clear comment in the flush_and_rotate implementation (near the unlock_fn()
/ buffer.reset() lines) stating that callers performing cross-thread drains
(e.g., force-drain paths) are required to hold
LogBufferThreadContext::buffer_lock for the full lifetime of flush_and_rotate so
that appends or concurrent drains cannot race with buffer.reset(); reference the
symbols flush_and_rotate, buffer.reset, unlock_fn,
LogBufferThreadContext::buffer_lock, and force-drain in the comment to make the
precondition explicit.

Comment on lines +120 to +122
bool LogBufferThreadContext::should_log(int rate_limit) {
	return dist(rng) * static_cast<double>(rate_limit) <= 1.0;
}

⚠️ Potential issue | 🟡 Minor

should_log returns true unconditionally for rate_limit ≤ 0.

With rate_limit = 0: dist * 0.0 = 0.0 ≤ 1.0 → always logs.
With rate_limit < 0: dist * negative ≤ 0.0 ≤ 1.0 → always logs.

Both cases silently behave identically to rate_limit = 1, which is likely unintentional for negative values. A quick guard makes the semantics explicit:

🛡️ Proposed fix
 bool LogBufferThreadContext::should_log(int rate_limit) {
+	if (rate_limit <= 1) return true;
 	return dist(rng) * static_cast<double>(rate_limit) <= 1.0;
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/log_utils.cpp` around lines 120 - 122, The function
LogBufferThreadContext::should_log currently evaluates dist(rng) * rate_limit
and thus returns true unconditionally when rate_limit ≤ 0; add an explicit guard
at the start of should_log (in the LogBufferThreadContext class) to handle
non-positive rate limits (e.g., if rate_limit <= 0 return false) and then
perform the existing randomized check using dist and rng for positive rate_limit
values so that negative or zero rate_limit do not silently behave like
unrestricted logging.

Comment on lines +133 to +136
auto new_context = std::make_unique<LogBufferThreadContext>();
// init() is already called in the constructor, which initializes both events and audit buffers
new_context->events.set_last_flush_time(current_time);
new_context->audit.set_last_flush_time(current_time);

⚠️ Potential issue | 🟡 Minor

Stale comment references a non-existent init() method.

Line 134 says // init() is already called in the constructor, which initializes both events and audit buffers, but LogBufferThreadContext has no init() method. The constructor seeds the RNG; the LogBuffer members are default-constructed. Remove or correct the comment.

📝 Suggested fix
-	// init() is already called in the constructor, which initializes both events and audit buffers
-	new_context->events.set_last_flush_time(current_time);
-	new_context->audit.set_last_flush_time(current_time);
+	// Anchor both buffers' flush timestamps to the current time so the first
+	// flush interval starts from context creation, not from the epoch.
+	new_context->events.set_last_flush_time(current_time);
+	new_context->audit.set_last_flush_time(current_time);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
auto new_context = std::make_unique<LogBufferThreadContext>();
// init() is already called in the constructor, which initializes both events and audit buffers
new_context->events.set_last_flush_time(current_time);
new_context->audit.set_last_flush_time(current_time);
auto new_context = std::make_unique<LogBufferThreadContext>();
// Anchor both buffers' flush timestamps to the current time so the first
// flush interval starts from context creation, not from the epoch.
new_context->events.set_last_flush_time(current_time);
new_context->audit.set_last_flush_time(current_time);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/log_utils.cpp` around lines 133 - 136, The inline comment referencing a
non-existent init() method is stale: update or remove it in the
LogBufferThreadContext creation code; specifically, delete or replace the
comment on the LogBufferThreadContext allocation that says "// init() is already
called in the constructor..." and instead note that the constructor seeds the
RNG and that LogBuffer members events and audit are default-constructed (or
simply remove the comment entirely) so the code around new_context,
events.set_last_flush_time and audit.set_last_flush_time accurately reflects the
class behavior.

events.logfile->close();
delete events.logfile;
events.logfile=NULL;
set_events_logfile_open(false);
Contributor

@mevishalr mevishalr Feb 18, 2026


@renecannao, if the logfile_open flag is set to false here, some of the logs get dropped. This flag is checked at the beginning of log_request function. So during file rotation, the logfile_open will be set to false and log_request will return immediately, thus failing to log the request into the buffers.

I tested this PR with sysbench and I get:

SQL statistics:
    queries:                             16236420 (270431.69 per sec.)
...
# cat /var/lib/proxysql/queries.log.* | wc -l
16231741

If this flag is not updated, all the logs get logged to the file.

Contributor Author


Hi @mevishalr . Thank you for the review (and thank you again for the original PR !).
I am adding some extra automation to be able to detect this issue during automated testing.

Contributor


Sounds great! Not seeing any drops now. :)


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/MySQL_Logger.cpp (1)

1344-1380: ⚠️ Potential issue | 🟡 Minor

current_log_size metadata accounting is silently zeroed after rotation.

events_open_log_unlocked() writes the metadata record and correctly tracks its byte count at line 1369:

events.current_log_size += metadata_buf.size();  // e.g., 87 bytes

However, flush_and_rotate in lib/log_utils.cpp unconditionally resets the counter immediately after calling rotate_fn():

rotate_fn();          // calls events_open_log_unlocked() → current_log_size = metadata_size
current_log_size = 0; // ← overwrites, metadata bytes silently lost

This means after every rotation, current_log_size starts at 0 instead of metadata_size, so the actual file is slightly larger than max_log_file_size before the next rotation fires. For a 100MB default this is negligible, but the accounting is incorrect.

The cleanest fix is to remove the current_log_size = 0 from flush_and_rotate in lib/log_utils.cpp since rotate_fn (via events_open_log_unlocked) already handles the reset. Alternatively, remove line 1369 here since the value is unconditionally overwritten by flush_and_rotate anyway:

🛠️ Proposed fix (option A — remove dead accounting in MySQL_Logger.cpp)
 		metadata_buf.flush_to_file(events.logfile);
-		events.current_log_size += metadata_buf.size();
🛠️ Proposed fix (option B — fix the root cause in lib/log_utils.cpp)
 	if (current_log_size > max_log_file_size && rotate_fn) {
 		rotate_fn();
-		current_log_size = 0;
 	}

Option B is the more correct fix: the rotate_fn callback (events_open_log_unlocked) already resets current_log_size to zero internally, so the explicit reset in flush_and_rotate is redundant and clobbers any post-open size tracking the callback performs.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/MySQL_Logger.cpp` around lines 1344 - 1380, The rotation logic resets
current_log_size after the rotate callback clobbering metadata accounting;
remove the unconditional reset in flush_and_rotate (the line that sets
current_log_size = 0) so that rotate_fn()/events_open_log_unlocked() can
initialize events.current_log_size (and add metadata bytes) correctly;
specifically edit flush_and_rotate in lib/log_utils.cpp to stop zeroing
current_log_size after calling rotate_fn(), ensuring events_open_log_unlocked()
and its use of events.current_log_size / metadata_buf.size() are preserved.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@lib/MySQL_Logger.cpp`:
- Around line 1344-1380: The rotation logic resets current_log_size after the
rotate callback clobbering metadata accounting; remove the unconditional reset
in flush_and_rotate (the line that sets current_log_size = 0) so that
rotate_fn()/events_open_log_unlocked() can initialize events.current_log_size
(and add metadata bytes) correctly; specifically edit flush_and_rotate in
lib/log_utils.cpp to stop zeroing current_log_size after calling rotate_fn(),
ensuring events_open_log_unlocked() and its use of events.current_log_size /
metadata_buf.size() are preserved.

- Introduce proxysql_query_logger_logged_queries_total with protocol label (mysql, pgsql) for direct query-log accounting via Prometheus.
- Wire MySQL and PgSQL loggers to increment/export the counter and hook PgSQL logger metrics into admin scrape updates.
- Add reg_test_5389-flush_logs_no_drop-t.cpp to stress concurrent client traffic + PROXYSQL FLUSH LOGS and assert no dropped logs via metric delta == executed queries.
- Document the new TAP with doxygen comments for env knobs and flow.

@coderabbitai coderabbitai bot left a comment


🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@include/PgSQL_Logger.hpp`:
- Around line 5-11: Keep the include and forward-declaration layout consistent
with MySQL_Logger.hpp: retain the explicit includes <atomic>, <memory>, <mutex>,
and <unordered_map> (even if <unordered_map> can be pulled in transitively) and
keep forward declarations for LogBuffer and LogBufferThreadContext so symbol
resolution matches the MySQL counterpart; update include order or spacing only
to match MySQL_Logger.hpp formatting if needed, but do not remove the
<unordered_map> include or the two forward-declare lines (LogBuffer,
LogBufferThreadContext).

Register the new FLUSH LOGS no-drop test in the default-g4 test group.
Add NULL check for mysql_store_result() return value before calling
match_row_lens(). Exit with diagnostic message instead of crashing
when the resultset is unavailable.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/tap/groups/groups.json`:
- Line 252: The entry for "reg_test_5389-flush_logs_no_drop-t" currently lists a
single variant ("default-g4") but must match the standard g4 matrix used by
other g4 groups; update the array for the "reg_test_5389-flush_logs_no_drop-t"
key in groups.json to include the full five variants (i.e., add the four missing
MySQL session-variable variants used by other g4 entries so the array matches
the other g4-group entries).

In `@test/tap/tests/reg_test_3690-admin_large_pkts-t.cpp`:
- Around line 198-201: The test calls plan(ROW_NUM) before the select_res NULL
check, so aborting must use TAP's BAIL_OUT() instead of returning EXIT_FAILURE;
replace the early return in the select_res == NULL branch with a BAIL_OUT(...)
call that includes a descriptive message and the mysql_error(proxysql_admin)
string (similar to other post-plan aborts), ensuring the diag(...) remains if
desired and that the BAIL_OUT invocation provides the same error context.

"pgsql-parameterized_kill_queries_test-t": [ "default-g4", "mysql-auto_increment_delay_multiplex=0-g4", "mysql-multiplexing=false-g4", "mysql-query_digests=0-g4", "mysql-query_digests_keep_comment=1-g4" ],
"pgsql-reg_test_5284_frontend_ssl_enforcement-t": [ "default-g4", "mysql-auto_increment_delay_multiplex=0-g4", "mysql-multiplexing=false-g4", "mysql-query_digests=0-g4", "mysql-query_digests_keep_comment=1-g4" ],
"pgsql-reg_test_5273_bind_parameter_format-t": [ "default-g4", "mysql-auto_increment_delay_multiplex=0-g4", "mysql-multiplexing=false-g4", "mysql-query_digests=0-g4", "mysql-query_digests_keep_comment=1-g4" ],
"reg_test_5389-flush_logs_no_drop-t": [ "default-g4" ],

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Step 1: Verify the file exists and examine the structure around line 252
echo "=== Checking file existence and line 252 ==="
if [ -f "test/tap/groups/groups.json" ]; then
  echo "File found. Reading lines 250-254:"
  sed -n '250,254p' test/tap/groups/groups.json
else
  echo "File not found"
fi

echo -e "\n=== Getting total line count ==="
wc -l test/tap/groups/groups.json

Repository: sysown/proxysql

Length of output: 961


🏁 Script executed:

#!/bin/bash
# Step 2: Find all g4 entries and count their variant patterns
echo "=== Analyzing all g4 entries ==="
# Extract all lines containing "g4" and their variant counts
python3 << 'EOF'
import json
import re

try:
    with open('test/tap/groups/groups.json', 'r') as f:
        data = json.load(f)
    
    # Count variants for each g4 entry
    g4_variant_counts = {}
    for key, value in data.items():
        if isinstance(value, list):
            g4_entries = [v for v in value if 'g4' in v]
            if g4_entries:
                g4_variant_counts[key] = len(g4_entries)
    
    # Group by count
    count_groups = {}
    for key, count in g4_variant_counts.items():
        if count not in count_groups:
            count_groups[count] = []
        count_groups[count].append(key)
    
    # Display results
    for count in sorted(count_groups.keys()):
        entries = count_groups[count]
        print(f"\nEntries with {count} g4 variant(s): ({len(entries)} total)")
        if count < 5:
            # Show single-variant entries
            for entry in entries:
                print(f"  - {entry}")
except Exception as e:
    print(f"Error: {e}")
EOF

Repository: sysown/proxysql

Length of output: 213


🏁 Script executed:

#!/bin/bash
# Step 3: Extract and display the specific entry at line 252 with context
echo "=== Full context of the entry (10 lines before and after) ==="
sed -n '242,262p' test/tap/groups/groups.json | cat -n

Repository: sysown/proxysql

Length of output: 2947


Add missing variant configurations to match the standard g4 test matrix.

This is the only single-variant g4 entry in the file. All 67 other g4-group entries include the full 5-variant array. Either add the four missing MySQL session variable variants to this entry or clarify why flush-log behavior doesn't require testing across those configurations.

🔧 Proposed fix to align with the g4 standard matrix
-  "reg_test_5389-flush_logs_no_drop-t": [ "default-g4" ],
+  "reg_test_5389-flush_logs_no_drop-t": [ "default-g4", "mysql-auto_increment_delay_multiplex=0-g4", "mysql-multiplexing=false-g4", "mysql-query_digests=0-g4", "mysql-query_digests_keep_comment=1-g4" ],
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"reg_test_5389-flush_logs_no_drop-t": [ "default-g4" ],
"reg_test_5389-flush_logs_no_drop-t": [ "default-g4", "mysql-auto_increment_delay_multiplex=0-g4", "mysql-multiplexing=false-g4", "mysql-query_digests=0-g4", "mysql-query_digests_keep_comment=1-g4" ],
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/groups/groups.json` at line 252, The entry for
"reg_test_5389-flush_logs_no_drop-t" currently lists a single variant
("default-g4") but must match the standard g4 matrix used by other g4 groups;
update the array for the "reg_test_5389-flush_logs_no_drop-t" key in groups.json
to include the full five variants (i.e., add the four missing MySQL
session-variable variants used by other g4 entries so the array matches the
other g4-group entries).
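The consistency check described above can be sketched as a small script. This is a hedged illustration only: the variant names are taken from the suggested change, and the helper name and inline sample data are assumptions rather than anything in the repository.

```python
# The full g4 matrix as listed in the suggested change above.
FULL_G4 = {
    "default-g4",
    "mysql-auto_increment_delay_multiplex=0-g4",
    "mysql-multiplexing=false-g4",
    "mysql-query_digests=0-g4",
    "mysql-query_digests_keep_comment=1-g4",
}

def incomplete_g4_entries(groups: dict) -> list:
    """Return test names whose g4 variant set differs from the full matrix."""
    flagged = []
    for test, variants in groups.items():
        g4 = {v for v in variants if v.endswith("-g4")}
        if g4 and g4 != FULL_G4:
            flagged.append(test)
    return flagged

# Inline sample data standing in for test/tap/groups/groups.json:
sample = {
    "reg_test_5389-flush_logs_no_drop-t": ["default-g4"],
    "some_other_test-t": sorted(FULL_G4),
}
print(incomplete_g4_entries(sample))  # the single-variant flush-logs entry is flagged
```

Run against the real `groups.json`, a check like this would surface any entry that silently opts out of the standard matrix.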

Comment on lines +198 to +201
if (select_res == NULL) {
diag("Failed to store result: %s", mysql_error(proxysql_admin));
return EXIT_FAILURE;
}
⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for bail_out definition in TAP framework
find test/tap -name "*.h" -o -name "*.c" | xargs grep -n "bail_out" 2>/dev/null | head -20

Repository: sysown/proxysql

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Also search for tap.h location and structure
find test/tap -name "tap.h" -type f | head -5

Repository: sysown/proxysql

Length of output: 77


🏁 Script executed:

#!/bin/bash
# Check if bail_out is used elsewhere in test files
rg -n "bail_out" test/tap/tests/ --type c --type cpp | head -10

Repository: sysown/proxysql

Length of output: 41


🏁 Script executed:

#!/bin/bash
# Examine the tap.h file to understand available functions
cat test/tap/tap/tap.h | head -100

Repository: sysown/proxysql

Length of output: 2548


🏁 Script executed:

#!/bin/bash
# Search for common TAP functions and error handling patterns
rg -n "^(int|void|char)" test/tap/tap/tap.h | head -20

Repository: sysown/proxysql

Length of output: 476


🏁 Script executed:

#!/bin/bash
# Check how other tests handle early exits and failures
rg -B2 -A2 "return EXIT_FAILURE" test/tap/tests/ --type c --type cpp | head -40

Repository: sysown/proxysql

Length of output: 2231


🏁 Script executed:

#!/bin/bash
# Search for BAIL_OUT usage in the entire test suite
rg -n "BAIL_OUT" test/tap/tests/ --type c --type cpp

Repository: sysown/proxysql

Length of output: 5490


🏁 Script executed:

#!/bin/bash
# Examine the actual test file to see the plan() call and surrounding context
cat -n test/tap/tests/reg_test_3690-admin_large_pkts-t.cpp | sed -n '170,210p'

Repository: sysown/proxysql

Length of output: 1558


🏁 Script executed:

#!/bin/bash
# Check the TAP framework implementation of plan() to understand plan/count enforcement
cat test/tap/tap/tap.h | sed -n '100,160p'

Repository: sysown/proxysql

Length of output: 1654


Use BAIL_OUT() instead of return EXIT_FAILURE after plan() has been called.

plan(ROW_NUM) is emitted at line 175 before this guard is reached. Returning EXIT_FAILURE here skips all 10 planned ok() assertions, causing a TAP plan/count mismatch. All other early-exit paths (lines 156, 165, 171) occur before plan() and do not have this issue. BAIL_OUT() is the correct TAP protocol signal used throughout the codebase for aborting after plan() has been declared.

🛠️ Proposed fix
 	if (select_res == NULL) {
-		diag("Failed to store result: %s", mysql_error(proxysql_admin));
-		return EXIT_FAILURE;
+		BAIL_OUT("Failed to store result: %s", mysql_error(proxysql_admin));
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/tests/reg_test_3690-admin_large_pkts-t.cpp` around lines 198 - 201,
The test calls plan(ROW_NUM) before the select_res NULL check, so aborting must
use TAP's BAIL_OUT() instead of returning EXIT_FAILURE; replace the early return
in the select_res == NULL branch with a BAIL_OUT(...) call that includes a
descriptive message and the mysql_error(proxysql_admin) string (similar to other
post-plan aborts), ensuring the diag(...) remains if desired and that the
BAIL_OUT invocation provides the same error context.

…5243

# Conflicts:
#	include/PgSQL_Logger.hpp
#	lib/Makefile
#	lib/PgSQL_Logger.cpp
#	lib/ProxySQL_Admin.cpp
#	test/tap/groups/groups.json
@sonarqubecloud

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@renecannao renecannao merged commit b64f2e5 into v3.0 Feb 20, 2026
12 of 15 checks passed