
Optimize query logging performance (#5243) #5364

Merged
renecannao merged 3 commits into sysown:v3.0-5243 from mevishalr:feature/query-logging-improvements
Feb 18, 2026
Conversation

@mevishalr
Contributor

@mevishalr mevishalr commented Feb 11, 2026

1. Description

This PR improves ProxySQL's query logging performance by using buffered logging and reducing lock contention. In the benchmarks below, logging throughput improves by roughly an order of magnitude (from ~24K QPS to ~295K QPS).

2. Link to tracking issue

Fixes #5243

3. Detailed Changes

New Features

Log Sampling

  • Added log sampling for MySQL & PgSQL event logs.
  • Control sampling rate using mysql-eventslog_rate_limit & pgsql-eventslog_rate_limit.
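
A minimal sketch of how probabilistic sampling with a per-thread random number generator might look. It assumes that a rate limit of N means "log roughly one event in N", which the description above does not spell out. `Sampler` and its `should_log` method are illustrative names, not the PR's actual code.

```cpp
#include <cassert>
#include <random>

// Hypothetical per-thread sampler. Assumption: rate_limit == N means
// "log about 1 in N events"; values <= 1 mean "log everything".
class Sampler {
public:
    explicit Sampler(unsigned seed = std::random_device{}()) : rng_(seed) {}

    bool should_log(int rate_limit) {
        if (rate_limit <= 1) return true;  // sampling disabled
        std::uniform_int_distribution<int> dist(0, rate_limit - 1);
        return dist(rng_) == 0;            // true with probability 1/N
    }

private:
    std::mt19937 rng_;  // owned by a single thread, so no locking is needed
};
```

Keeping one generator per thread avoids sharing RNG state between workers, so the sampling decision adds no lock to the hot path.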

Logging Performance Improvements

  • Introduced LogBuffer and LogBufferThreadContext to implement per-thread buffering, reducing lock contention while logging.
  • Replaced localtime() with localtime_r() to improve JSON logging performance during timestamp generation.
  • Added global configuration variables:
    • mysql-eventslog_flush_timeout: Timeout in milliseconds to flush data stored in log buffers to log file.
    • mysql-eventslog_flush_size: Buffer size threshold in bytes for flushing.
    • Equivalent variables have been added for PgSQL.
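
The size and timeout thresholds above could interact roughly as in the following sketch. `ThreadLogBuffer` is a hypothetical stand-in, not the PR's `LogBuffer`; the assumption is that a buffer is flushed when it reaches the size threshold or when the timeout has elapsed since the last flush.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical per-thread buffer: appends take no lock, only flushes do.
class ThreadLogBuffer {
public:
    void append(const std::string& line) { data_ += line; }

    // Flush when the buffer holds at least flush_size bytes, or when
    // flush_timeout_ms has passed since the last flush and data is pending.
    bool should_flush(uint64_t now_ms, std::size_t flush_size,
                      uint64_t flush_timeout_ms) const {
        if (data_.empty()) return false;
        return data_.size() >= flush_size ||
               now_ms - last_flush_ms_ >= flush_timeout_ms;
    }

    // The shared file lock is held for one bulk write per flush.
    void flush(std::ostream& file, std::mutex& file_lock, uint64_t now_ms) {
        {
            std::lock_guard<std::mutex> guard(file_lock);
            file.write(data_.data(), static_cast<std::streamsize>(data_.size()));
        }
        data_.clear();
        last_flush_ms_ = now_ms;
    }

    std::size_t size() const { return data_.size(); }

private:
    std::string data_;
    uint64_t last_flush_ms_ = 0;
};
```

With a 16 KiB threshold, a worker thread would take the file lock once per 16 KiB of log data instead of once per query, which is where the reduction in contention would come from.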

Bug Fixes

  • Fixed logs getting dropped during file rotation by checking file open status atomically.

4. Performance Benefit

Sysbench OLTP R/W benchmarks were run to compare ProxySQL's performance before and after the logging improvements. With default ProxySQL v3.0.5 and JSON logging enabled, throughput was limited to approximately 24K QPS. With the proposed changes, ProxySQL reaches ~295K QPS with logging enabled, and client-side latencies drop considerably (average latency falls from 214.17 ms to 17.35 ms in the test below).

4.1. Benchmarking Logging Subsystem Performance

The benchmarks below focus purely on the performance of ProxySQL's logging subsystem. ProxySQL was connected to a simple Go server that speaks the MySQL protocol. The server returns a response immediately and does not execute any queries, which is why the sysbench results show 0 for reads and writes.

4.1.1. Sysbench R/W Test

Global Variables:

  • mysql-threads = 32
  • mysql-eventslog_format = 2
  • mysql-eventslog_flush_timeout = 5000 (5s) [variable added in this PR]
  • mysql-eventslog_flush_size = 16 * 1024 (16KiB) [variable added in this PR]

Sysbench Command

sysbench --time=60 --threads=256 --rate=0 --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-user=test --mysql-db=sysbench --report-interval=1 --db-ps-mode=disable oltp_read_write run

Current ProxySQL

SQL statistics:
    queries performed:
        read:                            0
        write:                           0
        other:                           1435820
        total:                           1435820
    transactions:                        71791  (1194.58 per sec.)
    queries:                             1435820 (23891.56 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      1194.5782
    time elapsed:                        60.0974s
    total number of events:              71791

Latency (ms):
         min:                                   18.10
         avg:                                  214.17
         max:                                 1399.43
         95th percentile:                     1013.60
         sum:                             15375506.10

Threads fairness:
    events (avg/stddev):           280.4336/430.99
    execution time (avg/stddev):   60.0606/0.03

Number of logs written to log file:

$ cat /var/lib/proxysql/queries.log.* | wc -l
1435815

ProxySQL with the logging improvements

SQL statistics:
    queries performed:
        read:                            0
        write:                           0
        other:                           17701600
        total:                           17701600
    transactions:                        885080 (14742.30 per sec.)
    queries:                             17701600 (294845.97 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      14742.2986
    time elapsed:                        60.0368s
    total number of events:              885080

Latency (ms):
         min:                                    1.60
         avg:                                   17.35
         max:                                  204.52
         95th percentile:                       33.72
         sum:                             15360419.16

Threads fairness:
    events (avg/stddev):           3457.3438/217.22
    execution time (avg/stddev):   60.0016/0.00

Number of logs written to file:

$ cat /var/lib/proxysql/queries.log.* | wc -l
17701600

4.1.2. Sysbench R/W test with different log sizes

Note: Log sizes reported in the tables are approximate. Actual log sizes may be slightly higher or lower.

Current ProxySQL

| Log Size | QPS Achieved | Avg Latency (ms) | p95 Latency (ms) | Queries Logged | Queries in File |
| --- | --- | --- | --- | --- | --- |
| 1024 | 22.15K | 230.90 | 1,013.60 | 1,331,800 | 1,331,799 |
| 2048 | 17.74K | 288.25 | 1,479.41 | 1,067,200 | 1,067,196 |
| 4096 | 14.86K | 172.16 | 861.95 | 892,920 | 879,123 |
| 8192 | 11.14K | 458.82 | 2,362.72 | 671,100 | 671,089 |

ProxySQL with logging improvements

| Log Size | QPS Achieved | Avg Latency (ms) | p95 Latency (ms) | Queries Logged | Queries in File |
| --- | --- | --- | --- | --- | --- |
| 1024 | 160.60K | 31.85 | 314.45 | 9,654,060 | 9,654,060 |
| 2048 | 108.26K | 47.24 | 601.29 | 6,512,280 | 6,512,280 |
| 4096 | 64.42K | 79.36 | 601.29 | 3,878,040 | 3,878,040 |
| 8192 | 34.65K | 147.43 | 977.74 | 2,090,120 | 2,090,120 |

4.1.3. Sysbench R/W test with different Flush sizes

Log Size: 4096 (approx)

| Flush Size | QPS Achieved | Avg Latency (ms) | p95 Latency (ms) | Queries Logged | Queries in File |
| --- | --- | --- | --- | --- | --- |
| 16 KiB | 64.42K | 79.36 | 601.29 | 3,878,040 | 3,878,040 |
| 32 KiB | 71.21K | 71.80 | 539.71 | 4,286,860 | 4,286,860 |
| 64 KiB | 76.22K | 67.08 | 211.60 | 4,588,040 | 4,588,040 |
| 128 KiB | 84.63K | 30.23 | 80.03 | 5,080,980 | 5,080,980 |

4.2 Benchmarking with a real MySQL Server

Global Variables

  • mysql-threads = 32
  • mysql-eventslog_format = 2
  • mysql-eventslog_flush_timeout = 5000 (5s) [variable added in this PR]
  • mysql-eventslog_flush_size = 16 * 1024 (16KiB) [variable added in this PR]

Sysbench Command

sysbench --time=60 --threads=256 --rate=0 --mysql-host=127.0.0.1 --mysql-port=6033 --mysql-user=test --mysql-db=sysbench --report-interval=1 --db-ps-mode=disable oltp_read_write run

Current ProxySQL

SQL statistics:
    queries performed:
        read:                            1111180
        write:                           317480
        other:                           158740
        total:                           1587400
    transactions:                        79370  (1320.41 per sec.)
    queries:                             1587400 (26408.25 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      1320.4124
    time elapsed:                        60.1100s
    total number of events:              79370

Latency (ms):
         min:                                   49.41
         avg:                                  193.75
         max:                                  505.76
         95th percentile:                      356.70
         sum:                             15377636.08

Threads fairness:
    events (avg/stddev):           310.0391/136.20
    execution time (avg/stddev):   60.0689/0.03

Number of logs written to file:

# cat /var/lib/proxysql/queries.log.* | wc -l
1587397

ProxySQL with logging improvements

SQL statistics:
    queries performed:
        read:                            10736124
        write:                           3067464
        other:                           1533732
        total:                           15337320
    transactions:                        766866 (12772.92 per sec.)
    queries:                             15337320 (255458.39 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

Throughput:
    events/s (eps):                      12772.9196
    time elapsed:                        60.0384s
    total number of events:              766866

Latency (ms):
         min:                                    5.24
         avg:                                   20.03
         max:                                11573.57
         95th percentile:                       44.98
         sum:                             15360902.91

Threads fairness:
    events (avg/stddev):           2995.5703/1084.12
    execution time (avg/stddev):   60.0035/0.01

Number of logs written to file:

# cat /var/lib/proxysql/queries.log.* | wc -l
15337320

5. Related PRs

Summary by CodeRabbit

  • New Features

    • Buffered per-thread event and audit logging with probabilistic rate control
    • New runtime settings: flush timeout, flush size, and event rate limit for MySQL/PostgreSQL
  • Refactor

    • Replaced direct file I/O with an in-memory buffering workflow and thread-aware contexts
    • Centralized flush/rotate logic and improved timestamp handling
  • Reliability

    • Buffers are flushed before rotation/shutdown and before explicit log-file operations to reduce data loss

Detailed Changes:
- Introduced LogBuffer and LogBufferThreadContext to implement per-thread buffering, reducing lock contention.
- Replaced localtime() with localtime_r() for improved performance and thread safety during timestamp generation.
- Implemented configurable sampling for event logs to reduce I/O overhead.
- Added global configuration variables to control flush size, flush interval, and sampling rate for event and audit logs.
@coderabbitai

coderabbitai bot commented Feb 11, 2026

📝 Walkthrough

Walkthrough

Introduces a LogBuffer abstraction and per-thread LogBufferThreadContext for buffered, sampled logging; replaces direct std::fstream I/O with LogBuffer* in MySQL/PostgreSQL event writers; adds flush/rotate utilities, per-thread runtime config, and thread-safe logfile state tracking.

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Logging Abstraction Foundation | include/log_utils.h, lib/log_utils.cpp | Adds LogBuffer (in-memory buffer API), LogBufferThreadContext (per-thread events/audit buffers + RNG sampling), GetLogBufferThreadContext, and flush_and_rotate utilities. |
| MySQL Logging API & Implementation | include/MySQL_Logger.hpp, lib/MySQL_Logger.cpp | Switches MySQL_Event write APIs from std::fstream* to LogBuffer*; integrates per-thread context map + mutex; adds current_log_size and logfile_open flags; updates writes to use LogBuffer and per-thread flush/rotate logic. |
| PostgreSQL Logging API & Implementation | include/PgSQL_Logger.hpp, lib/PgSQL_Logger.cpp | Mirrors MySQL changes: forward-declares LogBuffer/LogBufferThreadContext, updates PgSQL_Event write signatures to LogBuffer*, adds per-thread context map/mutex, current_log_size, and logfile_open atomics; integrates per-thread buffering and flush/rotate. |
| Thread Configuration Variables (headers) | include/MySQL_Thread.h, include/PgSQL_Thread.h, include/proxysql_structs.h | Adds per-thread runtime variables: eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, auditlog_flush_size and corresponding __thread externs. |
| Thread Configuration Variables (impls) | lib/MySQL_Thread.cpp, lib/PgSQL_Thread.cpp | Initializes defaults, exposes new variables in variable lists, adds validation and setters/getters, and refreshes them at runtime. |
| Build & Admin Integration | lib/Makefile, lib/ProxySQL_Admin.cpp | Adds log_utils to build objects; ensures logger flush() is invoked before admin flush_log() to flush buffers prior to file operations. |

Sequence Diagram(s)

sequenceDiagram
    participant Worker as WorkerThread
    participant Logger as MySQL_Logger / PgSQL_Logger
    participant CTX as LogBufferThreadContext
    participant Buffer as LogBuffer
    participant File as LogFile

    Worker->>Logger: log_request(event)
    Logger->>CTX: get_log_thread_context()
    CTX-->>Logger: thread_context (events/audit buffers)
    Logger->>Buffer: append formatted message
    alt buffer size/time threshold reached
        Logger->>Logger: flush_and_rotate(buffer)
        Logger->>File: write buffer contents -> flush/rotate
        File-->>Logger: flushed / rotated
    end
    Logger-->>Worker: return
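
The flush step in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the PR's `flush_and_rotate`: `SharedLog` is a hypothetical stand-in in which a vector of strings plays the role of the rotated files on disk, and rotation is triggered by a simple size limit.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Stand-in for the shared log file state (hypothetical).
struct SharedLog {
    std::mutex lock;
    std::vector<std::string> files = std::vector<std::string>(1);  // back() is the current file
    std::size_t max_file_size;
    explicit SharedLog(std::size_t max_size) : max_file_size(max_size) {}
};

// Writes the whole thread buffer under the lock, clears it, and starts a
// new file once the current one has reached the size limit.
void flush_and_rotate(std::string& buffer, SharedLog& log) {
    if (buffer.empty()) return;
    std::lock_guard<std::mutex> guard(log.lock);
    log.files.back() += buffer;
    buffer.clear();
    if (log.files.back().size() >= log.max_file_size) {
        log.files.emplace_back();  // "rotate": later writes go to a new file
    }
}
```

Because rotation happens inside the same critical section as the write, a buffer is never split across the old and the new file in this sketch.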

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I nibble bytes into a fluffy heap,
Per-thread burrows where the log-strings sleep,
RNG hops choose which tales to keep,
Timed flushes wake the file, a gentle beep,
Hooray—less blocking, more joyful leap.

🚥 Pre-merge checks | ✅ 4 | ❌ 2
❌ Failed checks (2 warnings)
| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 10.96% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Merge Conflict Detection | ⚠️ Warning | ❌ Merge conflicts detected (16 files), listed below. | See below. |

⚔️ include/MySQL_Logger.hpp (content)
⚔️ include/MySQL_Thread.h (content)
⚔️ include/PgSQL_Logger.hpp (content)
⚔️ include/PgSQL_Thread.h (content)
⚔️ include/proxysql_structs.h (content)
⚔️ lib/Admin_Handler.cpp (content)
⚔️ lib/Makefile (content)
⚔️ lib/MySQL_HostGroups_Manager.cpp (content)
⚔️ lib/MySQL_Logger.cpp (content)
⚔️ lib/MySQL_Thread.cpp (content)
⚔️ lib/PgSQL_HostGroups_Manager.cpp (content)
⚔️ lib/PgSQL_Logger.cpp (content)
⚔️ lib/PgSQL_Thread.cpp (content)
⚔️ lib/ProxySQL_Admin.cpp (content)
⚔️ lib/Query_Cache.cpp (content)
⚔️ test/tap/groups/groups.json (content)

These conflicts must be resolved before merging into v3.0.
Resolve conflicts locally and push changes to this branch.
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'Optimize query logging performance' accurately summarizes the main objective of the changeset, which implements buffered, per-thread logging with sampling and improved timestamp handling to enhance logging throughput and reduce contention. |
| Linked Issues check | ✅ Passed | The PR implementation comprehensively addresses all coding requirements from issue #5243: introduces buffered per-thread logging via LogBuffer and LogBufferThreadContext [all files], implements log sampling with rate_limit parameters [MySQL_Thread.cpp, PgSQL_Thread.cpp, log_utils.cpp], replaces localtime() with localtime_r() [MySQL_Logger.cpp, PgSQL_Logger.cpp], adds atomic file-open checks [MySQL_Logger.hpp, PgSQL_Logger.hpp], and provides configurable flush behavior via timeout/size variables [all files]. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to logging optimization objectives: new LogBuffer abstraction and per-thread context management, configuration variables for flush behavior, timestamp improvements, and rate-limiting mechanisms. No unrelated refactoring or feature additions detected in the changeset. |


@gemini-code-assist

Summary of Changes

Hello @mevishalr, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request dramatically enhances the query logging performance of ProxySQL by introducing a buffered, per-thread logging mechanism. These changes aim to alleviate bottlenecks caused by frequent disk I/O and lock contention, leading to a substantial increase in queries per second that can be logged. Additionally, it provides more control over logging behavior through new configuration options and improves the robustness of log file handling.

Highlights

  • Logging Performance Improvements: Implemented per-thread buffered logging using new LogBuffer and LogBufferThreadContext classes, significantly reducing lock contention and improving query logging throughput by roughly an order of magnitude in the author's benchmark (from ~24K QPS to ~295K QPS).
  • Log Sampling: Added log sampling capabilities for MySQL and PgSQL event logs, allowing control over the sampling rate via new configuration variables (mysql-eventslog_rate_limit and pgsql-eventslog_rate_limit).
  • Thread-Safe Timestamp Generation: Replaced the non-thread-safe localtime() function with localtime_r() for generating timestamps in JSON logs, enhancing performance and reliability in multi-threaded environments.
  • New Configuration Variables: Introduced global configuration variables (mysql-eventslog_flush_timeout, mysql-eventslog_flush_size, pgsql-eventslog_flush_timeout, pgsql-eventslog_flush_size) to control the flushing behavior of log buffers based on timeout and size thresholds.
  • Robust Log File Rotation: Fixed an issue where logs could be dropped during file rotation by implementing atomic checks for log file open status, ensuring data integrity.

Changelog
  • include/MySQL_Logger.hpp
    • Added forward declarations for LogBuffer and LogBufferThreadContext.
    • Modified MySQL_Event methods to accept LogBuffer* instead of std::fstream* for writing.
    • Introduced current_log_size and logfile_open (atomic boolean) members to track log file state.
    • Added log_thread_contexts map and log_thread_contexts_lock mutex for managing per-thread log buffers.
    • Declared new methods get_log_thread_context, is_events_logfile_open, set_events_logfile_open, is_audit_logfile_open, and set_audit_logfile_open.
  • include/MySQL_Thread.h
    • Added eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, and auditlog_flush_size to the MySQL_Threads_Handler::variables struct.
  • include/PgSQL_Logger.hpp
    • Added forward declarations for LogBuffer and LogBufferThreadContext.
    • Modified PgSQL_Event methods to accept LogBuffer* instead of std::fstream* for writing.
    • Introduced current_log_size and logfile_open (atomic boolean) members to track log file state for both events and audit logs.
    • Added log_thread_contexts map and log_thread_contexts_lock mutex for managing per-thread log buffers.
    • Declared new methods get_log_thread_context, is_events_logfile_open, set_events_logfile_open, is_audit_logfile_open, and set_audit_logfile_open.
  • include/PgSQL_Thread.h
    • Added eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, and auditlog_flush_size to the PgSQL_Threads_Handler::variables struct.
  • include/log_utils.h
    • Added new header file defining LogBuffer class for in-memory log buffering.
    • Added LogBufferThreadContext class to manage per-thread log buffers and a random number generator for sampling.
    • Added helper functions GetLogBufferThreadContext to retrieve or create thread-local contexts and flush_and_rotate for managing buffer flushing and file rotation.
  • include/proxysql_structs.h
    • Added new thread-local integer variables for PostgreSQL and MySQL event and audit log flush timeouts, flush sizes, and event log rate limits.
  • lib/Makefile
    • Added log_utils.oo to the _OBJ_CXX list for compilation.
  • lib/MySQL_Logger.cpp
    • Included log_utils.h.
    • Updated MySQL_Event::write, write_query_format_1, write_query_format_2_json, and write_auth methods to operate on LogBuffer* instead of std::fstream*.
    • Replaced localtime() with localtime_r() for thread-safe timestamp generation in JSON logging.
    • Replaced std::endl with '\n' for potentially faster output.
    • Initialized events.current_log_size and audit.current_log_size to 0 in the constructor.
    • Implemented flushing of all per-thread log buffers in the MySQL_Logger destructor.
    • Implemented is_events_logfile_open, set_events_logfile_open, is_audit_logfile_open, and set_audit_logfile_open methods.
    • Modified events_open_log_unlocked and audit_open_log_unlocked to initialize current_log_size and set logfile_open atomically.
    • Updated log_request and log_audit_entry to retrieve thread-local LogBufferThreadContext, apply log sampling, and use flush_and_rotate for buffered writing.
    • Modified flush() method to iterate through thread-local contexts and flush their buffers based on configured timeouts.
    • Implemented get_log_thread_context to retrieve or create the current thread's logging context.
  • lib/MySQL_Thread.cpp
    • Added new logging-related variables to mysql_thread_variables_names array.
    • Initialized default values for eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, and auditlog_flush_size in the MySQL_Threads_Handler constructor.
    • Modified set_variable to handle the new logging configuration variables.
    • Added a call to GloMyLogger->flush() before GloMyLogger->flush_log() when changing eventslog_format to ensure all buffered logs are written.
  • lib/PgSQL_Logger.cpp
    • Included log_utils.h.
    • Updated PgSQL_Event::write, write_query_format_1, write_query_format_2_json, and write_auth methods to operate on LogBuffer* instead of std::fstream*.
    • Replaced localtime() with localtime_r() for thread-safe timestamp generation in JSON logging.
    • Replaced std::endl with '\n' for potentially faster output.
    • Initialized events.current_log_size and audit.current_log_size to 0 in the constructor.
    • Implemented flushing of all per-thread log buffers in the PgSQL_Logger destructor.
    • Implemented is_events_logfile_open, set_events_logfile_open, is_audit_logfile_open, and set_audit_logfile_open methods.
    • Modified events_open_log_unlocked and audit_open_log_unlocked to initialize current_log_size and set logfile_open atomically.
    • Updated log_request and log_audit_entry to retrieve thread-local LogBufferThreadContext, apply log sampling, and use flush_and_rotate for buffered writing.
    • Modified flush() method to iterate through thread-local contexts and flush their buffers based on configured timeouts.
    • Implemented get_log_thread_context to retrieve or create the current thread's logging context.
  • lib/PgSQL_Thread.cpp
    • Added new logging-related variables to pgsql_thread_variables_names array.
    • Initialized default values for eventslog_flush_timeout, eventslog_flush_size, eventslog_rate_limit, auditlog_flush_timeout, and auditlog_flush_size in the PgSQL_Threads_Handler constructor.
    • Modified set_variable to handle the new logging configuration variables.
    • Added a call to GloPgSQL_Logger->flush() before GloPgSQL_Logger->flush_log() when changing eventslog_format to ensure all buffered logs are written.
  • lib/ProxySQL_Admin.cpp
    • Added calls to GloMyLogger->flush() and GloPgSQL_Logger->flush() before their respective flush_log() calls in ProxySQL_Admin::flush_logs() to ensure all buffered logs are written to disk.
  • lib/log_utils.cpp
    • Added new source file implementing the LogBuffer and LogBufferThreadContext classes.
    • Implemented LogBuffer methods for appending and writing data, resetting, and flushing to a file stream.
    • Implemented LogBufferThreadContext constructor with robust random number generator seeding and should_log method for sampling.
    • Implemented GetLogBufferThreadContext to manage thread-local LogBufferThreadContext instances.
    • Implemented flush_and_rotate helper function for thread-safe buffered log flushing and file rotation.
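
The changelog mentions replacing std::endl with a plain newline character for faster output. The reason is that std::endl writes a newline and then flushes the stream, while '\n' only writes the character. The sketch below makes the difference visible by counting flush requests; `CountingBuf` and the two helper functions are illustrative names that do not appear in the PR.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>

// String buffer that counts how often the stream asks it to flush.
class CountingBuf : public std::stringbuf {
public:
    int syncs = 0;
protected:
    int sync() override { ++syncs; return std::stringbuf::sync(); }
};

// Writes n lines ending in std::endl; each one forces a flush.
int flushes_with_endl(int n) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < n; ++i) out << "line" << std::endl;
    return buf.syncs;
}

// Writes n lines ending in '\n'; nothing is flushed unless requested.
int flushes_with_newline(int n) {
    CountingBuf buf;
    std::ostream out(&buf);
    for (int i = 0; i < n; ++i) out << "line" << '\n';
    return buf.syncs;
}
```

On a real file stream each of those flushes can turn into a write system call, so avoiding them matters most on a hot logging path.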


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces significant performance improvements to query logging by implementing per-thread buffered logging, which is an excellent approach to reduce lock contention. The new configuration variables for controlling flushing behavior and the addition of log sampling provide valuable flexibility. The replacement of localtime() with the thread-safe localtime_r() is also a commendable improvement.

I have identified a critical race condition in the creation of thread-local log contexts that could lead to incorrect behavior. Additionally, there's a high-severity bug where a log buffer isn't cleared after use, potentially corrupting log files. I've also included some medium-severity suggestions to enhance code safety by replacing sprintf with snprintf.

Comment on lines +143 to +145
std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
log_thread_contexts[tid] = std::move(new_context);
}

critical

There is a race condition here. If two threads call GetLogBufferThreadContext for the first time, both can pass the initial check, create a new_context, and then attempt to insert it into the map. The second thread to acquire the lock will overwrite the context created by the first, leading to incorrect behavior. To fix this, you should re-check if the context exists after acquiring the lock for the second time, before inserting the new one. This is a classic double-checked locking pattern.

		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		auto it = log_thread_contexts.find(tid);
		if (it != log_thread_contexts.end()) {
			// Another thread created it in the meantime.
			// The `new_context` we created will be safely destroyed when it goes out of scope.
			return it->second.get();
		}
		log_thread_contexts[tid] = std::move(new_context);
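
A self-contained sketch of the get-or-create pattern the comment asks for, in which the existence check and the insert happen under the same lock. `ContextRegistry` and `ThreadContext` are hypothetical stand-ins, and unlike the PR's code this version takes the lock on every call rather than only on first use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>

struct ThreadContext { int events_logged = 0; };  // stand-in for the real context

class ContextRegistry {
public:
    // Returns the calling thread's context, creating it on first use.
    ThreadContext* get() {
        const std::thread::id tid = std::this_thread::get_id();
        std::lock_guard<std::mutex> guard(lock_);
        auto it = contexts_.find(tid);
        if (it == contexts_.end()) {
            // Lookup and insert share one critical section, so an existing
            // entry can never be overwritten.
            it = contexts_.emplace(tid, std::make_unique<ThreadContext>()).first;
        }
        return it->second.get();
    }

    std::size_t size() {
        std::lock_guard<std::mutex> guard(lock_);
        return contexts_.size();
    }

private:
    std::mutex lock_;
    std::unordered_map<std::thread::id, std::unique_ptr<ThreadContext>> contexts_;
};
```

A caller that wants to avoid the lock on the hot path could cache the returned pointer in a thread-local variable after the first call.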

Comment on lines +1356 to +1357
log_ctx->events.flush_to_file(events.logfile);
events.current_log_size += log_ctx->events.size();

high

The log buffer log_ctx->events is not reset after writing the metadata event and flushing it to the file. This will cause the metadata to remain in the buffer and be prepended to the next log entry written by this thread, leading to duplicated data and incorrect log files. The buffer should be reset after it's flushed.

 			log_ctx->events.flush_to_file(events.logfile);
 			events.current_log_size += log_ctx->events.size();
 			log_ctx->events.reset(monotonic_time());
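
To illustrate why the reset matters, here is a small sketch with a stand-in `Buffer` whose flush_to_file() writes the contents without clearing them, as the comment describes. The type and the `flush_and_reset` helper are hypothetical, and this reset() takes no timestamp argument, unlike the call in the suggestion above.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Stand-in buffer: flush_to_file() writes but does not clear.
struct Buffer {
    std::string data;
    void append(const std::string& s) { data += s; }
    void flush_to_file(std::ostream& file) const { file << data; }
    void reset() { data.clear(); }
};

// Flushes and then resets, so later flushes never repeat earlier bytes.
// Returns the number of bytes written, captured before the reset.
std::size_t flush_and_reset(Buffer& buf, std::ostream& file) {
    const std::size_t written = buf.data.size();
    buf.flush_to_file(file);
    buf.reset();
    return written;
}
```

Without the reset, appending a query after a metadata record and flushing again writes the metadata a second time. Capturing the size before the reset also keeps any running file-size counter correct.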

Contributor Author

Fixed.

sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
if (localtime_r(&timer, &tm_info)) {
strftime(buffer1, 32, "%Y-%m-%d %H:%M:%S", &tm_info);
sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);

medium

sprintf is used here for string formatting. This function is unsafe as it doesn't perform bounds checking and can lead to buffer overflows. It's better to use snprintf for safer string formatting.

This issue is also present on lines 634, 1087, and 1101. Please apply similar changes there.

 			snprintf(buffer2, sizeof(buffer2),"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);

sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
if (localtime_r(&timer, &tm_info)) {
strftime(buffer1, 32, "%Y-%m-%d %H:%M:%S", &tm_info);
sprintf(buffer2,"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);

medium

sprintf is used here for string formatting. This function is unsafe as it doesn't perform bounds checking and can lead to buffer overflows. It's better to use snprintf for safer string formatting.

This issue is also present on lines 232, 425, and 439. Please apply similar changes there.

 			snprintf(buffer2, sizeof(buffer2),"%s.%03u", buffer1, (unsigned)(start_time%1000000)/1000);
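
Putting the two suggestions together, a bounded and thread-safe version of the timestamp formatting might look like the sketch below. `format_timestamp` is a hypothetical helper, and it assumes the input is a Unix timestamp in microseconds, as the `%1000000` arithmetic in the snippet suggests.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <ctime>
#include <string>

// Formats a microsecond Unix timestamp as "YYYY-MM-DD HH:MM:SS.mmm" in
// local time. Returns an empty string if the conversion fails.
std::string format_timestamp(uint64_t time_us) {
    const time_t secs = static_cast<time_t>(time_us / 1000000);
    struct tm tm_info;
    // localtime_r() (POSIX) fills a caller-owned struct instead of the
    // shared static storage used by localtime().
    if (localtime_r(&secs, &tm_info) == nullptr) return std::string();
    char date[32];
    if (strftime(date, sizeof(date), "%Y-%m-%d %H:%M:%S", &tm_info) == 0) {
        return std::string();
    }
    char out[64];
    // snprintf() never writes more than sizeof(out) bytes, unlike sprintf().
    snprintf(out, sizeof(out), "%s.%03u", date,
             static_cast<unsigned>((time_us % 1000000) / 1000));
    return std::string(out);
}
```

The date and time portion depends on the process time zone, but the millisecond suffix is derived from the input alone.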


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
lib/PgSQL_Logger.cpp (2)

564-580: ⚠️ Potential issue | 🟠 Major

events_close_log_unlocked / audit_close_log_unlocked don't set logfile_open to false.

When a log file is closed (e.g., during disable or rotation), logfile_open remains true. Worker threads that pass the lock-free is_events_logfile_open() / is_audit_logfile_open() check will continue writing into their per-thread buffers and attempt flushes. While flush_and_rotate guards against a NULL logfile under the lock, data accumulates silently in per-thread buffers during the window where the file is closed but logfile_open hasn't been toggled. This is particularly relevant if logging is disabled via an empty filename — events.enabled catches it, but the atomic flag is left stale.

Proposed fix
 void PgSQL_Logger::events_close_log_unlocked() {
 	if (events.logfile) {
 		events.logfile->flush();
 		events.logfile->close();
 		delete events.logfile;
 		events.logfile=NULL;
+		set_events_logfile_open(false);
 	}
 }
 
 void PgSQL_Logger::audit_close_log_unlocked() {
 	if (audit.logfile) {
 		audit.logfile->flush();
 		audit.logfile->close();
 		delete audit.logfile;
 		audit.logfile=NULL;
+		set_audit_logfile_open(false);
 	}
 }
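The idea behind the proposed fix, keeping the lock-free flag in step with the file handle, can be sketched in isolation. `FlagTrackedLog` is a hypothetical stand-in for the logger's per-log state, and the memory orderings are an assumption rather than what the PR uses.

```cpp
#include <atomic>
#include <fstream>
#include <mutex>

// Hypothetical stand-in for a logger's per-log state; not ProxySQL's class.
class FlagTrackedLog {
public:
    // Lock-free check used on the hot path before buffering an event.
    bool is_logfile_open() const {
        return logfile_open.load(std::memory_order_acquire);
    }

    void open_log(const char* path) {
        std::lock_guard<std::mutex> guard(lock);
        close_log_unlocked();
        logfile = new std::fstream(path, std::ios::out | std::ios::app);
        if (logfile->is_open()) {
            logfile_open.store(true, std::memory_order_release);
        } else {
            delete logfile;
            logfile = nullptr;
        }
    }

    void close_log() {
        std::lock_guard<std::mutex> guard(lock);
        close_log_unlocked();
    }

    ~FlagTrackedLog() { close_log(); }

private:
    // Caller must hold `lock`. The flag is cleared in the same branch that
    // releases the handle, so it can never stay true for a closed file.
    void close_log_unlocked() {
        if (logfile) {
            logfile_open.store(false, std::memory_order_release);
            logfile->flush();
            logfile->close();
            delete logfile;
            logfile = nullptr;
        }
    }

    std::mutex lock;
    std::fstream* logfile = nullptr;
    std::atomic<bool> logfile_open{false};
};
```

With this shape, every path that closes the file goes through one function, so the flag and the pointer cannot drift apart.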

102-134: ⚠️ Potential issue | 🟡 Minor

write_query_format_2_json always returns total_bytes = 0.

total_bytes is initialized to 0 at line 371 and never updated before the return at line 462. The caller PgSQL_Event::write (line 103) returns this value. If the return value is used for size tracking (as write_query_format_1 does), JSON-format logging will always report 0 bytes written. This appears to be pre-existing behavior carried forward, but worth noting as it could affect any metrics relying on this return value.

Also applies to: 369-463

lib/MySQL_Logger.cpp (2)

1284-1300: ⚠️ Potential issue | 🟠 Major

Same logfile_open stale-flag issue as in PgSQL_Logger.cpp.

events_close_log_unlocked() and audit_close_log_unlocked() set the logfile pointer to NULL but never call set_events_logfile_open(false) / set_audit_logfile_open(false). See the corresponding comment on PgSQL_Logger.cpp for details and proposed fix — the same change is needed here.


1338-1358: ⚠️ Potential issue | 🔴 Critical

Add buffer reset after direct flush_to_file to prevent metadata event duplication in logs

The metadata event is flushed directly to file but the buffer is not cleared. When the thread subsequently logs another event and flush_and_rotate is called, the same metadata bytes will be written to the file again. This deviates from the standard pattern used in flush_and_rotate, which calls buffer.reset() after flushing.

Fix
 			metaEvent.write(&log_ctx->events, nullptr);
 			log_ctx->events.flush_to_file(events.logfile);
 			events.current_log_size += log_ctx->events.size();
+			log_ctx->events.reset(monotonic_time());
🤖 Fix all issues with AI agents
In `@include/log_utils.h`:
- Around line 1-11: The header is not self-contained: GetLogBufferThreadContext
uses pthread_t as a map key but <pthread.h> is not included; add `#include`
<pthread.h> near the other includes in include/log_utils.h so pthread_t is
defined (ensuring the header no longer relies on transitive includes), and
confirm any declarations or typedefs referencing pthread_t (e.g., the map key in
GetLogBufferThreadContext) compile without additional includes.
- Around line 121-128: The docstring for flush_to_file is incorrect about
resetting the buffer; update the comment for flush_to_file (function
flush_to_file in log_utils.h) to state that it writes the contents of the buffer
to the provided std::fstream but does not reset or clear the buffer (the
rotation/reset is done by flush_and_rotate), so future maintainers are not
misled; keep the function behavior unchanged unless you intentionally want to
move reset logic into flush_to_file, in which case update the implementation in
log_utils.cpp (and tests) accordingly.

In `@lib/log_utils.cpp`:
- Around line 81-108: In flush_and_rotate, logfile->flush() is called after
rotate_fn() which may delete the fstream (use-after-free); move the flush so it
runs immediately after buffer.flush_to_file(logfile) and before the rotation
check (i.e., call logfile->flush() right after current_log_size +=
buffer.size()), or alternatively check logfile validity after rotate_fn() before
calling flush; update the function flush_and_rotate and any uses of
rotate_fn/events_close_log_unlocked/events_flush_log_unlocked to ensure
rotate_fn() cannot invalidate logfile before flush.

In `@lib/MySQL_Logger.cpp`:
- Around line 1212-1232: Destructor is calling flush_and_rotate(log_ctx->events,
events.logfile, events.current_log_size, events.max_log_file_size, [this]() {
wrlock(); }, [this]() { wrunlock(); }, nullptr) (and similarly for audit) with
seven arguments but the flush_and_rotate overload used elsewhere (see
PgSQL_Logger) expects a reset_time callback as an additional parameter; update
these calls in MySQL_Logger::~MySQL_Logger (where LogBufferThreadContext*
log_ctx is handled) to supply the correct reset_time callable (or the same
sentinel used in PgSQL_Logger) instead of the current nullptr, or adjust to call
the proper overload so events and audit flushes provide the reset_time callback
along with the existing wrlock/wrunlock lambdas and file/max-size args.

In `@lib/MySQL_Thread.cpp`:
- Around line 1826-1882: The setters currently call atoi() (e.g. in the blocks
that assign variables.eventslog_flush_timeout, variables.eventslog_flush_size,
variables.auditlog_flush_timeout, variables.auditlog_flush_size and
eventslog_rate_limit) which silently converts invalid strings to 0; replace
atoi() with strtol/strtoul and validate the end-pointer and range so non-numeric
input is rejected: call strtol(value, &endptr, 10), ensure endptr != value and
*endptr == '\0', check the parsed value fits the expected range (>=0 or >=1
where required) before assigning to the variables and returning true, otherwise
return false. Ensure you apply the same pattern to all flush_* and rate_limit
setters and use unsigned function (strtoul/strtoull) where sizes must be
non-negative.

In `@lib/PgSQL_Logger.cpp`:
- Around line 1026-1065: Remove the unused local booleans flush_eventslog and
flush_auditlog from PgSQL_Logger::flush; they are declared but never read, so
delete the lines that define these variables (flush_eventslog and
flush_auditlog) and keep the existing conditional flush logic for
log_ctx->events and log_ctx->audit intact (no other behavior changes to
flush_and_rotate calls or time checks).
🧹 Nitpick comments (1)
lib/log_utils.cpp (1)

121-123: should_log does not guard against rate_limit <= 0.

If rate_limit is 0, the expression dist(rng) * 0.0 <= 1.0 always evaluates to true, meaning all events are logged — which is the safe fallback. However, if rate_limit is negative (e.g., due to misconfiguration), the product can be negative, also always satisfying <= 1.0, effectively logging everything. This is a safe but perhaps unintended behavior.

Consider adding a fast-path for common cases and a guard:

Proposed improvement
 bool LogBufferThreadContext::should_log(int rate_limit) {
+	if (rate_limit <= 1) return true;
 	return dist(rng) * static_cast<double>(rate_limit) <= 1.0;
 }
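A self-contained version of the guarded sampler might look like the following. `Sampler` is a hypothetical class, and the choice of `std::mt19937` with a uniform distribution over [0, 1) is an assumption; the review only shows the `dist(rng)` expression.

```cpp
#include <random>

// Hypothetical sampler; mirrors the expression under review but is not
// ProxySQL's LogBufferThreadContext.
class Sampler {
public:
    explicit Sampler(unsigned seed) : rng(seed), dist(0.0, 1.0) {}

    // Returns true for roughly 1 in `rate_limit` calls.
    // Values <= 1 (including 0 and negatives) mean "log everything".
    bool should_log(int rate_limit) {
        if (rate_limit <= 1) return true;  // fast path, no RNG draw
        return dist(rng) * static_cast<double>(rate_limit) <= 1.0;
    }

private:
    std::mt19937 rng;
    std::uniform_real_distribution<double> dist;
};
```

The early return makes the behavior for 0 and negative values explicit instead of relying on the sign of the product, and it skips the random draw in the common unsampled case.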

Comment on lines +1212 to +1232
// Flush all per-thread buffers before destroying the logger
{
std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
for (const auto& kv : log_thread_contexts) {
LogBufferThreadContext* log_ctx = kv.second.get();
if (!log_ctx->events.empty()) {
flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
[this]() { wrlock(); },
[this]() { wrunlock(); },
nullptr
);
}
if (!log_ctx->audit.empty()) {
flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
[this]() { wrlock(); },
[this]() { wrunlock(); },
nullptr
);
}
}
}

⚠️ Potential issue | 🟠 Major

Destructor calls flush_and_rotate with 7 arguments — same concern as PgSQL_Logger.

See the corresponding comment on PgSQL_Logger.cpp. The reset_time parameter may be missing.

🤖 Prompt for AI Agents
In `@lib/MySQL_Logger.cpp` around lines 1212 - 1232, Destructor is calling
flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size,
events.max_log_file_size, [this]() { wrlock(); }, [this]() { wrunlock(); },
nullptr) (and similarly for audit) with seven arguments but the flush_and_rotate
overload used elsewhere (see PgSQL_Logger) expects a reset_time callback as an
additional parameter; update these calls in MySQL_Logger::~MySQL_Logger (where
LogBufferThreadContext* log_ctx is handled) to supply the correct reset_time
callable (or the same sentinel used in PgSQL_Logger) instead of the current
nullptr, or adjust to call the proper overload so events and audit flushes
provide the reset_time callback along with the existing wrlock/wrunlock lambdas
and file/max-size args.

Comment on lines +1826 to +1882
if (!strcasecmp(name,"eventslog_flush_timeout")) {
int intv=atoi(value);
if (intv >= 0) {
variables.eventslog_flush_timeout=intv;
if (intv > 5 * 60 * 1000) {
proxy_warning("mysql-eventslog_flush_timeout is set to a high value: %dms\n", intv);
}
return true;
} else {
return false;
}
}
if (!strcasecmp(name,"eventslog_flush_size")) {
int intv=atoi(value);
if (intv >= 0) {
variables.eventslog_flush_size=intv;
if (intv > 10 * 1024 * 1024) {
proxy_warning("mysql-eventslog_flush_size is set to a high value: %d\n", intv);
}
return true;
} else {
return false;
}
}
if (!strcasecmp(name,"eventslog_rate_limit")) {
int intv=atoi(value);
if (intv >= 1) {
variables.eventslog_rate_limit=intv;
return true;
} else {
return false;
}
}
if (!strcasecmp(name,"auditlog_flush_timeout")) {
int intv=atoi(value);
if (intv >= 0) {
variables.auditlog_flush_timeout=intv;
if (intv > 5 * 60 * 1000) {
proxy_warning("mysql-auditlog_flush_timeout is set to a high value: %dms\n", intv);
}
return true;
} else {
return false;
}
}
if (!strcasecmp(name,"auditlog_flush_size")) {
int intv=atoi(value);
if (intv >= 0) {
variables.auditlog_flush_size=intv;
if (intv > 10 * 1024 * 1024) {
proxy_warning("mysql-auditlog_flush_size is set to a high value: %d\n", intv);
}
return true;
} else {
return false;
}
}

⚠️ Potential issue | 🟠 Major

Reject non-numeric values for new flush_* settings.

atoi() turns invalid strings into 0, and 0 is currently accepted for *_flush_timeout/*_flush_size, so typos silently disable flushing. Use strtol (with end‑pointer validation) so non-numeric input is rejected.

💡 Suggested fix (apply same pattern to other flush_* setters)
-	if (!strcasecmp(name,"eventslog_flush_timeout")) {
- 		int intv=atoi(value);
- 		if (intv >= 0) {
- 			variables.eventslog_flush_timeout=intv;
+	if (!strcasecmp(name,"eventslog_flush_timeout")) {
+ 		char *end = nullptr;
+ 		long intv = strtol(value, &end, 10);
+ 		if (end == value || *end != '\0' || intv < 0) {
+ 			return false;
+ 		}
+ 		variables.eventslog_flush_timeout=static_cast<int>(intv);
 			if (intv > 5 * 60 * 1000) {
 				proxy_warning("mysql-eventslog_flush_timeout is set to a high value: %dms\n", intv);
 			}
 			return true;
- 		} else {
- 			return false;
- 		}
 	}
🤖 Prompt for AI Agents
In `@lib/MySQL_Thread.cpp` around lines 1826 - 1882, The setters currently call
atoi() (e.g. in the blocks that assign variables.eventslog_flush_timeout,
variables.eventslog_flush_size, variables.auditlog_flush_timeout,
variables.auditlog_flush_size and eventslog_rate_limit) which silently converts
invalid strings to 0; replace atoi() with strtol/strtoul and validate the
end-pointer and range so non-numeric input is rejected: call strtol(value,
&endptr, 10), ensure endptr != value and *endptr == '\0', check the parsed value
fits the expected range (>=0 or >=1 where required) before assigning to the
variables and returning true, otherwise return false. Ensure you apply the same
pattern to all flush_* and rate_limit setters and use unsigned function
(strtoul/strtoull) where sizes must be non-negative.
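The end-pointer validation described above can be factored into one helper that every setter shares. `parse_int_at_least` is a hypothetical name, and the upper bound of INT_MAX is an assumption based on the setters storing the value in an int.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical helper (not ProxySQL code): parses a base-10 integer in the
// range [min_value, INT_MAX]. Returns false and leaves `out` untouched for
// empty, non-numeric, trailing-garbage, or out-of-range input.
bool parse_int_at_least(const char* value, int min_value, int& out) {
    if (value == nullptr) return false;
    errno = 0;
    char* end = nullptr;
    long parsed = strtol(value, &end, 10);
    if (end == value || *end != '\0') return false;  // no digits, or trailing characters
    if (errno == ERANGE) return false;               // value does not fit in a long
    if (parsed < min_value || parsed > INT_MAX) return false;
    out = static_cast<int>(parsed);
    return true;
}
```

A setter for a flush timeout would call it with `min_value` 0 and a rate-limit setter with 1. Note that strtol still accepts leading whitespace and an optional sign, so `" 5"` parses as 5; rejecting those forms would need an extra check.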

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@lib/MySQL_Logger.cpp`:
- Around line 1211-1232: Destructor holds log_thread_contexts_lock while calling
flush_and_rotate which can call wrlock -> deadlock; fix
MySQL_Logger::~MySQL_Logger (and mirror in PgSQL_Logger::~PgSQL_Logger) by first
acquiring log_thread_contexts_lock, iterating log_thread_contexts and moving or
copying each LogBufferThreadContext's events and audit buffers into local
temporary containers (e.g., vectors of buffers or pointers), then release
log_thread_contexts_lock and only after unlocking call flush_and_rotate on those
collected buffers (using the same events.logfile/audit.logfile and sizes) so
flush_and_rotate/wrlock executes without holding log_thread_contexts_lock.

In `@lib/PgSQL_Logger.cpp`:
- Around line 491-512: PgSQL_Logger::~PgSQL_Logger currently holds
log_thread_contexts_lock while iterating log_thread_contexts and calling
flush_and_rotate (which invokes wrlock/wrunlock), creating a lock-order
inversion; fix it by copying the relevant LogBufferThreadContext pointers/refs
(for both events and audit) into a local vector while holding
log_thread_contexts_lock, then release the lock and call flush_and_rotate for
each saved LogBufferThreadContext without holding log_thread_contexts_lock;
update the destructor code paths that reference log_thread_contexts,
log_thread_contexts_lock, LogBufferThreadContext, events, audit,
flush_and_rotate, wrlock, and wrunlock accordingly.
🧹 Nitpick comments (3)
lib/log_utils.cpp (2)

125-147: TOCTOU in GetLogBufferThreadContext is benign but the double-lock pattern is wasteful.

Since only the owning thread (identified by pthread_self()) creates its own entry, there's no real race between the lookup and insert. However, the two separate lock acquisitions could be simplified into one, avoiding the unnecessary unlock/relock overhead on first access.

Suggested simplification
 LogBufferThreadContext* GetLogBufferThreadContext(std::unordered_map<pthread_t, std::unique_ptr<LogBufferThreadContext>>& log_thread_contexts, std::mutex& log_thread_contexts_lock, uint64_t current_time) {
 	pthread_t tid = pthread_self();
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		auto it = log_thread_contexts.find(tid);
-		if (it != log_thread_contexts.end()) {
-			return it->second.get();
-		}
-	}
-	
-	// Context doesn't exist for this thread, create it with proper initialization
-	auto new_context = std::make_unique<LogBufferThreadContext>();
-	LogBufferThreadContext* ptr = new_context.get();
-	// init() is already called in the constructor, which initializes both events and audit buffers
-	ptr->events.set_last_flush_time(current_time);
-	ptr->audit.set_last_flush_time(current_time);
-	
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		log_thread_contexts[tid] = std::move(new_context);
-	}
-	return ptr;
+	std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
+	auto it = log_thread_contexts.find(tid);
+	if (it != log_thread_contexts.end()) {
+		return it->second.get();
+	}
+	auto new_context = std::make_unique<LogBufferThreadContext>();
+	LogBufferThreadContext* ptr = new_context.get();
+	ptr->events.set_last_flush_time(current_time);
+	ptr->audit.set_last_flush_time(current_time);
+	log_thread_contexts[tid] = std::move(new_context);
+	return ptr;
 }

This also eliminates a subtle risk: if LogBufferThreadContext() constructor throws, the current code would leave no entry in the map but would have already released the lock — a single lock scope makes the intent clearer.
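The single-lock find-or-create pattern can be tried out with reduced types. `ThreadContext`, `ContextMap`, and `get_thread_context` are hypothetical, and the map is keyed on `std::thread::id` here purely for portability of the sketch; the PR keys on pthread_t.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <unordered_map>
#include <utility>

// Hypothetical per-thread context; stands in for LogBufferThreadContext.
struct ThreadContext {
    uint64_t last_flush_time = 0;
};

using ContextMap =
    std::unordered_map<std::thread::id, std::unique_ptr<ThreadContext>>;

// Find-or-create under one lock scope: the lookup and the insert cannot be
// separated by another thread's access to the map.
ThreadContext* get_thread_context(ContextMap& contexts,
                                  std::mutex& contexts_lock,
                                  uint64_t current_time) {
    const std::thread::id tid = std::this_thread::get_id();
    std::lock_guard<std::mutex> guard(contexts_lock);
    auto it = contexts.find(tid);
    if (it != contexts.end()) {
        return it->second.get();
    }
    auto ctx = std::make_unique<ThreadContext>();
    ctx->last_flush_time = current_time;
    ThreadContext* ptr = ctx.get();
    contexts.emplace(tid, std::move(ctx));
    return ptr;
}
```

The returned pointer stays valid after the lock is released only as long as entries are never erased while threads still use them, which is also an assumption in the original two-lock version.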


110-118: pthread_self() cast to unsigned may truncate where pthread_t is wider than unsigned (for example, a pointer or a 64-bit unsigned long).

On Linux with glibc, pthread_t is an unsigned long (8 bytes on 64-bit), while unsigned is typically 4 bytes. The truncation only affects RNG seed entropy here, so it's not a correctness issue, but using std::hash<pthread_t>{}(pthread_self()) would be more portable.

lib/MySQL_Logger.cpp (1)

1333-1358: events_open_log_unlocked writes metadata via a per-thread buffer, mixing per-thread and shared state.

Line 1338 calls get_log_thread_context() during log file opening (which runs under wrlock). This introduces the lock-ordering concern noted above. Additionally, the metadata write uses the calling thread's buffer, meaning if events_open_log_unlocked is called from a different context (e.g., admin thread via flush_log), it uses that thread's buffer—mixing per-thread buffer ownership with a shared operation.

Consider writing the metadata event directly to the logfile here (bypassing the per-thread buffer) since this is already under wrlock and the file is freshly opened.

Comment on lines 1211 to +1232
MySQL_Logger::~MySQL_Logger() {
// Flush all per-thread buffers before destroying the logger
{
std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
for (const auto& kv : log_thread_contexts) {
LogBufferThreadContext* log_ctx = kv.second.get();
if (!log_ctx->events.empty()) {
flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
[this]() { wrlock(); },
[this]() { wrunlock(); },
nullptr
);
}
if (!log_ctx->audit.empty()) {
flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
[this]() { wrlock(); },
[this]() { wrunlock(); },
nullptr
);
}
}
}
@coderabbitai coderabbitai bot Feb 11, 2026

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash

echo "=== Destructor implementation (1211-1232 and context) ==="
sed -n '1205,1240p' lib/MySQL_Logger.cpp

echo ""
echo "=== flush_and_rotate function signature ==="
rg -n "flush_and_rotate" lib/MySQL_Logger.cpp | head -5

echo ""
echo "=== events_open_log_unlocked implementation ==="
sed -n '1333,1370p' lib/MySQL_Logger.cpp

echo ""
echo "=== events_set_base_filename and wrlock context ==="
sed -n '1400,1430p' lib/MySQL_Logger.cpp

echo ""
echo "=== flush_log implementation ==="
sed -n '1250,1290p' lib/MySQL_Logger.cpp

Repository: sysown/proxysql

Length of output: 4927


🏁 Script executed:

#!/bin/bash

echo "=== get_log_thread_context implementation ==="
rg -n "get_log_thread_context()" lib/MySQL_Logger.cpp -A15 | head -50

echo ""
echo "=== events_flush_log_unlocked implementation ==="
sed -n '1290,1330p' lib/MySQL_Logger.cpp

echo ""
echo "=== audit_flush_log_unlocked implementation ==="
rg -n "audit_flush_log_unlocked" lib/MySQL_Logger.cpp -A10 | head -30

echo ""
echo "=== All lock patterns: wrlock followed by mutex operations ==="
rg -n "wrlock\(\)" lib/MySQL_Logger.cpp -A5 | head -100

Repository: sysown/proxysql

Length of output: 7313


🏁 Script executed:

#!/bin/bash

echo "=== get_log_thread_context function definition ==="
rg -n "get_log_thread_context\(\)" lib/MySQL_Logger.cpp -B2 -A20 | head -60

echo ""
echo "=== Check if get_log_thread_context acquires log_thread_contexts_lock ==="
rg -n "LogBufferThreadContext.*get_log_thread_context" lib/MySQL_Logger.h -A5

echo ""
echo "=== Search for get_log_thread_context implementation in .h ==="
rg -n "get_log_thread_context" lib/MySQL_Logger.h -A10

Repository: sysown/proxysql

Length of output: 2843


🏁 Script executed:

#!/bin/bash

echo "=== Find MySQL_Logger header file ==="
find . -name "*.h" -type f | grep -i logger | head -10

echo ""
echo "=== Search for get_log_thread_context in all header files ==="
rg -n "get_log_thread_context" --type=h -A10 | head -80

echo ""
echo "=== Search for LogBufferThreadContext and log_thread_contexts_lock in header ==="
rg -n "log_thread_contexts_lock|LogBufferThreadContext" --type=h -A3 -B1 | head -100

Repository: sysown/proxysql

Length of output: 5347


🏁 Script executed:

#!/bin/bash

echo "=== get_log_thread_context implementation in MySQL_Logger.cpp ==="
rg -n "MySQL_Logger::get_log_thread_context\(\)" lib/MySQL_Logger.cpp -A30

echo ""
echo "=== GetLogBufferThreadContext helper implementation in log_utils.cpp ==="
rg -n "GetLogBufferThreadContext" lib/ -A25 | head -80

echo ""
echo "=== Check entire destructor to see lock pattern ==="
sed -n '1212,1235p' lib/MySQL_Logger.cpp

Repository: sysown/proxysql

Length of output: 2952


🏁 Script executed:

#!/bin/bash

echo "=== Verify PgSQL_Logger has same pattern ==="
sed -n '1050,1080p' lib/PgSQL_Logger.cpp

echo ""
echo "=== Check if events_flush_log_unlocked calls events_open_log_unlocked ==="
sed -n '1303,1312p' lib/MySQL_Logger.cpp

echo ""
echo "=== Verify full flush_and_rotate signature ==="
rg -n "void flush_and_rotate" lib/ -A5 | head -30

echo ""
echo "=== Check PgSQL_Logger destructor ==="
rg -n "PgSQL_Logger::~PgSQL_Logger" lib/PgSQL_Logger.cpp -A30

Repository: sysown/proxysql

Length of output: 2419


🏁 Script executed:

#!/bin/bash

echo "=== Verify flush_and_rotate signature and implementation ==="
rg -n "template.*flush_and_rotate|void flush_and_rotate" lib/ -A2 | head -40

echo ""
echo "=== Check audit_open_log_unlocked to confirm it also calls get_log_thread_context ==="
rg -n "audit_open_log_unlocked\(\)" lib/MySQL_Logger.cpp -A30 | head -50

Repository: sysown/proxysql

Length of output: 2404


🏁 Script executed:

#!/bin/bash

echo "=== Search for flush_and_rotate in header files ==="
rg -n "flush_and_rotate" include/ -B2 -A3

echo ""
echo "=== Check if audit_open_log_unlocked calls get_log_thread_context ==="
sed -n '1371,1420p' lib/MySQL_Logger.cpp

echo ""
echo "=== Check if there's a rotate_fn call inside flush_and_rotate ==="
rg -n "rotate_fn" lib/ -B2 -A2 | head -40

Repository: sysown/proxysql

Length of output: 2754


🏁 Script executed:

#!/bin/bash

echo "=== Full flush_and_rotate signature in header ==="
sed -n '199,215p' include/log_utils.h

echo ""
echo "=== Full flush_and_rotate implementation ==="
sed -n '81,110p' lib/log_utils.cpp

echo ""
echo "=== Verify destructor code path: does it immediately call lock_fn? ==="
rg -n "flush_and_rotate" lib/log_utils.cpp -B5 -A15 | head -50

Repository: sysown/proxysql

Length of output: 1714


🏁 Script executed:

#!/bin/bash

echo "=== Verify audit_open_log_unlocked complete implementation ==="
sed -n '1371,1405p' lib/MySQL_Logger.cpp

echo ""
echo "=== Confirm PgSQL_Logger destructor has same issue ==="
sed -n '491,512p' lib/PgSQL_Logger.cpp

Repository: sysown/proxysql

Length of output: 2117


Lock-ordering inversion in destructors and related code paths can cause deadlock.

The destructor holds log_thread_contexts_lock (line 1214) and immediately calls flush_and_rotate(), which invokes wrlock() at the start (log_utils.cpp line 92). However, events_open_log_unlocked() calls get_log_thread_context() which acquires log_thread_contexts_lock, and this path is reached from code that already holds wrlock:

  • flush_log() (line 1262): wrlock() → events_flush_log_unlocked() → events_open_log_unlocked() → get_log_thread_context()
  • events_set_base_filename() (line 1406): wrlock() → events_open_log_unlocked() → get_log_thread_context()

This creates a classic lock-ordering inversion:

  • Thread A (destructor): log_thread_contexts_lock → wrlock
  • Thread B (flush_log/events_set_base_filename): wrlock → log_thread_contexts_lock

Restructure the destructor to collect buffers while holding log_thread_contexts_lock, then release it before calling flush_and_rotate(). Apply the same fix to PgSQL_Logger::~PgSQL_Logger().

🤖 Prompt for AI Agents
In `@lib/MySQL_Logger.cpp` around lines 1211 - 1232, Destructor holds
log_thread_contexts_lock while calling flush_and_rotate which can call wrlock ->
deadlock; fix MySQL_Logger::~MySQL_Logger (and mirror in
PgSQL_Logger::~PgSQL_Logger) by first acquiring log_thread_contexts_lock,
iterating log_thread_contexts and moving or copying each
LogBufferThreadContext's events and audit buffers into local temporary
containers (e.g., vectors of buffers or pointers), then release
log_thread_contexts_lock and only after unlocking call flush_and_rotate on those
collected buffers (using the same events.logfile/audit.logfile and sizes) so
flush_and_rotate/wrlock executes without holding log_thread_contexts_lock.

Contributor Author

Not an issue. In the destructor, the rotate function is a nullptr, hence events_flush_log_unlocked() will not be called.


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify events_open_log_unlocked calls get_log_thread_context while wrlock is held
echo "=== events_open_log_unlocked implementation showing get_log_thread_context call ==="
sed -n '1333,1370p' lib/MySQL_Logger.cpp

echo ""
echo "=== Verify flush_log holds wrlock before calling events_flush_log_unlocked ==="
sed -n '1260,1270p' lib/MySQL_Logger.cpp

echo ""
echo "=== Check events_flush_log_unlocked which calls events_open_log_unlocked ==="
sed -n '1303,1312p' lib/MySQL_Logger.cpp

echo ""
echo "=== Verify GetLogBufferThreadContext acquires log_thread_contexts_lock ==="
rg -n "GetLogBufferThreadContext" lib/log_utils.cpp -A 15

Length of output: 3219


@mevishalr Thank you for the clarification. I want to make sure I understand the threading model correctly.

The rotate_fn=nullptr does prevent events_flush_log_unlocked() from being called within the destructor's flush_and_rotate() call. However, the potential deadlock I'm concerned about is between two different threads:

Thread A (destructor):

  1. Line 1214: Acquires log_thread_contexts_lock (via std::lock_guard)
  2. Line 1217: Calls flush_and_rotate()
  3. Inside flush_and_rotate() (log_utils.cpp:92): Tries to acquire wrlock() via lock_fn()

Thread B (e.g., worker thread calling flush_log() or events_set_base_filename()):

  1. Line 1262: Acquires wrlock()
  2. Line 1263: Calls events_flush_log_unlocked()
  3. Inside events_flush_log_unlocked(): Calls events_open_log_unlocked()
  4. Line 1338: Calls get_log_thread_context()
  5. Inside GetLogBufferThreadContext() (log_utils.cpp:128): Tries to acquire log_thread_contexts_lock

This creates opposite lock ordering that can deadlock if both threads execute concurrently.

Question: Is there a guarantee in ProxySQL's architecture that the MySQL_Logger destructor only runs after all worker threads have been stopped? If so, there would be no concurrent access and no deadlock risk. Could you confirm this is the case?


🧬 Code Graph Analysis Results

[
{
"filename": "lib/log_utils.cpp",
"range": "81-89",
"content": "bool flush_and_rotate(\n\tLogBuffer& buffer,\n\tstd::fstream* logfile,\n\tunsigned int& current_log_size,\n\tunsigned int max_log_file_size,\n\tstd::function<void()> lock_fn,\n\tstd::function<void()> unlock_fn,\n\tstd::function<void()> rotate_fn,\n\tuint64_t reset_time)\n{\n\tbool flushed = false;\n\tlock_fn();\n\tif (logfile) {\n\t\tbuffer.flush_to_file(logfile);\n\t\tcurrent_log_size += buffer.size();\n\t\tflushed = true;\n\t\tlogfile->flush();\n\t\tif (current_log_size > max_log_file_size && rotate_fn) {\n\t\t\trotate_fn();\n\t\t\tcurrent_log_size = 0;\n\t\t}\n\t}\n\tunlock_fn();\n\tif (flushed) {\n\t\tbuffer.reset(reset_time);\n\t}\n\treturn flushed;\n}\n"
},
{
"filename": "lib/MySQL_Logger.cpp",
"range": "destructor block",
"content": "MySQL_Logger::~MySQL_Logger() {\n\t// Flush all per-thread buffers before destroying the logger\n \t{\n \t\tstd::lock_guard<std::mutex> lock(log_thread_contexts_lock);\n \t\tfor (const auto& kv : log_thread_contexts) {\n \t\t\tLogBufferThreadContext* log_ctx = kv.second.get();\n \t\t\tif (!log_ctx->events.empty()) {\n \t\t\t\tflush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,\n \t\t\t\t\t[this]() { wrlock(); },\n \t\t\t\t\t[this]() { wrunlock(); },\n \t\t\t\t\tnullptr\n \t\t\t\t);\n \t\t\t}\n \t\t\tif (!log_ctx->audit.empty()) {\n \t\t\t\tflush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,\n \t\t\t\t\t[this]() { wrlock(); },\n \t\t\t\t\t[this]() { wrunlock(); },\n \t\t\t\t\tnullptr\n \t\t\t\t);\n \t\t\t}\n \t\t}\n \t}\n\tif (events.datadir) {\n\t\tfree(events.datadir);\n\t}\n\tfree(events.base_filename);\n\tif (audit.datadir) {\n\t\tfree(audit.datadir);\n\t}\n\tfree(audit.base_filename);\n\tdelete MyLogCB;\n};"
}
]

Contributor Author

Yes, the destructor runs only after all worker threads have been joined. It is invoked in src/main.cpp by ProxySQL_Main_shutdown_all_modules().


🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the shutdown sequence showing destructor runs after thread join
echo "=== ProxySQL_Main_shutdown_all_modules implementation ==="
rg -n "ProxySQL_Main_shutdown_all_modules" src/main.cpp -A 30 | head -80

echo ""
echo "=== Check where MySQL_Logger destructor is called ==="
rg -n "delete.*GloMyLogger|GloMyLogger.*delete" src/main.cpp -B 5 -A 5

echo ""
echo "=== Check for thread join operations before logger destruction ==="
rg -n "join\(\)|pthread_join" src/main.cpp -B 2 -A 2 | head -50

echo ""
echo "=== Check MySQL_Threads_Handler shutdown ==="
rg -n "MySQL_Threads_Handler.*shutdown|shutdown.*MySQL_Threads" src/main.cpp -B 3 -A 3 | head -40

Length of output: 3491


@mevishalr Thank you for the clarification! I've verified the shutdown sequence in src/main.cpp and can confirm:

  1. Lines 1139-1180: All worker threads are joined (MyMon_thread->join(), pthread_join(GloMyQC->purge_thread_id, NULL), etc.)
  2. Line 1710: ProxySQL_Main_shutdown_all_modules() is called after thread joins complete
  3. Line 1328: delete GloMyLogger executes inside ProxySQL_Main_shutdown_all_modules()

This confirms that by the time MySQL_Logger::~MySQL_Logger() runs, all worker threads have already been stopped and joined. Therefore, there is no possibility of concurrent access and no deadlock risk from the lock ordering in the destructor.

My concern was based on an incorrect assumption about concurrent execution during destruction. Your explanation of the shutdown sequence resolves this completely. Thank you for the patience in clarifying ProxySQL's architecture! ✅


🧬 Code Graph Analysis Results

[Relevant snippets from lib/MySQL_Logger.cpp]

  • File: lib/MySQL_Logger.cpp

    • Snippet (destructor of MySQL_Logger)
      • Location: lib/MySQL_Logger.cpp, approx lines 470-490
      • Content:
        MySQL_Logger::~MySQL_Logger() {
        // Flush all per-thread buffers before destroying the logger
        {
        std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
        for (const auto& kv : log_thread_contexts) {
        LogBufferThreadContext* log_ctx = kv.second.get();
        if (!log_ctx->events.empty()) {
        flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
        [this]() { wrlock(); },
        [this]() { wrunlock(); },
        nullptr
        );
        }
        if (!log_ctx->audit.empty()) {
        flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
        [this]() { wrlock(); },
        [this]() { wrunlock(); },
        nullptr
        );
        }
        }
        }
        if (events.datadir) {
        free(events.datadir);
        }
        free(events.base_filename);
        if (audit.datadir) {
        free(audit.datadir);
        }
        free(audit.base_filename);
        delete MyLogCB;
        };
  • Snippet (getter for per-thread log context)

    • Location: lib/MySQL_Logger.cpp, approx lines 1110-1125
    • Content:
      LogBufferThreadContext* MySQL_Logger::get_log_thread_context() {
      return GetLogBufferThreadContext(log_thread_contexts, log_thread_contexts_lock, monotonic_time());
      }

✏️ Learnings added
Learnt from: mevishalr
Repo: sysown/proxysql PR: 5364
File: lib/MySQL_Logger.cpp:1211-1232
Timestamp: 2026-02-13T05:55:36.022Z
Learning: In ProxySQL, the MySQL_Logger and PgSQL_Logger destructors run after all worker threads have been joined during shutdown. The sequence in src/main.cpp is: (1) join all worker threads, (2) call ProxySQL_Main_shutdown_all_modules() which deletes the loggers. Therefore, there is no concurrent thread access during logger destruction, and lock ordering in the destructors cannot cause deadlocks.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

Comment on lines 491 to +512
PgSQL_Logger::~PgSQL_Logger() {
	// Flush all per-thread buffers before destroying the logger
	{
		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
		for (const auto& kv : log_thread_contexts) {
			LogBufferThreadContext* log_ctx = kv.second.get();
			if (!log_ctx->events.empty()) {
				flush_and_rotate(log_ctx->events, events.logfile, events.current_log_size, events.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
			if (!log_ctx->audit.empty()) {
				flush_and_rotate(log_ctx->audit, audit.logfile, audit.current_log_size, audit.max_log_file_size,
					[this]() { wrlock(); },
					[this]() { wrunlock(); },
					nullptr
				);
			}
		}
	}

@coderabbitai coderabbitai bot Feb 11, 2026

⚠️ Potential issue | 🔴 Critical

Same lock-ordering inversion as MySQL_Logger destructor.

This destructor holds log_thread_contexts_lock while calling flush_and_rotate(), which invokes wrlock(), creating the same deadlock risk described in the MySQL_Logger review. Apply the same fix here.

🤖 Prompt for AI Agents
In `@lib/PgSQL_Logger.cpp` around lines 491 - 512, PgSQL_Logger::~PgSQL_Logger
currently holds log_thread_contexts_lock while iterating log_thread_contexts and
calling flush_and_rotate (which invokes wrlock/wrunlock), creating a lock-order
inversion; fix it by copying the relevant LogBufferThreadContext pointers/refs
(for both events and audit) into a local vector while holding
log_thread_contexts_lock, then release the lock and call flush_and_rotate for
each saved LogBufferThreadContext without holding log_thread_contexts_lock;
update the destructor code paths that reference log_thread_contexts,
log_thread_contexts_lock, LogBufferThreadContext, events, audit,
flush_and_rotate, wrlock, and wrunlock accordingly.
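
The snapshot-then-flush pattern requested in the prompt above can be sketched roughly as follows. This is a minimal, self-contained illustration rather than ProxySQL code: `Ctx`, `Registry`, `contexts_lock`, `file_lock`, and `drain_all` are hypothetical stand-ins for `LogBufferThreadContext`, the logger, `log_thread_contexts_lock`, `wrlock()`/`wrunlock()`, and the destructor body, and a `std::string` stands in for the log file.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for LogBufferThreadContext: one pending buffer.
struct Ctx {
    std::string events;
};

// Hypothetical stand-in for the logger: a registry lock plus a file lock.
struct Registry {
    std::mutex contexts_lock;  // guards `contexts`
    std::mutex file_lock;      // plays the role of wrlock()/wrunlock()
    std::unordered_map<int, std::unique_ptr<Ctx>> contexts;
    std::string file;          // stands in for the log file

    // Snapshot the context pointers under contexts_lock, release it,
    // then flush each context while holding only file_lock.
    // Returns the number of contexts that had pending data.
    std::size_t drain_all() {
        std::vector<Ctx*> snapshot;
        {
            std::lock_guard<std::mutex> guard(contexts_lock);
            snapshot.reserve(contexts.size());
            for (const auto& kv : contexts) {
                snapshot.push_back(kv.second.get());
            }
        }  // contexts_lock is released here, before file_lock is taken

        std::size_t flushed = 0;
        for (Ctx* ctx : snapshot) {
            if (ctx->events.empty()) continue;
            std::lock_guard<std::mutex> guard(file_lock);
            file += ctx->events;
            ctx->events.clear();
            ++flushed;
        }
        return flushed;
    }
};
```

Because the two locks are never held at the same time, no ordering between them can be inverted. The sketch assumes contexts are never erased from the map while a snapshot is in use; if they could be, the raw pointers in the snapshot would need shared ownership instead.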

Contributor Author

Not an issue. In the destructor, the rotate function is a nullptr, hence events_flush_log_unlocked() will not be called.

@mevishalr, understood. You're right that nullptr for the rotate function means no additional lock acquisition during rotation. If the destructor is only called during controlled shutdown when no other threads are accessing the logger, the lock ordering pattern wouldn't cause issues in practice.

Thank you for the clarification!

- Added <pthread.h> to log_utils.h and removed the header from log_utils.cpp
- Pass 0 to reset_time argument in flush_and_rotate inside logger's destructor
@sonarqubecloud

Quality Gate failed

Failed conditions
1 Security Hotspot

See analysis details on SonarQube Cloud

@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
lib/MySQL_Logger.cpp (1)

1335-1361: ⚠️ Potential issue | 🟡 Minor

Buffer containing old entries written to new log file during rotation.

When the calling thread triggers log rotation through flush_and_rotate(), the sequence is: (1) flush thread's buffer to old file, (2) call rotate_fn() to open new file, (3) reset buffer. During step (2), events_open_log_unlocked() gets the calling thread's buffer context and writes a metadata event to it. Since the buffer was flushed but not reset, any leftover data from the previous request is written along with the metadata to the new file.

This only affects the thread triggering rotation—other threads' buffers are unaffected. Consider resetting the buffer immediately after flushing (before rotation) rather than after the unlock to prevent old entries from appearing in the new log file.
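
The ordering problem described above can be modelled with a small sketch. It is not the ProxySQL implementation: `MiniLog` and its members are hypothetical, strings stand in for the old and new log files, and the `reset_before_rotate` flag exists only to contrast the two orderings.

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of the flush/rotate sequence; strings stand in for files.
struct MiniLog {
    std::string buffer;    // the calling thread's buffer
    std::string old_file;  // file in use before rotation
    std::string new_file;  // file opened by rotation

    // Runs the three steps from the review comment. When reset_before_rotate
    // is true, the buffer is cleared right after the flush, as suggested.
    void flush_and_rotate(bool reset_before_rotate) {
        old_file += buffer;      // (1) flush the buffer to the old file
        if (reset_before_rotate) {
            buffer.clear();      // suggested ordering: reset immediately after flushing
        }
        buffer += "METADATA\n";  // (2) rotation appends a metadata event to the same buffer
        new_file += buffer;      //     and writes that buffer to the new file
        buffer.clear();          // (3) final reset
    }
};
```

With `reset_before_rotate` set to `false`, an entry such as `"q1\n"` ends up in both files; with `true`, the new file starts with the metadata event only.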

🧹 Nitpick comments (7)
lib/log_utils.cpp (3)

80-107: current_log_size accounting uses buffer size before reset — verify no double-count.

The flow is: flush_to_file (writes), current_log_size += buffer.size() (accounts), then later buffer.reset() (clears). This is correct — the buffer isn't cleared until after the lock is released, so size() still reflects what was written. Good sequencing.

One minor note: current_log_size is unsigned int while buffer.size() returns size_t. On 64-bit systems with very large buffers, this could theoretically overflow. In practice, flush thresholds prevent this, but a static_cast or type alignment would make the intent clearer.
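
One way to make the `size_t` to `unsigned int` conversion explicit is a small saturating helper. The sketch below is an assumption for illustration only: `add_saturating` is a hypothetical name and does not appear in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical helper: add a size_t byte count to an unsigned int counter,
// saturating at the counter's maximum instead of wrapping around.
inline void add_saturating(unsigned int& counter, std::size_t bytes) {
    const unsigned int max_value = std::numeric_limits<unsigned int>::max();
    const std::size_t room = static_cast<std::size_t>(max_value - counter);
    if (bytes >= room) {
        counter = max_value;  // would overflow (or land exactly on max): clamp
    } else {
        counter += static_cast<unsigned int>(bytes);  // safe: bytes < room <= max_value
    }
}
```

A saturated counter still compares as greater than or equal to any `max_log_file_size`, so a size-based rotation check would continue to trigger rather than silently wrapping to a small value.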


124-145: TOCTOU window in GetLogBufferThreadContext — benign but could be tightened.

The function checks the map under a lock, releases it, constructs a new context, then re-acquires the lock to insert. Between the two critical sections, another path could theoretically insert for the same key. This is safe in practice because pthread_self() is unique per live thread and a single thread executes sequentially. However, holding the lock across the entire operation (or using try_emplace) would eliminate the gap and simplify reasoning.

Simplified single-lock approach
 LogBufferThreadContext* GetLogBufferThreadContext(std::unordered_map<pthread_t, std::unique_ptr<LogBufferThreadContext>>& log_thread_contexts, std::mutex& log_thread_contexts_lock, uint64_t current_time) {
 	pthread_t tid = pthread_self();
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		auto it = log_thread_contexts.find(tid);
-		if (it != log_thread_contexts.end()) {
-			return it->second.get();
-		}
-	}
-	
-	// Context doesn't exist for this thread, create it with proper initialization
-	auto new_context = std::make_unique<LogBufferThreadContext>();
-	LogBufferThreadContext* ptr = new_context.get();
-	// init() is already called in the constructor, which initializes both events and audit buffers
-	ptr->events.set_last_flush_time(current_time);
-	ptr->audit.set_last_flush_time(current_time);
-	
-	{
-		std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
-		log_thread_contexts[tid] = std::move(new_context);
-	}
-	return ptr;
+	std::lock_guard<std::mutex> lock(log_thread_contexts_lock);
+	auto [it, inserted] = log_thread_contexts.try_emplace(tid, nullptr);
+	if (inserted) {
+		it->second = std::make_unique<LogBufferThreadContext>();
+		it->second->events.set_last_flush_time(current_time);
+		it->second->audit.set_last_flush_time(current_time);
+	}
+	return it->second.get();
 }

109-118: pthread_self() cast to unsigned may truncate on platforms where pthread_t is 64-bit.

On Linux, pthread_t is typically unsigned long (8 bytes on 64-bit), while static_cast<unsigned> truncates to 4 bytes. For RNG seeding purposes this isn't a correctness issue — it just reduces entropy slightly. Fine to leave as-is.

include/log_utils.h (2)

1-2: Reserved identifier used for include guard.

__CLASS_LOG_UTILS_H uses a double-underscore prefix, which is reserved by the C++ standard for implementation use. Consider a non-reserved form like CLASS_LOG_UTILS_H or LOG_UTILS_H_.


199-207: std::function parameters by value incur allocation overhead on every call.

flush_and_rotate takes three std::function<void()> parameters by value. Each call site creates these from lambdas, which may involve heap allocation. Since this is on the hot logging path, consider passing by const reference or using templates to avoid std::function overhead entirely.

Template approach to avoid std::function overhead
template<typename LockFn, typename UnlockFn, typename RotateFn>
bool flush_and_rotate(
	LogBuffer& buffer,
	std::fstream* logfile,
	unsigned int& current_log_size,
	unsigned int max_log_file_size,
	LockFn&& lock_fn,
	UnlockFn&& unlock_fn,
	RotateFn&& rotate_fn,
	uint64_t reset_time = 0);

Alternatively, at minimum pass by const std::function<void()>& to avoid copies.

lib/PgSQL_Logger.cpp (1)

566-573: events_close_log_unlocked doesn't update logfile_open atomic flag.

When the logfile is closed here, set_events_logfile_open(false) is not called. This is currently safe because:

  1. Rotation (via events_flush_log_unlocked) immediately reopens, which sets the flag.
  2. Disabling logging sets events.enabled=false first, so the stale logfile_open=true is masked by the enabled early-return check.

However, this coupling is fragile. If a future code path closes the log without immediately reopening or setting enabled=false, worker threads would pass the is_events_logfile_open() check and buffer data that never gets flushed.

Consider adding set_events_logfile_open(false) inside events_close_log_unlocked (and the audit equivalent) for defense-in-depth.

Proposed fix
 void PgSQL_Logger::events_close_log_unlocked() {
 	if (events.logfile) {
 		events.logfile->flush();
 		events.logfile->close();
 		delete events.logfile;
 		events.logfile=NULL;
+		set_events_logfile_open(false);
 	}
 }

 void PgSQL_Logger::audit_close_log_unlocked() {
 	if (audit.logfile) {
 		audit.logfile->flush();
 		audit.logfile->close();
 		delete audit.logfile;
 		audit.logfile=NULL;
+		set_audit_logfile_open(false);
 	}
 }
lib/MySQL_Logger.cpp (1)

1286-1301: Same close_log_unlocked issue: atomic flag not updated on close.

Same concern as raised for PgSQL_Logger: events_close_log_unlocked and audit_close_log_unlocked set logfile=NULL but don't call set_events_logfile_open(false) / set_audit_logfile_open(false). Currently safe due to the enabled guard but fragile.

Proposed fix
 void MySQL_Logger::events_close_log_unlocked() {
 	if (events.logfile) {
 		events.logfile->flush();
 		events.logfile->close();
 		delete events.logfile;
 		events.logfile=NULL;
+		set_events_logfile_open(false);
 	}
 }

 void MySQL_Logger::audit_close_log_unlocked() {
 	if (audit.logfile) {
 		audit.logfile->flush();
 		audit.logfile->close();
 		delete audit.logfile;
 		audit.logfile=NULL;
+		set_audit_logfile_open(false);
 	}
 }

@renecannao renecannao changed the base branch from v3.0 to v3.0-5243 February 18, 2026 02:26
@renecannao
Contributor

Hi @mevishalr. Thank you very much for the PR. I am merging it into a new branch, v3.0-5243, to make some extra changes before merging it into v3.0.

@renecannao renecannao merged commit 568fe15 into sysown:v3.0-5243 Feb 18, 2026
1 of 3 checks passed
renecannao added a commit that referenced this pull request Feb 18, 2026
This commit addresses three correctness regressions introduced by
thread-local log buffering in PR #5364 (query logging performance):

1) stale logfile pointer/UAF race during concurrent rotate/close
2) stale logfile-open state after close
3) non-global flush behavior in admin/format-switch paths

What was fixed

- Make `flush_and_rotate()` consume the current logfile pointer under lock
  - Signature changed from `std::fstream*` to `std::fstream*&`
  - Prevents dereferencing a stale stream pointer captured before lock
    acquisition while another thread rotates/closes the file
  - Updated declaration/definition and all call sites

- Add explicit synchronization for cross-thread buffer draining
  - Added `LogBufferThreadContext::buffer_lock`
  - Any path that appends or flushes a thread buffer now locks this mutex
  - Guarantees force-flush from admin/config paths cannot race with
    worker-thread appends on the same context

- Restore global forced flush semantics where required
  - Extended `MySQL_Logger::flush` and `PgSQL_Logger::flush` to
    `flush(bool force = false)`
  - `force=false`: preserves existing low-overhead worker-loop behavior
    (per-thread timeout-based flush)
  - `force=true`: snapshots all known thread contexts and drains both
    events/audit buffers regardless of timeout
  - `flush_log()` now calls `flush(true)` before file rotation, so admin
    flush and format-switch operations no longer miss pending thread buffers

- Avoid unintended rotation during forced draining
  - In `force=true` path, flush uses `rotate_fn=nullptr`
  - Drains buffered payload into the current file first
  - `flush_log()` then performs one controlled rotate/open step

- Fix logfile state tracking after close
  - `events_close_log_unlocked()` and `audit_close_log_unlocked()` now set
    `logfile_open=false` when the stream is closed
  - Prevents write paths from treating a closed stream as open

- Remove per-thread-context dependency during metadata header write
  - `events_open_log_unlocked()` now uses a local `LogBuffer` for metadata
    emission in format=1 instead of reusing a thread context buffer
  - Keeps open/rotate path independent from worker context lifecycle

- Keep callers consistent and non-duplicative
  - Since `flush_log()` now force-drains internally, removed redundant
    explicit `flush()` calls from:
    - MySQL/PgSQL `eventslog_format` switch handlers
    - `ProxySQL_Admin::flush_logs`

Behavioral outcome

- No stale stream pointer use when close/rotate interleaves with flush
- No false-positive logfile-open state after close
- `FLUSH LOGS` and eventslog format switch now drain all known thread
  buffers before rotating, preventing dropped/misplaced buffered records

Validation

- Built modified objects directly
- Ran full debug build with GENAI enabled:

  make clean && export PROXYSQLGENAI=1 && make debug -j24

  Build completed successfully.
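
The effect of changing the `flush_and_rotate()` parameter from `std::fstream*` to `std::fstream*&`, as listed in the commit message above, can be illustrated with a single-threaded sketch. Everything here is a hypothetical stand-in: `flush_by_value` and `flush_by_ref` are not ProxySQL functions, `std::ostringstream` plays the role of `std::fstream`, and the `interleave` callback simulates another thread rotating the file just before the lock is acquired.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <sstream>
#include <string>

// Pointer passed by value: the callee works on a copy taken at call time.
inline void flush_by_value(std::ostringstream* logfile, std::mutex& lock,
                           const std::function<void()>& interleave,
                           const std::string& data) {
    interleave();  // simulated rotation by another thread, before the lock
    std::lock_guard<std::mutex> guard(lock);
    if (logfile) *logfile << data;  // still the pre-rotation stream
}

// Pointer passed by reference: the callee reads the owner's current value
// only after the lock has been taken.
inline void flush_by_ref(std::ostringstream*& logfile, std::mutex& lock,
                         const std::function<void()>& interleave,
                         const std::string& data) {
    interleave();  // simulated rotation by another thread, before the lock
    std::lock_guard<std::mutex> guard(lock);
    if (logfile) *logfile << data;  // the stream the owner points at now
}
```

In this sketch the old stream is kept alive, so the by-value variant merely writes to the wrong file. In the real logger the old stream is deleted on close, which is why the commit message describes the by-value form as a use-after-free risk.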
renecannao added a commit that referenced this pull request Feb 20, 2026
v3.0: harden query logging buffering follow-up (#5364 / #5243)

Development

Successfully merging this pull request may close these issues.

ProxySQL Query logging impacts performance under high RPS

2 participants