Multi-group TAP test runner and ClickHouse test fixes #5470

Open
renecannao wants to merge 45 commits into v3.0 from
private/multi-group-runner

Conversation

@renecannao
Contributor

@renecannao renecannao commented Mar 21, 2026

Summary

This PR introduces a multi-group TAP test runner for parallel test execution and fixes several failing ClickHouse tests for Docker-based CI infrastructure.

Multi-group Test Runner

  • Add parallel TAP group execution scripts (run-multi-group.bash)
  • Support isolated test runs with per-group infrastructure
  • Add optional code coverage support for isolated test runs
  • Improve test startup staggering with group-based hash
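The group-based staggering idea can be sketched in a few lines; the hashing scheme below is illustrative, not the script's exact implementation, and the 0-15s window mirrors the delay range mentioned elsewhere in this PR:

```shell
# Illustrative sketch: derive a deterministic startup delay from the group
# name so parallel groups don't all hit Docker at the same instant.
# `cksum` is POSIX; the modulus bounds the delay to 0-15 seconds.
group="legacy-g1"
hash=$(printf '%s' "$group" | cksum | awk '{print $1}')
delay=$((hash % 16))
echo "group=$group stagger=${delay}s"
# a real runner would then: sleep "$delay"
```

Because the delay is a pure function of the group name, reruns of the same group set stagger identically, which keeps timing-related failures reproducible.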

ClickHouse Test Fixes

  • Fix variable name: clickhouse-host → clickhouse-hostname
  • Update MergeTree syntax for newer ClickHouse versions (deprecated old syntax)
  • Fix Docker compatibility: use 0.0.0.0 interface and translate to proxysql hostname in containers
  • Update expected row counts after MergeTree fix

Other Test Fixes

  • Move test_client_limit_error-t to todo-g1 group (network namespace issues)
  • Fix clickhouse_php_conn-t.php for Docker container hostnames
  • Fix max_connections_ff-t to use dynamic hostgroup discovery
  • Fix legacy TAP tests: reg_test_3317 and RESTAPI tests
  • Fix reg_test_3223 RESTAPI test by adding TAP group setup hook

Infrastructure Improvements

  • Add coverage tools (lcov, fastcov) to proxysql-ci-base Docker image
  • Fix various scripts for POSIX sh compatibility
  • Add unique INFRA_ID and debug build instructions

Test plan

  • ClickHouse tests pass: test_clickhouse_server-t (111 tests)
  • ClickHouse tests pass: test_clickhouse_server_libmysql-t (111 tests)
  • Legacy group tests pass with new multi-group runner
  • Code coverage collection works in isolated test runs

Summary by CodeRabbit

  • New Features

    • Run and teardown multiple test groups in parallel with configurable concurrency, per-group summaries, optional auto-cleanup, and combined coverage reporting; optional noise-injection for race testing.
  • Documentation

    • Added quick-start, env var reference, multi-group and coverage guides, updated group READMEs and test analysis notes.
  • Chores

    • New group-level setup/pre-cleanup hooks and deterministic environment loading; per-infra coverage outputs enabled.
  • Tests

    • Numerous TAP test improvements: diagnostics, seeding/cleanup, routing/hostgroup discovery, and harness refinements.

This commit migrates the ai TAP group from its legacy hook-based approach
to the standard test/infra/ Unified CI Infrastructure pattern.

Changes:
- Added generic hook system to infrastructure scripts:
  * ensure-infras.bash: Executes group-specific setup-infras.bash hook
  * run-tests-isolated.bash: Executes group-specific pre-cleanup.bash hook

- Migrated ai group infrastructure:
  * Created infras.lst with infra-mysql84 and docker-pgsql16-single
  * Renamed seed files: mysql-seed.sql -> seed-mysql.sql, pgsql-seed.sql -> seed-pgsql.sql
  * Deleted legacy docker-compose.yml, docker-compose-init.bash, docker-compose-destroy.bash
  * Deleted legacy hooks: pre-proxysql.bash, post-proxysql.bash

- Added group-specific MCP configuration:
  * setup-infras.bash: Configures MCP targets and seeds test data
  * pre-cleanup.bash: Removes MCP configuration after tests
  * mcp-config.sql: SQL template for MCP setup
  * cleanup.sql: SQL template for MCP cleanup
  * seed-mysql.sql & seed-pgsql.sql: Test data for AI tests

- Updated documentation in README.md with usage examples

Architecture:
- ai is the supergroup (defines infrastructure and shared config)
- ai-g1, ai-g2, etc. are subgroups (define test sets in groups.json)
- Group-specific hooks live in test/tap/groups/ai/ directory
- Infrastructure scripts remain generic and delegate to group hooks
Fixes from PR #5461 review:

1. Fix BASE_GROUP derivation inconsistency in run-tests-isolated.bash
   - Changed from bash pattern matching `${TAP_GROUP%%-g[0-9]*}` to sed
   - Now uses same pattern as ensure-infras.bash: `sed -E "s/[-_]g[0-9]+.*//"`
   - Removed redundant BASE_GROUP reassignment
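The shared sed-based derivation strips a `-gN`/`_gN` subgroup suffix to recover the base group name:

```shell
# Derive BASE_GROUP from a subgroup name using the same sed pattern
# as ensure-infras.bash: strip "-gN" or "_gN" and anything after it.
TAP_GROUP="legacy-g1"
BASE_GROUP=$(printf '%s' "$TAP_GROUP" | sed -E "s/[-_]g[0-9]+.*//")
echo "$BASE_GROUP"   # legacy
```

A name without a subgroup suffix (e.g. `legacy`) passes through unchanged, so the same derivation works for supergroups and subgroups alike.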

2. Update outdated comments in seed files
   - seed-mysql.sql and seed-pgsql.sql now correctly reference setup-infras.bash

3. Change INSERT OR IGNORE to INSERT OR REPLACE in mcp-config.sql
   - Ensures credentials are updated on reruns for deterministic state

4. Add shellcheck directives to env.sh files
   - Added `# shellcheck shell=bash` to ai/env.sh and ai-g1/env.sh

5. Add explicit validation for derived infrastructure names
   - setup-infras.bash now validates DEFAULT_MYSQL_INFRA and DEFAULT_PGSQL_INFRA
   - Provides clear error messages if infras.lst is misconfigured

6. Add WORKSPACE export to README examples
   - All runnable examples now include `export WORKSPACE=$(pwd)`

Test verification:
- ai-g1 group infrastructure setup works correctly
- legacy-g2 group tests still pass (backward compatibility verified)
The Python tester sources env files using /bin/sh, but the previous
implementation used bash-specific features (BASH_SOURCE[0] and source).

Changed to inline the values from ai/env.sh for POSIX compatibility.
New scripts for running multiple TAP groups in parallel:

- run-multi-group.bash: Orchestrates parallel execution of multiple TAP groups
  * RUN_ID: Links all groups in a single test run (e.g., commit SHA)
  * TAP_GROUPS: Space-separated list of groups to run
  * PARALLEL_JOBS: Limit concurrent groups (default: unlimited)
  * TIMEOUT_MINUTES: Per-group timeout (default: 60)
  * EXIT_ON_FIRST_FAIL: Stop on first failure (default: 0)
  * AUTO_CLEANUP: Destroy successful groups automatically (default: 0)
  * SKIP_CLUSTER_START: Skip ProxySQL cluster init (default: 0)

- destroy-multi-group.bash: Bulk cleanup for a specific RUN_ID
  * Destroys all infrastructures matching *-{RUN_ID}
  * Can target specific groups or auto-discover from log directories

Each group gets fully isolated infrastructure:
- INFRA_ID: {TAP_GROUP}-{RUN_ID}
- Network: {INFRA_ID}_backend
- Per-group logs in ci_infra_logs/{INFRA_ID}/
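The per-group naming scheme is plain string composition (the values below are hypothetical stand-ins):

```shell
# Each group run gets a unique, derivable identity.
TAP_GROUP="legacy-g1"
RUN_ID="abc123"          # e.g., a commit SHA
INFRA_ID="${TAP_GROUP}-${RUN_ID}"
echo "infra:   ${INFRA_ID}"
echo "network: ${INFRA_ID}_backend"
echo "logs:    ci_infra_logs/${INFRA_ID}/"
```

Because every derived name embeds RUN_ID, destroy-multi-group.bash can later find all artifacts of a run by globbing `*-{RUN_ID}`.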

Fixes #5463
Fixes:
- Fix array parsing issue with bash arithmetic expansion on group names
  Changed from 'read -ra GROUPS' to simple for-loop iteration
  Group names like 'legacy-g1' were being evaluated as arithmetic (1004-114-1=889)

- Fix exit code capture from timeout command
  Changed from 'if ! timeout ...; then exit_code=$?' to proper '|| cmd_exit_code=$?'
  The old code was capturing the if-test exit code (0) instead of the command's

- Change PARALLEL_JOBS default from 0 (unlimited) to 2 for resource safety
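Both shell fixes above can be illustrated in a few lines; the subshell stands in for the real `timeout` invocation:

```shell
# Fix 1: iterate groups by plain word-splitting instead of `read -ra GROUPS`,
# so group names like "legacy-g1" are never fed to arithmetic expansion.
TAP_GROUPS="legacy-g1 legacy-g2 ai-g1"
for grp in $TAP_GROUPS; do
  echo "group: $grp"
done

# Fix 2: capture the command's own exit status with `|| rc=$?`.
# Inside `if ! cmd; then rc=$?`, the later $? reflects the if-test, not cmd.
cmd_exit_code=0
( exit 7 ) || cmd_exit_code=$?   # subshell stands in for `timeout <cmd>`
echo "captured=$cmd_exit_code"
```

The `|| cmd_exit_code=$?` form also keeps `set -e` from aborting the script on a non-zero status, which matters for a runner that must continue to its cleanup and summary steps.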

Improvements:
- Add random 0-15 second startup delay per group to stagger infrastructure
  initialization and prevent Docker/resource contention when running
  multiple groups in parallel

- Update header documentation with new default values

Tested with RUN_ID="abd123" TAP_GROUPS="legacy-g1 legacy-g2 ai-g1 mysql84-g1"
which ran successfully with staggered startup.
Fix reg_test_3223-restapi_return_codes-t failure by properly configuring
REGULAR_INFRA_DATADIR in group environment files:

1. test/tap/groups/legacy/env.sh: Add REGULAR_INFRA_DATADIR pointing to
test/tap/tests where reg_test_3223_scripts directory exists

2. test/tap/groups/mysql84/env.sh: Same REGULAR_INFRA_DATADIR setting

3. Create env.sh for feature-specific groups:
   - mysql-auto_increment_delay_multiplex=0
   - mysql-multiplexing=false
   - mysql-query_digests=0
   - mysql-query_digests_keep_comment=1

4. test/scripts/bin/proxysql-tester.py: Add fallback logic to load
base group env.sh when subgroup env.sh doesn't exist (e.g., legacy-g1
falls back to legacy). This follows the same pattern as ensure-infras.bash.

This follows the architectural principle that test-specific setup belongs
in TAP group configuration, not in the generic test runner.
Fix reg_test_3223-restapi_return_codes-t failure:

1. Created test/tap/groups/legacy/setup-infras.bash:
   - TAP group hook that copies RESTAPI scripts to ProxySQL data directory
   - Sets execute permissions on all scripts EXCEPT script_no_permissions
   - Keeps script_no_permissions without execute permissions for EACCES test

2. Modified test/tap/groups/legacy/env.sh:
   - Added REGULAR_INFRA_DATADIR=/var/lib/proxysql
   - Points to where scripts are mounted inside ProxySQL container

This follows the architectural principle that test-specific setup belongs
in TAP group hooks, not in the generic test runner or infrastructure scripts.
The scripts were using relative path [ -f .env ] which failed when called from test/infra/control/, causing all SQL template variables (PREFIX, WHG, RHG, etc.) to be empty.

This resulted in query rules with NULL match_digest patterns and incorrect rule IDs.
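A sketch of the fix, resolving the script's own directory before sourcing so the working directory no longer matters (the temp directory here is a stand-in for the real test/infra layout):

```shell
# In the real scripts SCRIPT_DIR is derived from the script's own path,
# e.g. SCRIPT_DIR=$(cd "$(dirname "$0")" && pwd), so the .env next to the
# script is found regardless of the caller's cwd.
demo_dir=$(mktemp -d)
printf 'PREFIX=demo_prefix\nWHG=10\nRHG=11\n' > "$demo_dir/.env"
SCRIPT_DIR=$demo_dir
. "$SCRIPT_DIR/.env"
echo "PREFIX=$PREFIX WHG=$WHG RHG=$RHG"
rm -rf "$demo_dir"
```

With the relative `[ -f .env ]` check, the same sourcing silently did nothing from test/infra/control/, leaving PREFIX, WHG, and RHG empty when the SQL templates were expanded.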
- Add lcov package for coverage report generation
- Install fastcov via pip3 for faster coverage processing
- Required for COVERAGE=1 mode in multi-group test runner
Coverage collection now runs INSIDE the test-runner container immediately
after tests complete (whether they succeed, fail, or timeout). This ensures:

1. Coverage data is collected even if the parent script times out
2. The .info file is generated inside the container before it exits
3. No dependency on external docker exec after container stops

Also removes the external coverage collection code that was unreachable
when timeout killed the subshell.
1. Add set -e to run-multi-group.bash heredoc for proper error handling
2. Add EXIT trap in run-tests-isolated.bash to collect coverage on timeout/failure
3. Pass COVERAGE_MODE, COVERAGE_REPORT_DIR, and SCRIPT_DIR env vars to container
4. Fix fastcov invocation to use -e deps/ instead of restrictive -i filters
5. Add per-group HTML report generation
6. Run coverage combination in container for host without tools

Coverage is now collected for:
- include/ (189 header files)
- lib/ (109 library files)
- src/ (4 source files)

The EXIT trap ensures coverage is collected even when tests timeout.
Fixes HTML generation errors for missing system headers like
/usr/include/c++/13/bits/predefined_ops.h
- clickhouse_php_conn-t.php: Fix deprecated ClickHouse MergeTree syntax
  Change from deprecated ENGINE=MergeTree(EventDate, (EventDate), 8192)
  to new syntax ENGINE=MergeTree() ORDER BY (EventDate)

- reg_test_3317-lock_hostgroup_special_queries-t.cpp: Add query annotations
  Add /* ;hostgroup=1300 */ hints to SELECT statements to ensure proper
  routing when connection is locked to hostgroup 1300
- Add discover_user_hostgroup() helper function to query default_hostgroup
  from mysql_users table instead of hardcoding hostgroup=0
- Update test_ff_sess_exceeds_max_conns() to use discovered hostgroup
- Update test_ff_only_one_free_conn() to use discovered hostgroup
- Update documentation to reflect dynamic hostgroup discovery
- Add test/README.md with local CI execution instructions

This allows the test to work with any infrastructure configuration
(e.g., isolated CI where sbtest1 uses hostgroup 1300 instead of 0).
- Fix variable name: 'clickhouse-host' -> 'clickhouse-hostname'
- Add quotes around hostname value in SET command
- Handle 0.0.0.0 bind address: use 'proxysql' container hostname in Docker mode,
  or '127.0.0.1' for local execution
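The bind-address handling described above can be sketched as follows; the real logic lives in the C++ helper extract_module_host_port, so the variable names here are hypothetical:

```shell
# Translate a 0.0.0.0 bind address into something a client can connect to:
# the 'proxysql' container hostname in Docker mode, 127.0.0.1 locally.
host="0.0.0.0"
if [ "$host" = "0.0.0.0" ]; then
  if [ "${DOCKER_MODE:-0}" = "1" ]; then
    host="proxysql"
  else
    host="127.0.0.1"
  fi
fi
echo "connect_host=$host"
```

Binding to 0.0.0.0 makes the listener reachable from other containers, while the translation gives test clients a concrete address to dial in either environment.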
This test uses network namespaces which don't work in Docker containers.
Moving to todo-g1 for later review.
- Added explicit `clickhouse-only` test group for isolated testing
- Added quotes around hostname value in SET statements
- Fixed 0.0.0.0 handling for Docker (including quotes around MergeTree syntax)
After updating to new MergeTree syntax, the INSERT with NULL values now succeeds,
resulting in 3 rows instead of 2.
The interface was set to 127.0.0.1 which is not accessible from another
container. Changed to 0.0.0.0 which is replaced by the extract_module_host_port
function with the correct hostname.

Also update expected row count from 2 to 3 due to additional
successful INSERT.
Copilot AI review requested due to automatic review settings March 21, 2026 15:38
@coderabbitai

coderabbitai bot commented Mar 21, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds multi-group TAP orchestration (run/destroy), aggregated coverage collection, standardized infra hook/env handling, AI group migration to unified infra hooks, many infra script fixes, expanded TAP test diagnostics/host discovery, and documentation for multi-group runs and coverage.

Changes

Cohort / File(s) Summary
Multi-group Orchestration
test/infra/control/run-multi-group.bash, test/infra/control/destroy-multi-group.bash
New controllers to run and teardown multiple TAP groups concurrently with RUN_ID discovery, parallelism, timeouts, per-group results, optional auto-cleanup, and aggregated reporting/coverage.
Infra Control & Test Runner
test/infra/control/ensure-infras.bash, test/infra/control/run-tests-isolated.bash, test/infra/docker-base/Dockerfile, test/infra/control/start-proxysql-isolated.bash, test/infra/control/env-isolated.bash
ensure-infras: post-config hook, derive DEFAULT_MYSQL_INFRA/DEFAULT_PGSQL_INFRA; run-tests-isolated: coverage trap, BASE_GROUP regex, ProxySQL config dump, pre-cleanup hook; Dockerfile: adds lcov + fastcov; start-proxysql mounts /gcov and sets GCOV env; env-isolated adds TAP_USE_NOISE.
Script .env Sourcing Standardization
test/infra/*/bin/docker-proxy-post.bash (multiple infra dirs)
Proxy-post scripts now compute SCRIPT_DIR and source ../.env deterministically (fixes env resolution).
Orchestrator Credentials
test/infra/infra-mysql57/.../orchestrator.conf.json, test/infra/infra-mysql84/.../orchestrator.conf.json
Updated MySQLTopologyPassword values in several orchestrator JSON configs.
ProxySQL Config Additions
test/infra/infra-mariadb10/conf/proxysql/infra-config.sql
Adds mysql_query_rules deletes/inserts tied to infra marker and saves runtime config.
AI Group — New Hooked Infra Flow
test/tap/groups/ai/setup-infras.bash, test/tap/groups/ai/pre-cleanup.bash, test/tap/groups/ai/mcp-config.sql, test/tap/groups/ai/cleanup.sql, test/tap/groups/ai/seed-*.sql, test/tap/groups/ai-g1/env.sh
Introduces post-setup and pre-cleanup hooks, MCP config/cleanup SQL, seed scripts, and subgroup env to configure MCP/backends via ProxySQL admin.
AI Group — Remove Compose Hooks
test/tap/groups/ai/pre-proxysql.bash (deleted), test/tap/groups/ai/post-proxysql.bash (deleted), test/tap/groups/ai/docker-compose-* (deleted/emptied), test/tap/groups/ai/docker-compose.yml (emptied)
Removes legacy docker-compose-based init/destroy and pre/post-proxysql scripts; AI group moves to unified infra hooks.
AI Group — Env & Docs
test/tap/groups/ai/env.sh, test/tap/groups/ai/infras.lst, test/tap/groups/ai/README.md
Reworks AI env to reference DEFAULT_* infra, adds MCP/profile/hostgroup vars, infras list and README describing unified CI infra flow.
Group Env Standardization
test/tap/groups/*/env.sh, test/tap/groups/legacy/setup-infras.bash, test/tap/groups/todo/infras.lst
Adds DEFAULT_* infra defaults and REGULAR_INFRA_DATADIR exports; adds legacy setup-infras hook for RESTAPI scripts; adds todo marker file.
TAP Harness & Validation
test/scripts/bin/proxysql-tester.py, test/tap/groups/groups.json
proxysql-tester: base-group fallback for env, improved exception logging, post-run expected-test completeness validation; groups.json: adjusted group assignments.
TAP Tests & Helpers
test/tap/tests/* (multiple files)
Multiple test updates: host discovery/refactors, DOCKER_MODE host rewrite, ProxySQL admin dumps, ClickHouse engine/host changes, added diagnostics and routing discovery, TSDB timestamp/downsampling fixes, expectation adjustments.
Analysis & Docs
test/README.md, test/infra/README.md, test_analysis_reg_test_3317.md
Adds top-level test README, infra README updates (multi-group, coverage, noise), and diagnostic analysis for reg_test_3317.
Small env/test tweaks
test/tap/tests/*.env, test/tap/groups/*/env.sh additions/removals
Removed/updated some per-test env vars (ports/passwords), added many per-group env shims for DEFAULT_* and REGULAR_INFRA_DATADIR.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Runner
    participant Ensure
    participant Proxy
    participant TC
    participant Cov

    User->>Runner: invoke run-multi-group (RUN_ID, TAP_GROUPS, PARALLEL_JOBS, COVERAGE)
    Runner->>Ensure: start per-group ensure-infras (set INFRA_ID, TAP_GROUP)
    Ensure->>Proxy: apply infra config & run setup-infras hook
    Ensure-->>Runner: infra ready (INFRA_ID exported)
    Runner->>TC: launch run-tests-isolated (proxysql-tester) for group
    TC->>Proxy: dump ProxySQL config (pre-test)
    TC->>Cov: emit coverage artifacts to /gcov (if COVERAGE=1)
    TC-->>Runner: group exit code, logs, timestamps
    Runner->>Cov: collect & combine per-group *.info -> combined report
    Runner-->>User: aggregated pass/fail/timeout summary, coverage path, non-success logs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 In carrot code I bounded run IDs bright,
Groups hop, hooks whisper through the night,
Coverage crumbs collected with delight,
Logs and seeds in neat, dependable rows,
Tap tests bloom where unified infra grows.

🚥 Pre-merge checks | ✅ 2 passed | ❌ 1 warning

❌ Failed checks (1 warning)

Docstring Coverage — ⚠️ Warning — Docstring coverage is 64.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

Description Check — ✅ Passed — Check skipped: CodeRabbit's high-level summary is enabled.
Title Check — ✅ Passed — The title 'Multi-group TAP test runner and ClickHouse test fixes' accurately reflects the main changes: introducing a multi-group test runner and fixing ClickHouse tests.

@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ProxySQL test suite by introducing a parallel multi-group test runner, which allows for more efficient and isolated execution of tests. It also addresses critical fixes for ClickHouse tests to ensure compatibility with modern versions and Docker environments. Furthermore, the changes improve the overall test infrastructure by adding robust code coverage capabilities and standardizing group-specific setup and cleanup processes, making the testing framework more reliable and maintainable.

Highlights

  • Multi-group TAP Test Runner: Introduced a new multi-group TAP test runner (run-multi-group.bash) to enable parallel execution of test groups, significantly improving test suite efficiency. This runner supports isolated test runs with per-group infrastructure and optional code coverage collection.
  • ClickHouse Test Fixes: Resolved several issues in ClickHouse tests, including correcting a variable name (clickhouse-host to clickhouse-hostname), updating MergeTree syntax for newer ClickHouse versions, and ensuring Docker compatibility by using 0.0.0.0 interface and translating to proxysql hostname within containers.
  • Enhanced Code Coverage Support: Integrated code coverage collection into isolated test runs, utilizing lcov and fastcov tools. The new multi-group runner can combine individual coverage reports into a single, comprehensive report.
  • Improved Test Infrastructure: Updated various scripts for POSIX sh compatibility, added unique INFRA_ID generation, and included coverage tools in the proxysql-ci-base Docker image. The ensure-infras.bash script now includes group-specific setup hooks and post-configuration for already running ProxySQL instances.
  • Refactored max_connections_ff-t Test: The max_connections_ff-t test was refactored to dynamically discover hostgroups, making it more robust and less dependent on hardcoded values. It also includes improved logging and detailed test scenarios.
  • AI TAP Group Migration: The AI TAP group (test/tap/groups/ai) has been migrated to the new Unified CI Infrastructure pattern, removing old docker-compose based setup scripts and introducing new group-specific hooks for MCP configuration and test data seeding.




@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The pull request introduces a multi-group TAP test runner, significantly enhancing parallel test execution and isolated test environments. Key improvements include new bash scripts for managing multi-group test runs, updated infrastructure setup, and enhanced test execution. Fixes for ClickHouse tests, including updated MergeTree syntax and improved Docker compatibility, are also included. The max_connections_ff-t.cpp test benefits from improved documentation and dynamic hostgroup discovery, while reg_test_3317-lock_hostgroup_special_queries-t.cpp is fixed by explicitly hinting the expected hostgroup, with a detailed analysis document explaining the root cause. Code coverage collection is integrated, which is a valuable addition for ensuring code quality. Overall, these changes substantially improve the test suite's capabilities, reliability, and maintainability.

Comment on lines +257 to +291
if [ "${COVERAGE_MODE}" = "1" ]; then
    echo ">>> Collecting code coverage data..."

    # Check if fastcov is available in the container
    if docker exec "${TEST_CONTAINER}" which fastcov >/dev/null 2>&1; then
        COVERAGE_INFO_FILE="${COVERAGE_REPORT_DIR}/${TAP_GROUP:-test}-${INFRA_ID}.info"

        echo ">>> Generating coverage report: ${COVERAGE_INFO_FILE}"
        # Extract coverage data from container
        docker exec "${TEST_CONTAINER}" bash -c "
            cd '${WORKSPACE}' && \
            fastcov -b -j\$(nproc) --process-gcno -l \
                -e /usr/include/ -e test/tap/tests \
                -d . -i include -i lib -i src \
                -o '${COVERAGE_INFO_FILE}' 2>&1
        " || echo ">>> WARNING: Coverage generation failed"

        # Check if report was generated
        if [ -f "${COVERAGE_INFO_FILE}" ]; then
            echo ">>> Coverage report generated: ${COVERAGE_INFO_FILE}"
            # Generate HTML report
            if command -v genhtml >/dev/null 2>&1; then
                HTML_REPORT_DIR="${COVERAGE_REPORT_DIR}/html/${TAP_GROUP:-test}-${INFRA_ID}"
                mkdir -p "${HTML_REPORT_DIR}"
                genhtml --branch-coverage "${COVERAGE_INFO_FILE}" --output-directory "${HTML_REPORT_DIR}" 2>/dev/null || \
                    echo ">>> WARNING: HTML report generation failed"
                [ -d "${HTML_REPORT_DIR}" ] && echo ">>> HTML report: ${HTML_REPORT_DIR}/index.html"
            fi
        else
            echo ">>> WARNING: Coverage info file not generated"
        fi
    else
        echo ">>> WARNING: fastcov not found in container, skipping coverage collection"
    fi
fi


high

The script attempts to collect code coverage data using docker exec after the main test execution. However, a trap collect_coverage EXIT is already defined within the docker run command (lines 150-180) to handle coverage collection upon the inner shell's exit. This external docker exec call for coverage collection is redundant and likely to fail if the container has already exited or is in the process of being removed. The trap mechanism should be sufficient for collecting coverage data from within the container.

Comment on lines +104 to +106
if "${SCRIPT_DIR}/stop-proxysql-isolated.bash" >/dev/null 2>&1; then
echo " Stopped ProxySQL"
fi


medium

The stop-proxysql-isolated.bash command is executed without || true. If this command fails, the destroy-multi-group.bash script will exit immediately due to set -e, preventing subsequent cleanup steps for other groups. While destroy-infras.bash has || true, stop-proxysql-isolated.bash should also be allowed to fail gracefully during a best-effort cleanup.

Suggested change
if "${SCRIPT_DIR}/stop-proxysql-isolated.bash" >/dev/null 2>&1; then
echo " Stopped ProxySQL"
fi
if "${SCRIPT_DIR}/stop-proxysql-isolated.bash" >/dev/null 2>&1 || true; then
echo " Stopped ProxySQL"
fi


MYSQL* proxy_admin = mysql_init(NULL);
if (!mysql_real_connect(proxy_admin, cl.admin_host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
if (!mysql_real_connect(proxy_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {


medium

The mysql_real_connect call for the ProxySQL admin interface uses cl.host instead of cl.admin_host. While cl.host might work in some Docker setups where ProxySQL's data interface is also its admin interface, it's generally safer and more explicit to use cl.admin_host for connecting to the admin interface, as defined by the padmin_conn_args in proxysql-tester.py. This ensures the test connects to the intended admin endpoint, which might be different from the data endpoint in more complex configurations.

	if (!mysql_real_connect(proxy_admin, cl.admin_host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {

Contributor

Copilot AI left a comment


Pull request overview

This PR updates the TAP testing infrastructure to support running multiple TAP groups in parallel (with per-group isolated infra), adds optional coverage collection for isolated runs, and fixes several ClickHouse/Docker-related TAP test issues (plus some legacy/RESTAPI setup adjustments).

Changes:

  • Add a parallel multi-group runner (run-multi-group.bash) and multi-group destroy helper.
  • Extend isolated test runner to support coverage collection and improved infra/group hook behavior.
  • Fix/adjust ClickHouse TAP tests and update several TAP group configurations/hooks (legacy + AI group migration to unified infra).

Reviewed changes

Copilot reviewed 43 out of 50 changed files in this pull request and generated 7 comments.

Summary per file
File Description
test_analysis_reg_test_3317.md Added failure analysis notes for reg_test_3317.
test/tap/tests/test_clickhouse_server-t.cpp ClickHouse variable rename, MergeTree syntax update, interface binding change, expected row count update.
test/tap/tests/reg_test_3317-lock_hostgroup_special_queries-t.cpp Adds hostgroup hint comments and extensive admin-side config dump for debugging.
test/tap/tests/modules_server_test.h Translate 0.0.0.0 bind address into a connectable hostname for module tests.
test/tap/tests/max_connections_ff-t.cpp Improve test robustness via hostgroup discovery + more diagnostics; refactors flow.
test/tap/tests/clickhouse_php_conn-t.php Update MergeTree syntax for newer ClickHouse.
test/tap/groups/todo/infras.lst Introduce a “todo” group infra list placeholder.
test/tap/groups/mysql84/env.sh Add REGULAR_INFRA_DATADIR for RESTAPI scripts.
test/tap/groups/mysql-query_digests_keep_comment=1/env.sh New group env for digest keep-comment=1 runs.
test/tap/groups/mysql-query_digests=0/env.sh New group env for query_digests disabled runs.
test/tap/groups/mysql-multiplexing=false/env.sh New group env for multiplexing disabled runs.
test/tap/groups/mysql-auto_increment_delay_multiplex=0/env.sh New group env for auto_increment_delay_multiplex disabled runs.
test/tap/groups/legacy/setup-infras.bash New legacy group setup hook to stage RESTAPI scripts into ProxySQL container.
test/tap/groups/legacy/env.sh Configure RESTAPI script path for legacy group.
test/tap/groups/groups.json Move test_client_limit_error-t to todo-g1.
test/tap/groups/ai/setup-infras.bash New AI group post-infra hook: configure MCP + seed MySQL/PostgreSQL data.
test/tap/groups/ai/seed-pgsql.sql Document and seed AI-related PostgreSQL tables.
test/tap/groups/ai/seed-mysql.sql Document and seed AI-related MySQL tables; change DB name; flush tables.
test/tap/groups/ai/pre-proxysql.bash Removed old AI local-compose bootstrap hook.
test/tap/groups/ai/pre-cleanup.bash New AI cleanup hook to remove MCP config before teardown.
test/tap/groups/ai/post-proxysql.bash Removed old AI local-compose cleanup hook.
test/tap/groups/ai/mcp-config.sql New SQL template for MCP configuration (envsubst).
test/tap/groups/ai/infras.lst Declare AI group infra dependencies (mysql84 + pgsql16 single).
test/tap/groups/ai/env.sh Rework AI group env to unified-infra defaults and MCP variables.
test/tap/groups/ai/docker-compose.yml Removed local AI backend docker-compose definition.
test/tap/groups/ai/docker-compose-init.bash Removed local AI backend compose init script.
test/tap/groups/ai/docker-compose-destroy.bash Removed local AI backend compose destroy script.
test/tap/groups/ai/cleanup.sql New SQL template for MCP cleanup.
test/tap/groups/ai/README.md Update AI group docs to unified infra + hooks model.
test/tap/groups/ai-g1/env.sh New ai-g1 subgroup env (POSIX sh compatible duplication).
test/scripts/bin/proxysql-tester.py Base-group env fallback; exception logging tweaks; add “expected tests ran” validation.
test/infra/infra-pgsql17-repl/bin/docker-proxy-post.bash Source .env relative to script dir for reliability.
test/infra/infra-mysql84/conf/orchestrator/orc1/orchestrator.conf.json Update orchestrator password placeholder value.
test/infra/infra-mysql84/conf/orchestrator/orc2/orchestrator.conf.json Update orchestrator password placeholder value.
test/infra/infra-mysql84/conf/orchestrator/orc3/orchestrator.conf.json Update orchestrator password placeholder value.
test/infra/infra-mysql84/bin/docker-proxy-post.bash Source .env relative to script dir for reliability.
test/infra/infra-mysql57/conf/orchestrator/orc1/orchestrator.conf.json Update orchestrator password placeholder value.
test/infra/infra-mysql57/conf/orchestrator/orc2/orchestrator.conf.json Update orchestrator password placeholder value.
test/infra/infra-mysql57/conf/orchestrator/orc3/orchestrator.conf.json Update orchestrator password placeholder value.
test/infra/infra-mysql57/bin/docker-proxy-post.bash Source .env relative to script dir for reliability.
test/infra/infra-mysql57-binlog/bin/docker-proxy-post.bash Source .env relative to script dir for reliability.
test/infra/infra-mariadb10/conf/proxysql/infra-config.sql Add query rules setup for infra runs.
test/infra/infra-mariadb10/bin/docker-proxy-post.bash Source .env relative to script dir for reliability.
test/infra/infra-clickhouse23/bin/docker-proxy-post.bash Source .env relative to script dir for reliability.
test/infra/docker-base/Dockerfile Add lcov + install fastcov in CI base image.
test/infra/control/run-tests-isolated.bash Add coverage support, large config dump, subgroup base-group parsing, and group pre-cleanup hook.
test/infra/control/run-multi-group.bash New parallel multi-group TAP runner with timeout and optional coverage combine.
test/infra/control/ensure-infras.bash Run per-infra proxy post-config even if already running; derive defaults; run group setup hook.
test/infra/control/destroy-multi-group.bash New script to destroy all infras for a multi-group RUN_ID.
test/README.md New high-level guide for running tests locally with unified infra.

Comment on lines +226 to +232
if (mysql_query(admin_mysql, "SELECT hostgroup_id, hostname, port, status FROM mysql_servers ORDER BY hostgroup_id, hostname") == 0) {
    MYSQL_RES* res = mysql_store_result(admin_mysql);
    MYSQL_ROW row;
    while ((row = mysql_fetch_row(res))) {
        diag("hostgroup_id: %s, hostname: %s, port: %s, status: %s", row[0], row[1], row[2], row[3]);
    }
    mysql_free_result(res);

Copilot AI Mar 21, 2026

mysql_store_result() can return NULL (OOM, connection lost, or non-result queries). The code unconditionally passes res to mysql_fetch_row() / mysql_num_rows(), which can segfault. Please add res != nullptr checks and log mysql_error() before iterating/freeing.

Comment on lines +145 to +150
"${SCRIPT_DIR}/run-tests-isolated.bash" >> "${log_file}" 2>&1
inner_exit_code=$?
if [ \${inner_exit_code} -ne 0 ]; then
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: Tests failed with exit code \${inner_exit_code}" | tee -a "${log_file}"
    # Coverage is still collected in run-tests-isolated.bash even on failure
    exit \${inner_exit_code}

Copilot AI Mar 21, 2026

This logic attempts to capture run-tests-isolated.bash exit status, but set -e in the heredoc will cause the script to exit immediately on a non-zero test run, so inner_exit_code=$? and the warning path won’t execute. Use an explicit if ! cmd; then ...; exit $?; fi (or temporarily disable -e) so failures are reported correctly.

Suggested change
-"${SCRIPT_DIR}/run-tests-isolated.bash" >> "${log_file}" 2>&1
-inner_exit_code=$?
-if [ \${inner_exit_code} -ne 0 ]; then
-    echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: Tests failed with exit code \${inner_exit_code}" | tee -a "${log_file}"
-    # Coverage is still collected in run-tests-isolated.bash even on failure
-    exit \${inner_exit_code}
+if ! "${SCRIPT_DIR}/run-tests-isolated.bash" >> "${log_file}" 2>&1; then
+    inner_exit_code=$?
+    echo "[$(date '+%Y-%m-%d %H:%M:%S')] WARNING: Tests failed with exit code ${inner_exit_code}" | tee -a "${log_file}"
+    # Coverage is still collected in run-tests-isolated.bash even on failure
+    exit "${inner_exit_code}"

Comment on lines +92 to +100
if (module_host == "0.0.0.0") {
    const char* docker_mode = getenv("DOCKER_MODE");
    if (docker_mode != nullptr) {
        host_port = { "proxysql", i_module_port };
    } else {
        host_port = { "127.0.0.1", i_module_port };
    }
} else {
    host_port = { module_host, i_module_port };

Copilot AI Mar 21, 2026

When module_host is 0.0.0.0, this code treats the mere presence of DOCKER_MODE as “Docker mode”. But DOCKER_MODE is set even for non-DNS/localhost runs (e.g. compose), so this will incorrectly rewrite the host to proxysql and break connections outside compose-dns. Check the value (e.g. endswith/contains dns) instead of just != nullptr.
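A value-based check can be sketched in shell terms; the `*dns` suffix match and the `compose-dns`/`compose` mode names here are assumptions drawn from this comment, not the repo's actual `DOCKER_MODE` values.

```shell
#!/bin/sh
# Sketch only: branch on DOCKER_MODE's value instead of its mere presence.
# The "*dns" suffix and the mode names are assumed for illustration.
resolve_host() {
    case "${DOCKER_MODE:-}" in
        *dns) echo "proxysql" ;;    # DNS-based compose runs resolve the service name
        *)    echo "127.0.0.1" ;;   # other runs (or unset) stay on localhost
    esac
}

DOCKER_MODE="compose-dns"
resolve_host    # proxysql
DOCKER_MODE="compose"
resolve_host    # 127.0.0.1
```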

mysql -uradmin -pradmin -hproxysql -P6032 -e 'SELECT hostgroup_id, hostname, port, status, comment FROM mysql_servers ORDER BY hostgroup_id, hostname' 2>/dev/null || echo 'ERROR: Failed to query mysql_servers'

echo '--- mysql_users ---'
mysql -uradmin -pradmin -hproxysql -P6032 -e 'SELECT username, password, active, default_hostgroup, transaction_persistent FROM mysql_users ORDER BY username' 2>/dev/null || echo 'ERROR: Failed to query mysql_users'

Copilot AI Mar 21, 2026

The pre-test configuration dump logs mysql_users.password to stdout. Even in CI/test environments this unnecessarily exposes credentials in logs and can leak into artifacts. Consider omitting the password column or masking it.

Suggested change
-mysql -uradmin -pradmin -hproxysql -P6032 -e 'SELECT username, password, active, default_hostgroup, transaction_persistent FROM mysql_users ORDER BY username' 2>/dev/null || echo 'ERROR: Failed to query mysql_users'
+mysql -uradmin -pradmin -hproxysql -P6032 -e 'SELECT username, "<redacted>" AS password, active, default_hostgroup, transaction_persistent FROM mysql_users ORDER BY username' 2>/dev/null || echo 'ERROR: Failed to query mysql_users'

Comment on lines +255 to +292
# Capture code coverage data if enabled
# This runs regardless of test success/failure
if [ "${COVERAGE_MODE}" = "1" ]; then
    echo ">>> Collecting code coverage data..."

    # Check if fastcov is available in the container
    if docker exec "${TEST_CONTAINER}" which fastcov >/dev/null 2>&1; then
        COVERAGE_INFO_FILE="${COVERAGE_REPORT_DIR}/${TAP_GROUP:-test}-${INFRA_ID}.info"

        echo ">>> Generating coverage report: ${COVERAGE_INFO_FILE}"
        # Extract coverage data from container
        docker exec "${TEST_CONTAINER}" bash -c "
            cd '${WORKSPACE}' && \
            fastcov -b -j\$(nproc) --process-gcno -l \
                -e /usr/include/ -e test/tap/tests \
                -d . -i include -i lib -i src \
                -o '${COVERAGE_INFO_FILE}' 2>&1
        " || echo ">>> WARNING: Coverage generation failed"

        # Check if report was generated
        if [ -f "${COVERAGE_INFO_FILE}" ]; then
            echo ">>> Coverage report generated: ${COVERAGE_INFO_FILE}"
            # Generate HTML report
            if command -v genhtml >/dev/null 2>&1; then
                HTML_REPORT_DIR="${COVERAGE_REPORT_DIR}/html/${TAP_GROUP:-test}-${INFRA_ID}"
                mkdir -p "${HTML_REPORT_DIR}"
                genhtml --branch-coverage "${COVERAGE_INFO_FILE}" --output-directory "${HTML_REPORT_DIR}" 2>/dev/null || \
                    echo ">>> WARNING: HTML report generation failed"
                [ -d "${HTML_REPORT_DIR}" ] && echo ">>> HTML report: ${HTML_REPORT_DIR}/index.html"
            fi
        else
            echo ">>> WARNING: Coverage info file not generated"
        fi
    else
        echo ">>> WARNING: fastcov not found in container, skipping coverage collection"
    fi
fi

Copilot AI Mar 21, 2026

Coverage is collected twice: once via the in-container trap collect_coverage EXIT and again after the container exits via docker exec ... fastcov. This adds significant runtime and can overwrite/confuse artifacts. Consider keeping a single collection path (preferably the in-container trap) and removing the duplicate block.

Suggested change
-# Capture code coverage data if enabled
-# This runs regardless of test success/failure
-if [ "${COVERAGE_MODE}" = "1" ]; then
-    echo ">>> Collecting code coverage data..."
-    # Check if fastcov is available in the container
-    if docker exec "${TEST_CONTAINER}" which fastcov >/dev/null 2>&1; then
-        COVERAGE_INFO_FILE="${COVERAGE_REPORT_DIR}/${TAP_GROUP:-test}-${INFRA_ID}.info"
-        echo ">>> Generating coverage report: ${COVERAGE_INFO_FILE}"
-        # Extract coverage data from container
-        docker exec "${TEST_CONTAINER}" bash -c "
-            cd '${WORKSPACE}' && \
-            fastcov -b -j\$(nproc) --process-gcno -l \
-                -e /usr/include/ -e test/tap/tests \
-                -d . -i include -i lib -i src \
-                -o '${COVERAGE_INFO_FILE}' 2>&1
-        " || echo ">>> WARNING: Coverage generation failed"
-        # Check if report was generated
-        if [ -f "${COVERAGE_INFO_FILE}" ]; then
-            echo ">>> Coverage report generated: ${COVERAGE_INFO_FILE}"
-            # Generate HTML report
-            if command -v genhtml >/dev/null 2>&1; then
-                HTML_REPORT_DIR="${COVERAGE_REPORT_DIR}/html/${TAP_GROUP:-test}-${INFRA_ID}"
-                mkdir -p "${HTML_REPORT_DIR}"
-                genhtml --branch-coverage "${COVERAGE_INFO_FILE}" --output-directory "${HTML_REPORT_DIR}" 2>/dev/null || \
-                    echo ">>> WARNING: HTML report generation failed"
-                [ -d "${HTML_REPORT_DIR}" ] && echo ">>> HTML report: ${HTML_REPORT_DIR}/index.html"
-            fi
-        else
-            echo ">>> WARNING: Coverage info file not generated"
-        fi
-    else
-        echo ">>> WARNING: fastcov not found in container, skipping coverage collection"
-    fi
-fi
+# Coverage collection is handled inside the test-runner container via
+# a trap-based collection mechanism. The previous host-side coverage
+# collection using 'docker exec ... fastcov' has been removed to avoid
+# duplicating work and overwriting/confusing artifacts.
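The single-path alternative can be sketched with a shell EXIT trap; the function body below is a placeholder, not the PR's fastcov invocation.

```shell
#!/bin/bash
# Sketch: one EXIT trap fires exactly once, whether the tests pass or fail,
# so coverage is collected a single time. The body is a placeholder.
collect_coverage() {
    echo ">>> coverage collected"
}
trap collect_coverage EXIT

false || true   # stand-in for the test run; failures do not skip the trap
```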

@@ -0,0 +1 @@
# Tests that need review/fix

Copilot AI Mar 21, 2026

infras.lst is parsed via simple whitespace splitting in the infra scripts; this comment line will be treated as an infrastructure name ("#"), causing ensure-infras/run-tests-isolated to fail if todo(-g1) is ever executed. Either remove comments from this file or update the parsers to ignore #/blank lines.
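One way to make the parsers tolerant is sketched below; the file name and one-name-per-line format come from this comment, while the loop itself is illustrative rather than the repo's actual parsing code.

```shell
#!/bin/sh
# Sketch: read an infras.lst-style file while skipping blank lines and
# `#` comments, so a placeholder comment is not treated as an infra name.
parse_infras() {
    while IFS= read -r line; do
        line="${line%%#*}"                       # drop comments
        line=$(printf '%s' "${line}" | xargs)    # trim whitespace
        [ -n "${line}" ] && printf '%s\n' "${line}"
    done < "$1"
}

printf '# Tests that need review/fix\n\ninfra-mysql84\n' > /tmp/infras.demo
parse_infras /tmp/infras.demo    # prints only: infra-mysql84
```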

Comment on lines 704 to 706
 MYSQL* proxy_admin = mysql_init(NULL);
-if (!mysql_real_connect(proxy_admin, cl.admin_host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
+if (!mysql_real_connect(proxy_admin, cl.host, cl.admin_username, cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
     fprintf(stderr, "File %s, line %d, Error: %s\n", __FILE__, __LINE__, mysql_error(proxy_admin));

Copilot AI Mar 21, 2026

This connects to the ProxySQL admin port but uses cl.host instead of cl.admin_host. Since host and admin_host can differ between environments, this can make the test connect to the wrong endpoint and fail. Use cl.admin_host here (consistent with other TAP tests).


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/tap/tests/max_connections_ff-t.cpp (1)

691-702: ⚠️ Potential issue | 🔴 Critical

Remove the duplicate TAP plan() call.

Line 692 and Line 699 both call plan(...). TAP tests must declare plan once; double-plan can invalidate test output.

Proposed fix
-	plan(
-		1*2 + // 'test_ff_sess_exceeds_max_conns'
-		2*2   // 'test_ff_only_one_free_conn'
-	);
🧹 Nitpick comments (13)
test/tap/tests/max_connections_ff-t.cpp (1)

213-214: Align diagnostic text with executed command.

Line 213 logs LOAD MYSQL VARIABLES TO RUNTIME, but Line 214 executes LOAD MYSQL USERS TO RUNTIME. This can mislead failure triage.

Proposed fix
-	diag("Executing query `%s`...", "LOAD MYSQL VARIABLES TO RUNTIME");
+	diag("Executing query `%s`...", "LOAD MYSQL USERS TO RUNTIME");
test/infra/infra-clickhouse23/bin/docker-proxy-post.bash (1)

25-25: Consider: SQL template path is relative to CWD, not SCRIPT_DIR.

The .env sourcing was updated to use SCRIPT_DIR, but the SQL template path ./conf/proxysql/infra-config.sql remains relative to the current working directory. If this script is always invoked from the infrastructure directory, this is fine. Otherwise, consider using ${SCRIPT_DIR}/../conf/proxysql/infra-config.sql for consistency.

test/tap/groups/mysql-multiplexing=false/env.sh (1)

1-8: Optional: Add shellcheck directive to silence SC2148.

This file is intended to be sourced, not executed directly, so the missing shebang is intentional. Adding a shellcheck directive at the top would silence the static analysis warning while documenting the shell compatibility:

🔧 Proposed fix
+# shellcheck shell=bash
 # MySQL multiplexing=false Test Group Environment
 # Defines environment for tests with multiplexing disabled.
test/tap/groups/mysql-query_digests_keep_comment=1/env.sh (1)

1-8: Optional: Add shellcheck directive to silence SC2148.

Same as the other group env.sh files - this is sourced, not executed. A shellcheck directive would silence the warning:

🔧 Proposed fix
+# shellcheck shell=bash
 # MySQL query_digests_keep_comment=1 Test Group Environment
 # Defines environment for tests with query digest comment preservation enabled.
test/tap/groups/mysql-query_digests=0/env.sh (1)

1-8: Optional: Add shellcheck directive to silence SC2148.

Consistent with the other new group env.sh files:

🔧 Proposed fix
+# shellcheck shell=bash
 # MySQL query_digests=0 Test Group Environment
 # Defines environment for tests with query digests disabled.
test/tap/groups/ai-g1/env.sh (1)

1-16: Add shell directive for shellcheck compliance.

Same as the other env.sh file, adding a shell directive helps static analysis and documents intent.

Proposed fix
+# shellcheck shell=sh
 # AI-g1 Subgroup Environment Configuration
 # Inherits from parent ai group - duplicated here for POSIX sh compatibility

The configuration values and default pattern using ${VAR:-default} look correct and align with the MCP SQL templates.

test/infra/control/ensure-infras.bash (1)

77-83: Consider logging post-configuration failures instead of silently suppressing.

Using || true suppresses all errors from docker-proxy-post.bash, which could hide real configuration issues when backends are already running.

Proposed fix: Log failures but continue
         # Run proxy post-configuration to ensure query rules are loaded
         if [ -f "${INFRA_DIR}/bin/docker-proxy-post.bash" ]; then
             echo ">>> Ensuring ProxySQL configuration for '${INFRA_NAME}'..."
             cd "${INFRA_DIR}"
-            ./bin/docker-proxy-post.bash || true
+            if ! ./bin/docker-proxy-post.bash; then
+                echo ">>> WARNING: Post-configuration for '${INFRA_NAME}' failed (non-fatal)"
+            fi
             cd - >/dev/null
         fi
test_analysis_reg_test_3317.md (2)

11-45: Add language specifiers to fenced code blocks for better rendering.

Several code blocks lack language specifiers, which affects syntax highlighting and accessibility.

Proposed fix for code block at line 11
 ### mysql_servers
-```
+```text
 hostgroup_id  hostname                    port  status

Apply similar changes to code blocks at lines 26, 34, 41, 53, and 69.


1-172: Consider moving this analysis file to a more appropriate location.

This detailed test failure analysis is placed in the repository root. Consider moving it to test/docs/ or test/tap/analysis/ for better organization, or if this is temporary debugging documentation, consider whether it should be committed.

test/tap/groups/mysql-auto_increment_delay_multiplex=0/env.sh (1)

1-8: Add shell directive for shellcheck compliance.

While this file is sourced rather than executed, adding a shell directive helps static analysis tools and documents the expected shell compatibility.

Proposed fix
+# shellcheck shell=sh
 # MySQL auto_increment_delay_multiplex=0 Test Group Environment
 # Defines environment for tests with auto_increment_delay_multiplex disabled.
test/infra/control/run-multi-group.bash (1)

76-81: Unused associative arrays can be removed.

START_TIMES, END_TIMES, and PID_FOR_GROUP (line 221) are declared but never used.

Remove unused declarations
 # Arrays to track job PIDs and their associated groups
 declare -a JOB_PIDS=()
 declare -A GROUP_FOR_PID
-declare -A PID_FOR_GROUP
 declare -A EXIT_CODES
-declare -A START_TIMES
-declare -A END_TIMES

Also remove line 221: PID_FOR_GROUP["${group}"]="${local_pid}"

test/infra/control/destroy-multi-group.bash (1)

90-90: Unused associative array DESTROY_RESULTS.

The array is populated on lines 111 and 115 but never read. Either use it in the summary or remove it.

Remove unused array
 # Track results
-declare -A DESTROY_RESULTS
 SUCCESS_COUNT=0
 FAIL_COUNT=0

And remove the assignments on lines 111 and 115.

test/tap/tests/reg_test_3317-lock_hostgroup_special_queries-t.cpp (1)

255-255: Minor type mismatch: mysql_num_rows returns my_ulonglong.

Assigning to int could truncate on extremely large result sets, though this is not a practical concern for admin table queries.

Optional fix
-		int num_rows = mysql_num_rows(res);
+		my_ulonglong num_rows = mysql_num_rows(res);
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/infra/control/run-multi-group.bash`:
- Line 108: The single-line declaration+assignment using local group_hash=$(echo
-n "${group}" | cksum | cut -d' ' -f1) masks failures from cksum; change this to
declare the variable first (local group_hash) then run the pipeline into a
separate assignment command, check the pipeline's exit status (or test $?
immediately) and handle errors (exit or log) if cksum fails before assigning to
group_hash so failures aren't silently ignored—apply this to the code that
computes group_hash from cksum.

In `@test/infra/control/run-tests-isolated.bash`:
- Around line 255-291: The host-side coverage collection after docker run must
be removed or disabled because docker run blocks until the container exits and
the container's EXIT trap already collects coverage; remove or skip the
host-side block that uses docker exec to call fastcov and genhtml (the section
that checks COVERAGE_MODE and runs docker exec to create
COVERAGE_INFO_FILE/TAP_GROUP-${INFRA_ID}.info), or alternatively change the
approach to run docker exec while the container is running; also ensure the
filename/variable mismatch between the in-container trap (which writes
${COVERAGE_REPORT_DIR}/${INFRA_ID}.info) and the host-side writer
(${COVERAGE_REPORT_DIR}/${TAP_GROUP:-test}-${INFRA_ID}.info) is reconciled so
only one writer/path is used (use COVERAGE_REPORT_DIR, INFRA_ID and TAP_GROUP
consistently).

In `@test/infra/infra-mysql57-binlog/bin/docker-proxy-post.bash`:
- Around line 4-5: The SQL file path is still cwd-relative and can break when
the script runs from another directory; update the script to use the existing
SCRIPT_DIR variable for loading the SQL template instead of a relative path.
Locate the SQL load/execute call (the line that references the SQL file around
where SQL is sourced or passed to mysql) and prepend or join SCRIPT_DIR (used in
the SCRIPT_DIR="$(cd ...)" assignment) so the path becomes script-relative
(e.g., "${SCRIPT_DIR}/...") ensuring the .env sourcing and SQL template use the
same script-rooted resolution.

In `@test/infra/infra-mysql57/bin/docker-proxy-post.bash`:
- Around line 4-5: The script currently resolves resources relative to the
current working directory, causing nondeterministic SQL template loading; modify
uses of ./conf/... (and any similar relative paths on lines 10-12) to use the
computed SCRIPT_DIR variable instead (e.g., replace instances of ./conf/... with
"${SCRIPT_DIR}/../conf/..." or appropriate path based on SCRIPT_DIR) so the
script always loads templates relative to its own location; update any other
file references in this script that rely on CWD to use SCRIPT_DIR as well.

In `@test/infra/infra-mysql84/bin/docker-proxy-post.bash`:
- Around line 4-5: The script sets SCRIPT_DIR but still uses a cwd-relative SQL
include (e.g. './conf/...'); update the SQL include to reference SCRIPT_DIR so
the path is deterministic—replace occurrences of './conf/...' with
"${SCRIPT_DIR}/../conf/..." (or use "$(realpath "${SCRIPT_DIR}/../conf/...")")
and keep proper quoting and an existence check before using the file so invoking
docker-proxy-post.bash from any CWD will succeed.

In `@test/infra/infra-pgsql17-repl/bin/docker-proxy-post.bash`:
- Around line 4-5: The SQL template path is still referenced relative to the
current working directory; update the script to build that path off the existing
SCRIPT_DIR variable so the template is always found when the script is run from
anywhere. Locate the code that references the SQL template (the variable or
command that uses the template filename) and prepend/resolve it with
"${SCRIPT_DIR}/..." instead of a plain relative path so the script consistently
uses SCRIPT_DIR as its base.

In `@test/tap/groups/ai/env.sh`:
- Around line 5-6: The script currently unconditionally overwrites
DEFAULT_MYSQL_INFRA and DEFAULT_PGSQL_INFRA; change the exports in env.sh to
preserve caller/CI-provided values by only assigning the defaults when the
variables are unset or empty (use POSIX shell parameter expansion or a
conditional test like checking -z before setting) so DEFAULT_MYSQL_INFRA and
DEFAULT_PGSQL_INFRA remain whatever the caller exported unless absent, then
export the default values.
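
The guard the prompt asks for is plain POSIX parameter expansion; a minimal sketch (the infra names below are placeholders, not the group's actual defaults):

```sh
# Demo: the caller pre-sets one variable; the other is unset.
DEFAULT_MYSQL_INFRA="infra-mysql84"
unset DEFAULT_PGSQL_INFRA

# ':=' assigns the default only when the variable is unset or empty,
# so caller/CI-provided values are preserved.
: "${DEFAULT_MYSQL_INFRA:=infra-mysql57}"
: "${DEFAULT_PGSQL_INFRA:=infra-pgsql17}"
export DEFAULT_MYSQL_INFRA DEFAULT_PGSQL_INFRA

echo "${DEFAULT_MYSQL_INFRA} ${DEFAULT_PGSQL_INFRA}"
```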

In `@test/tap/groups/ai/mcp-config.sql`:
- Around line 15-17: The SQL-injection risk comes from directly substituting
environment variables into SQL (e.g., the VALUES lines inserting
${TAP_MYSQLHOST}, ${TAP_MYSQLPORT}, ${MCP_MYSQL_HOSTGROUP_ID} and other database
name vars); update the test generation to either escape/validate these env
values before substitution or switch to parameterized/place-holder substitution
that performs SQL-escaping (for example add an escaping step for
MYSQL_DATABASE/PGSQL_DATABASE and host variables or validate they match a safe
regex like alphanumerics/allowed punctuation) so that the final INSERT INTO
mysql_servers and VALUES(...) lines receive properly escaped/validated strings
rather than raw env input.
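
Of the two options the prompt offers, allowlist validation is the simpler to get right; a sketch that rejects values outside a conservative character class before they are interpolated into the INSERT statements (the variable values here are fabricated for the demo):

```sh
# Accept only alphanumerics plus '.', '_', '-'; reject empty or anything else.
safe() {
    case "$1" in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

TAP_MYSQLHOST="mysql57.infra-1"
MYSQL_DATABASE='test; DROP TABLE mysql_servers'   # simulated hostile value

if safe "${TAP_MYSQLHOST}";  then host_check=ok; else host_check=rejected; fi
if safe "${MYSQL_DATABASE}"; then db_check=ok;   else db_check=rejected;   fi
echo "TAP_MYSQLHOST=${host_check} MYSQL_DATABASE=${db_check}"
```

Rejecting up front keeps the generated SQL free of quoting concerns entirely, at the cost of disallowing exotic but legal names.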

In `@test/tap/groups/ai/pre-cleanup.bash`:
- Line 22: Compute the password in a separate step and check the pipeline exit
status before exporting: capture the output of the sha256sum/head pipeline into
a local variable (e.g., tmp_pass=$(echo -n "${INFRA_ID}" | sha256sum | head -c
10)), verify the pipeline succeeded (check $? or use set -o pipefail and test
that tmp_pass is non-empty), and only then export ROOT_PASSWORD="$tmp_pass";
this avoids masking failures of sha256sum/head and ensures ROOT_PASSWORD is not
set on error.
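
The fix amounts to a few lines; a runnable sketch using `pipefail` plus a non-empty check (`INFRA_ID` here is a dummy value):

```bash
#!/usr/bin/env bash
set -o pipefail

INFRA_ID="multi-group-demo"   # stand-in for the real infra identifier

# Assign first so the pipeline's exit status is not masked by 'export'.
tmp_pass="$(printf '%s' "${INFRA_ID}" | sha256sum | head -c 10)" || tmp_pass=""
if [ -z "${tmp_pass}" ]; then
    echo "failed to derive ROOT_PASSWORD from INFRA_ID" >&2
    exit 1
fi
export ROOT_PASSWORD="${tmp_pass}"
echo "derived ROOT_PASSWORD (${#ROOT_PASSWORD} chars)"
```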

In `@test/tap/groups/ai/setup-infras.bash`:
- Line 48: The current line exports ROOT_PASSWORD by running a command
substitution inline which can mask exit codes; instead, split declaration and
assignment: first assign the output of the sha256sum pipeline to the variable
ROOT_PASSWORD using command substitution (referencing INFRA_ID and the sha256sum
| head -c 10 pipeline), then export ROOT_PASSWORD on a separate line so the
pipeline's return value isn't masked by the export builtin.

In `@test/tap/groups/groups.json`:
- Line 142: The test "test_client_limit_error-t" has been moved into the todo
group "todo-g1" but CI currently hides todo groups from summaries; update CI
reporting or documentation so these todos are surfaced as known skips: modify
the TAP reporting step (or CI summary script) to include counts or a section for
todo groups (e.g., "todo-g1") and ensure the test name
"test_client_limit_error-t" appears under a "Known Skips / Todo" section, or add
a CI log line that explicitly emits a skip message for
"test_client_limit_error-t"; also document in the TAP test guide how todo groups
are treated so maintainers know these tests are intentionally skipped due to
privileged requirements.

In `@test/tap/groups/todo/infras.lst`:
- Line 1: Remove the leading-comment line "# Tests that need review/fix" from
the file infras.lst so the parser doesn't treat it as an infrastructure
identifier; either delete that line or replace it with valid infrastructure
names (one per line) and ensure no lines start with '#' so the provisioning
script can locate real directories instead of "test/infra/# Tests that need
review/fix".
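
Alternatively, the parser could be taught to tolerate such headers; a sketch of a read loop that skips blank lines and `#` comments (the list contents are fabricated):

```bash
#!/usr/bin/env bash
set -euo pipefail

LST="$(mktemp)"
cat > "${LST}" <<'EOF'
# Tests that need review/fix
infra-mysql57

infra-pgsql17-repl
EOF

provision_list=""
while IFS= read -r line; do
    case "${line}" in
        ''|\#*) continue ;;              # skip blanks and comments
    esac
    provision_list="${provision_list}${provision_list:+ }${line}"
done < "${LST}"
rm -f "${LST}"
echo "would provision: ${provision_list}"
```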

In `@test/tap/tests/max_connections_ff-t.cpp`:
- Line 705: The admin connection is using the wrong host variable; in the
mysql_real_connect call that constructs the admin connection (proxy_admin,
cl.admin_username, cl.admin_password, cl.admin_port) replace cl.host with
cl.admin_host so the admin endpoint uses the configured admin host
(cl.admin_host) rather than the regular host (cl.host).

---

Outside diff comments:
In `@test/tap/tests/max_connections_ff-t.cpp`:
- Around line 691-702: The test contains a duplicate TAP plan() invocation:
remove the second plan(...) call (the later block that repeats "plan( 1*2 + //
'test_ff_sess_exceeds_max_conns' 2*2 // 'test_ff_only_one_free_conn' );") so
there is only a single plan(...) declaration; keep the original plan(1 * 2 + 2 *
2); near the top and delete the repeated plan(...) block to ensure only one TAP
plan is declared.

---

Nitpick comments:
In `@test_analysis_reg_test_3317.md`:
- Around line 11-45: Summary: Several fenced code blocks (the hostgroup table
showing hostgroup_id/hostname/port/status, the mysql_users block for 'testuser',
the mysql_query_rules empty table block, and the Global Variables block showing
mysql-set_query_lock_on_hostgroup/pgsql-set_query_lock_on_hostgroup) lack
language specifiers; update each opening triple-backtick to include a language
(e.g., ```text) so syntax highlighting and rendering improve. Locate the blocks
by the unique snippets "hostgroup_id  hostname", the "mysql_users" table with
username/default_hostgroup/transaction_persistent, the "mysql_query_rules" empty
table block, and the Global Variables block containing
"mysql-set_query_lock_on_hostgroup" and "pgsql-set_query_lock_on_hostgroup" and
change their fences to use ```text (or another appropriate language) for each
fenced code block.
- Around line 1-172: Move the analysis file test_analysis_reg_test_3317.md out
of the repository root into a proper docs/tests location (e.g., test/docs/ or
test/tap/analysis/) so repository root stays clean; update any references to
test_analysis_reg_test_3317.md in the repo (tests, CI configs or README) to the
new path, and if this file is temporary debugging output consider adding it to
.gitignore or removing it from the commit.

In `@test/infra/control/destroy-multi-group.bash`:
- Line 90: The associative array DESTROY_RESULTS is declared but never used;
either remove its declaration (the DESTROY_RESULTS declare -A) and the two
places where it's assigned, or use it in the script summary output; specifically
remove or keep references to DESTROY_RESULTS and the assignments that set
DESTROY_RESULTS[...] (the two assignments currently populating it) and, if
keeping it, consume its values when printing the final summary so the stored
results are actually read.

In `@test/infra/control/ensure-infras.bash`:
- Around line 77-83: The post-configuration invocation currently suppresses all
errors with "|| true", hiding failures from ./bin/docker-proxy-post.bash; change
this to capture the exit status after running ./bin/docker-proxy-post.bash, and
if non-zero emit a clear error log including INFRA_NAME and the exit code (e.g.,
via echo to stderr) but do not abort—allow the script to continue; locate the
block that references INFRA_DIR, INFRA_NAME and ./bin/docker-proxy-post.bash and
replace the unconditional suppression with the described exit-code check and
logging.
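
A sketch of the "log but continue" behavior, with throwaway hook scripts standing in for each infra's real `./bin/docker-proxy-post.bash`:

```bash
#!/usr/bin/env bash

failures=0
run_post_hook() {
    local infra_name="$1" hook="$2" rc=0
    sh "${hook}" || rc=$?
    if [ "${rc}" -ne 0 ]; then
        # Surface the failure with context instead of '|| true', but keep going.
        echo "WARNING: docker-proxy-post.bash failed for ${infra_name} (exit ${rc})" >&2
        failures=$((failures + 1))
    fi
}

ok_hook="$(mktemp)";  printf 'exit 0\n' > "${ok_hook}"
bad_hook="$(mktemp)"; printf 'exit 3\n' > "${bad_hook}"

run_post_hook infra-mysql57 "${ok_hook}"
run_post_hook infra-mysql84 "${bad_hook}"
rm -f "${ok_hook}" "${bad_hook}"
echo "post-config done: ${failures} hook failure(s) logged, provisioning continued"
```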

In `@test/infra/control/run-multi-group.bash`:
- Around line 76-81: Remove the unused associative arrays START_TIMES,
END_TIMES, and PID_FOR_GROUP from the top declarations (they are never read),
and also remove the assignment PID_FOR_GROUP["${group}"]="${local_pid}" where
the PID_FOR_GROUP map is set; keep JOB_PIDS, GROUP_FOR_PID, and EXIT_CODES
intact. This cleans up unused state: delete the declare -A START_TIMES, declare
-A END_TIMES, declare -A PID_FOR_GROUP lines and remove the
PID_FOR_GROUP["${group}"]="${local_pid}" statement.

In `@test/infra/infra-clickhouse23/bin/docker-proxy-post.bash`:
- Line 25: The SQL template path is resolved relative to the CWD rather than the
script location; update the cat invocation that currently reads
./conf/proxysql/infra-config.sql (used inside the command substitution $(eval
"echo \"$(cat ./conf/proxysql/infra-config.sql)\"") ) to reference the script
directory variable (SCRIPT_DIR) e.g. point it to
${SCRIPT_DIR}/../conf/proxysql/infra-config.sql so the template is loaded
relative to the script location consistently with how the .env is sourced.

In `@test/tap/groups/ai-g1/env.sh`:
- Around line 1-16: Add the shell directive comment for shellcheck at the top of
the file (before the export lines) to declare the script shell (e.g., add a "#
shellcheck shell=sh" comment as the first non-blank line); this will make the
file consistent with the other env.sh and satisfy static analysis while leaving
existing exports like DEFAULT_MYSQL_INFRA, TAP_MCPPORT, MCP_TARGET_ID,
MYSQL_DATABASE and PGSQL_DATABASE unchanged.

In `@test/tap/groups/mysql-auto_increment_delay_multiplex=0/env.sh`:
- Around line 1-8: Add the ShellCheck directive at the top of the file (before
the existing comments) to declare the intended shell for static analysis; for
example insert a line like "# shellcheck shell=bash". This change is small and
should be applied in the file that sets DEFAULT_MYSQL_INFRA, DEFAULT_PGSQL_INFRA
and REGULAR_INFRA_DATADIR so shellcheck understands the expected shell when
linting these variables and the use of WORKSPACE.

In `@test/tap/groups/mysql-multiplexing=false/env.sh`:
- Around line 1-8: Add a ShellCheck directive to the top of the file to silence
SC2148 for sourced scripts (e.g., add a comment such as "shellcheck
disable=SC2148" or "shellcheck shell=sh" as the first line); leave the existing
variable definitions (DEFAULT_MYSQL_INFRA, DEFAULT_PGSQL_INFRA,
REGULAR_INFRA_DATADIR) unchanged so the file is still intended to be sourced
while removing the static-analysis warning.

In `@test/tap/groups/mysql-query_digests_keep_comment=1/env.sh`:
- Around line 1-8: This env.sh is a sourced script and triggers shellcheck
SC2148; add a shellcheck directive at the top of env.sh to silence SC2148 (for
example add a comment like "shellcheck shell=sh" or "shellcheck disable=SC2148")
so the linter recognizes the file is sourced; update the header above the
exported symbols (DEFAULT_MYSQL_INFRA, DEFAULT_PGSQL_INFRA,
REGULAR_INFRA_DATADIR) to include that directive.

In `@test/tap/groups/mysql-query_digests=0/env.sh`:
- Around line 1-8: Add a shellcheck directive or a shebang to silence SC2148 at
the top of this env.sh so ShellCheck knows the intended shell; specifically,
update the top of the file containing the exports (DEFAULT_MYSQL_INFRA,
DEFAULT_PGSQL_INFRA, REGULAR_INFRA_DATADIR) by adding either a shebang like
"#!/usr/bin/env bash" or the directive "# shellcheck disable=SC2148" as the
first line of the file.

In `@test/tap/tests/max_connections_ff-t.cpp`:
- Around line 213-214: The diagnostic message logged by diag(...) is mismatched
with the actual command executed by MYSQL_QUERY(proxy_admin, "LOAD MYSQL USERS
TO RUNTIME"); — update the diag call so its format string and logged text
exactly match "LOAD MYSQL USERS TO RUNTIME" (i.e., change the diag invocation
that currently prints "LOAD MYSQL VARIABLES TO RUNTIME" to the same command
string) so diagnostic output aligns with the executed command; locate the diag
call and the MYSQL_QUERY call in the test around the functions diag and
MYSQL_QUERY to make the replacement.

In `@test/tap/tests/reg_test_3317-lock_hostgroup_special_queries-t.cpp`:
- Line 255: The variable storing the result of mysql_num_rows is declared as int
(num_rows) but mysql_num_rows returns my_ulonglong; change the type of num_rows
to my_ulonglong (or unsigned long long) and update any uses accordingly (e.g.,
loops or comparisons that expect int) so the return value from mysql_num_rows is
stored without potential truncation; locate the assignment to
mysql_num_rows(res) and replace the int declaration with my_ulonglong (or
unsigned long long) in the same scope.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b9f3ee1b-3772-46e1-a370-f1610a57bc8f

📥 Commits

Reviewing files that changed from the base of the PR and between 0e955cd and 179644f.

📒 Files selected for processing (50)
  • test/README.md
  • test/infra/control/destroy-multi-group.bash
  • test/infra/control/ensure-infras.bash
  • test/infra/control/run-multi-group.bash
  • test/infra/control/run-tests-isolated.bash
  • test/infra/docker-base/Dockerfile
  • test/infra/infra-clickhouse23/bin/docker-proxy-post.bash
  • test/infra/infra-mariadb10/bin/docker-proxy-post.bash
  • test/infra/infra-mariadb10/conf/proxysql/infra-config.sql
  • test/infra/infra-mysql57-binlog/bin/docker-proxy-post.bash
  • test/infra/infra-mysql57/bin/docker-proxy-post.bash
  • test/infra/infra-mysql57/conf/orchestrator/orc1/orchestrator.conf.json
  • test/infra/infra-mysql57/conf/orchestrator/orc2/orchestrator.conf.json
  • test/infra/infra-mysql57/conf/orchestrator/orc3/orchestrator.conf.json
  • test/infra/infra-mysql84/bin/docker-proxy-post.bash
  • test/infra/infra-mysql84/conf/orchestrator/orc1/orchestrator.conf.json
  • test/infra/infra-mysql84/conf/orchestrator/orc2/orchestrator.conf.json
  • test/infra/infra-mysql84/conf/orchestrator/orc3/orchestrator.conf.json
  • test/infra/infra-pgsql17-repl/bin/docker-proxy-post.bash
  • test/scripts/bin/proxysql-tester.py
  • test/tap/groups/ai-g1/env.sh
  • test/tap/groups/ai/README.md
  • test/tap/groups/ai/cleanup.sql
  • test/tap/groups/ai/docker-compose-destroy.bash
  • test/tap/groups/ai/docker-compose-init.bash
  • test/tap/groups/ai/docker-compose.yml
  • test/tap/groups/ai/env.sh
  • test/tap/groups/ai/infras.lst
  • test/tap/groups/ai/mcp-config.sql
  • test/tap/groups/ai/post-proxysql.bash
  • test/tap/groups/ai/pre-cleanup.bash
  • test/tap/groups/ai/pre-proxysql.bash
  • test/tap/groups/ai/seed-mysql.sql
  • test/tap/groups/ai/seed-pgsql.sql
  • test/tap/groups/ai/setup-infras.bash
  • test/tap/groups/groups.json
  • test/tap/groups/legacy/env.sh
  • test/tap/groups/legacy/setup-infras.bash
  • test/tap/groups/mysql-auto_increment_delay_multiplex=0/env.sh
  • test/tap/groups/mysql-multiplexing=false/env.sh
  • test/tap/groups/mysql-query_digests=0/env.sh
  • test/tap/groups/mysql-query_digests_keep_comment=1/env.sh
  • test/tap/groups/mysql84/env.sh
  • test/tap/groups/todo/infras.lst
  • test/tap/tests/clickhouse_php_conn-t.php
  • test/tap/tests/max_connections_ff-t.cpp
  • test/tap/tests/modules_server_test.h
  • test/tap/tests/reg_test_3317-lock_hostgroup_special_queries-t.cpp
  • test/tap/tests/test_clickhouse_server-t.cpp
  • test_analysis_reg_test_3317.md
💤 Files with no reviewable changes (5)
  • test/tap/groups/ai/docker-compose.yml
  • test/tap/groups/ai/docker-compose-destroy.bash
  • test/tap/groups/ai/docker-compose-init.bash
  • test/tap/groups/ai/post-proxysql.bash
  • test/tap/groups/ai/pre-proxysql.bash

Comment on lines +4 to +5
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
[ -f "${SCRIPT_DIR}/../.env" ] && . "${SCRIPT_DIR}/../.env"

⚠️ Potential issue | 🟠 Major

Same path-hardening gap remains for SQL template loading.

Lines 4-5 are script-relative, but the SQL file on Line 12 is still cwd-relative. This can fail when the script is launched from a different directory.

Suggested fix
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 [ -f "${SCRIPT_DIR}/../.env" ] && . "${SCRIPT_DIR}/../.env"
+INFRA_SQL_TEMPLATE="$(cat "${SCRIPT_DIR}/../conf/proxysql/infra-config.sql")"
@@
-docker exec -i "${PROXY_CONTAINER}" mysql -uradmin -pradmin -h127.0.0.1 -P6032 <<SQL
-$(eval "echo \"$(cat ./conf/proxysql/infra-config.sql)\"")
+docker exec -i "${PROXY_CONTAINER}" mysql -uradmin -pradmin -h127.0.0.1 -P6032 <<SQL
+$(eval "echo \"${INFRA_SQL_TEMPLATE}\"")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/infra-mysql57-binlog/bin/docker-proxy-post.bash` around lines 4 -
5, The SQL file path is still cwd-relative and can break when the script runs
from another directory; update the script to use the existing SCRIPT_DIR
variable for loading the SQL template instead of a relative path. Locate the SQL
load/execute call (the line that references the SQL file around where SQL is
sourced or passed to mysql) and prepend or join SCRIPT_DIR (used in the
SCRIPT_DIR="$(cd ...)" assignment) so the path becomes script-relative (e.g.,
"${SCRIPT_DIR}/...") ensuring the .env sourcing and SQL template use the same
script-rooted resolution.

Comment on lines +4 to +5
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
[ -f "${SCRIPT_DIR}/../.env" ] && . "${SCRIPT_DIR}/../.env"

⚠️ Potential issue | 🟠 Major

Make SQL template loading deterministic, not CWD-dependent.

Line 11 still depends on ./conf/.... This can break when the script is invoked from another working directory.

Proposed fix
+SQL_TEMPLATE="$(cat "${SCRIPT_DIR}/../conf/proxysql/infra-config.sql")"
+SQL_RENDERED="$(eval "echo \"${SQL_TEMPLATE}\"")"
 docker exec -i "${PROXY_CONTAINER}" mysql -uadmin -padmin -h127.0.0.1 -P6032 <<SQL
-$(eval "echo \"$(cat ./conf/proxysql/infra-config.sql)\"")
+${SQL_RENDERED}

Also applies to: 10-12

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/infra-mysql57/bin/docker-proxy-post.bash` around lines 4 - 5, The
script currently resolves resources relative to the current working directory,
causing nondeterministic SQL template loading; modify uses of ./conf/... (and
any similar relative paths on lines 10-12) to use the computed SCRIPT_DIR
variable instead (e.g., replace instances of ./conf/... with
"${SCRIPT_DIR}/../conf/..." or appropriate path based on SCRIPT_DIR) so the
script always loads templates relative to its own location; update any other
file references in this script that rely on CWD to use SCRIPT_DIR as well.

Comment on lines +4 to +5
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
[ -f "${SCRIPT_DIR}/../.env" ] && . "${SCRIPT_DIR}/../.env"

⚠️ Potential issue | 🟠 Major

Path resolution is only partially fixed.

Lines 4-5 make .env deterministic, but the SQL include still uses a cwd-relative path (./conf/... on Line 12). Invoking this script outside its infra directory can still break.

Suggested fix
 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 [ -f "${SCRIPT_DIR}/../.env" ] && . "${SCRIPT_DIR}/../.env"
+INFRA_SQL_TEMPLATE="$(cat "${SCRIPT_DIR}/../conf/proxysql/infra-config.sql")"
 PROXY_CONTAINER="proxysql.${INFRA_ID}"
@@
-docker exec -i "${PROXY_CONTAINER}" mysql -uadmin -padmin -h127.0.0.1 -P6032 <<SQL
-$(eval "echo \"$(cat ./conf/proxysql/infra-config.sql)\"")
+docker exec -i "${PROXY_CONTAINER}" mysql -uadmin -padmin -h127.0.0.1 -P6032 <<SQL
+$(eval "echo \"${INFRA_SQL_TEMPLATE}\"")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/infra-mysql84/bin/docker-proxy-post.bash` around lines 4 - 5, The
script sets SCRIPT_DIR but still uses a cwd-relative SQL include (e.g.
'./conf/...'); update the SQL include to reference SCRIPT_DIR so the path is
deterministic—replace occurrences of './conf/...' with
"${SCRIPT_DIR}/../conf/..." (or use "$(realpath "${SCRIPT_DIR}/../conf/...")")
and keep proper quoting and an existence check before using the file so invoking
docker-proxy-post.bash from any CWD will succeed.

exit 0
fi

export ROOT_PASSWORD=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)

⚠️ Potential issue | 🟡 Minor

Declare and assign separately to avoid masking return values.

If sha256sum fails, the error is masked and ROOT_PASSWORD is set to an empty or partial value.

Proposed fix
-export ROOT_PASSWORD=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
+ROOT_PASSWORD_TMP=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
+export ROOT_PASSWORD="${ROOT_PASSWORD_TMP}"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export ROOT_PASSWORD=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
ROOT_PASSWORD_TMP=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
export ROOT_PASSWORD="${ROOT_PASSWORD_TMP}"
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 22-22: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/groups/ai/pre-cleanup.bash` at line 22, Compute the password in a
separate step and check the pipeline exit status before exporting: capture the
output of the sha256sum/head pipeline into a local variable (e.g.,
tmp_pass=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)), verify the pipeline
succeeded (check $? or use set -o pipefail and test that tmp_pass is non-empty),
and only then export ROOT_PASSWORD="$tmp_pass"; this avoids masking failures of
sha256sum/head and ensures ROOT_PASSWORD is not set on error.

exit 1
fi

export ROOT_PASSWORD=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)

⚠️ Potential issue | 🟡 Minor

Declare and assign separately to avoid masking return values.

Same pattern as flagged in other files.

Proposed fix
-export ROOT_PASSWORD=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
+ROOT_PASSWORD_TMP=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
+export ROOT_PASSWORD="${ROOT_PASSWORD_TMP}"
📝 Committable suggestion


Suggested change
export ROOT_PASSWORD=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
ROOT_PASSWORD_TMP=$(echo -n "${INFRA_ID}" | sha256sum | head -c 10)
export ROOT_PASSWORD="${ROOT_PASSWORD_TMP}"
🧰 Tools
🪛 Shellcheck (0.11.0)

[warning] 48-48: Declare and assign separately to avoid masking return values.

(SC2155)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/groups/ai/setup-infras.bash` at line 48, The current line exports
ROOT_PASSWORD by running a command substitution inline which can mask exit
codes; instead, split declaration and assignment: first assign the output of the
sha256sum pipeline to the variable ROOT_PASSWORD using command substitution
(referencing INFRA_ID and the sha256sum | head -c 10 pipeline), then export
ROOT_PASSWORD on a separate line so the pipeline's return value isn't masked by
the export builtin.

@@ -0,0 +1 @@
# Tests that need review/fix

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Find where infras.lst files are parsed
rg -n -C3 'infras\.lst|while\s+read|read\s+-r|INFRAS|DEFAULT_.*INFRA' test/infra/control test/tap/groups

# 2) Inspect all current infras.lst contents to confirm expected format
fd -t f 'infras\.lst$' test/tap/groups --exec sed -n '1,80p' {}

Repository: sysown/proxysql

Length of output: 25651


Remove the comment line from infras.lst.

The parser does not filter # comment lines. The comment # Tests that need review/fix will be treated as a literal infrastructure identifier, causing provisioning to fail when the script attempts to locate directory test/infra/# Tests that need review/fix. Either remove the comment or populate the file with actual infrastructure names.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/groups/todo/infras.lst` at line 1, Remove the leading-comment line
"# Tests that need review/fix" from the file infras.lst so the parser doesn't
treat it as an infrastructure identifier; either delete that line or replace it
with valid infrastructure names (one per line) and ensure no lines start with
'#' so the provisioning script can locate real directories instead of
"test/infra/# Tests that need review/fix".

- Fix false positive "missing tests" errors when processing secondary
  workdirs (e.g., deprecate_eof_support). The validation now only
  checks tests that actually exist in the current workdir rather than
  all tests in groups.json for the TAP_GROUP.

- Add --ignore-errors negative,source --synthesize-missing flags to
  genhtml calls in run-tests-isolated.bash to match the multi-group
  runner and prevent HTML generation failures on coverage data with
  negative branch counts.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/scripts/bin/proxysql-tester.py`:
- Around line 925-950: The variables TAP_GROUP, group_has_tests, and groups must
be initialized before the try block so the post-try validation cannot raise
UnboundLocalError when an earlier exception occurs; add initial definitions
(e.g., TAP_GROUP = None, group_has_tests = False, groups = {}) immediately
before the try that sets them so the later validation code referencing
TAP_GROUP, group_has_tests, and groups will always see defined values.
- Around line 929-949: Adjust the TAP_GROUP validation to differentiate
deliberate skips from true no-shows by using the runnable set when computing
missing tests: ensure skipped_tests is only set in the explicit skip branches
above (do not infer skips from summary Nones here), compute passed_tests and
failed_tests from summary as you already do, compute a runnable_tests set (the
set of tests that could have run in this workdir) and then build missing_tests
as expected_in_workdir ∩ runnable_tests minus passed_tests minus failed_tests;
keep the existing rc adjustments but base them on this corrected missing_tests
so version-gated/skipped tests do not count as missing.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bb209fcc-9980-4177-8fe0-d0f775bdffba

📥 Commits

Reviewing files that changed from the base of the PR and between 179644f and 0bc75cd.

📒 Files selected for processing (2)
  • test/infra/control/run-tests-isolated.bash
  • test/scripts/bin/proxysql-tester.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/infra/control/run-tests-isolated.bash

- Use REGULAR_INFRA_DATADIR environment variable for config file path
  (falls back to /tmp for local mode)
- Add initial diag messages describing test purpose
- Add step-by-step debugging messages during test execution
- Update file header comment with test description

This fixes the issue where ProxySQL running in a Docker container
couldn't access config files created in /tmp on the test runner host.
- Add dynamic hostgroup discovery from mysql_servers table
- Replace hardcoded hostgroups (0, 1) with discovered ones (e.g., 1300, 1301)
- Add comprehensive debugging output for troubleshooting:
  - Current server and user configuration
  - Discovered hostgroups
  - Query execution details (before/after counts)
  - Routing failures with specific queries

The test was failing because it used hardcoded hostgroups 0 and 1,
but the Docker CI infrastructure uses dynamic hostgroups (1300, 1301, etc.)
depending on the infrastructure type.
Remove hardcoded TAP_MYSQLPORT=14806 from test .env files to use
the infrastructure-provided port (3306). The hardcoded port was
causing connection failures when tests run inside Docker containers.
- Query runtime_mysql_users instead of mysql_users for default_hostgroup
- Use mysql_replication_hostgroups to find writer/reader hostgroup pairs
- Fallback to runtime_mysql_servers if no replication hostgroups configured
- Remove hardcoded TAP_MYSQLPASSWORD from test_sqlite3_pass_exts-t.env

These changes allow tests to work correctly with Docker-based CI
infrastructure where hostgroups may differ from hardcoded defaults.
Two additional issues fixed:

1. Query hint hostgroup: ProxySQL query hints like /* ;hostgroup=0;... */
   were hardcoded to hostgroup 0. Now dynamically replaced with the
   discovered writer hostgroup.

2. Also removed hardcoded TAP_MYSQLPASSWORD=root from test_sqlite3_pass_exts-t.env
   to allow infrastructure-provided password to be used.
Move plan() call after MySQL version detection to set correct test count:
- MySQL 8.0+: 2213 tests (includes BY RANDOM PASSWORD tests)
- MySQL 5.7: 2012 tests (skips BY RANDOM PASSWORD tests)

Previously the test always planned 2213 tests, causing TAP failures
when running against MySQL 5.7 because only 2012 tests executed.
This test requires MySQL 8.0+ for utf8mb4_0900_as_ci collation support.
Removed from legacy-g4 and all g4 variants since they don't have MySQL 8.
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/tap/tests/test_query_rules_routing-t.cpp`:
- Around line 372-377: The test currently discovers writer_hg by querying
runtime_mysql_users for the literal 'root', which mismatches the test traffic
user; change the SELECT against runtime_mysql_users to use the actual connected
frontend username (the test's client user variable, e.g. cl.username or whatever
variable represents the connected user in this test) instead of 'root' so
writer_hg reflects the user under test; update the MYSQL_QUERY call that builds
the query string (and any string formatting there) to substitute the connected
username, and keep the existing null/row checks (hg_res, hg_row) and atoi
conversion for writer_hg.
- Around line 397-415: The fallback that queries runtime_mysql_servers currently
overwrites an existing writer_hg and sets reader_hg equal to writer_hg when only
one ONLINE hostgroup exists, which breaks writer/reader validation; modify the
logic around the MYSQL_QUERY + online_hostgroups handling so it does not change
writer_hg if writer_hg is already set, searches for a second distinct hostgroup
for reader_hg (skip entries equal to writer_hg), and if no second distinct
ONLINE hostgroup is found then fail or skip the test instead of assigning
reader_hg = writer_hg; update the branches that use online_hostgroups.size() to
enforce "second distinct hostgroup required" semantics and leave writer_hg
intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d70a87b3-de08-429f-9ad1-2ca0b370c767

📥 Commits

Reviewing files that changed from the base of the PR and between 0bc75cd and 4dcabbc.

📒 Files selected for processing (6)
  • test/tap/groups/groups.json
  • test/tap/tests/reg_test_4935-caching_sha2-t.env
  • test/tap/tests/test_load_from_config_validation-t.cpp
  • test/tap/tests/test_query_rules_routing-t.cpp
  • test/tap/tests/test_sqlite3_pass_exts-t.cpp
  • test/tap/tests/test_sqlite3_pass_exts-t.env
💤 Files with no reviewable changes (2)
  • test/tap/tests/reg_test_4935-caching_sha2-t.env
  • test/tap/tests/test_sqlite3_pass_exts-t.env
✅ Files skipped from review due to trivial changes (1)
  • test/tap/tests/test_load_from_config_validation-t.cpp
🚧 Files skipped from review as they are similar to previous changes (1)
  • test/tap/groups/groups.json

Comment on lines +372 to +377
MYSQL_QUERY(proxysql_admin, "SELECT default_hostgroup FROM runtime_mysql_users WHERE username='root' LIMIT 1");
MYSQL_RES* hg_res = mysql_store_result(proxysql_admin);
MYSQL_ROW hg_row = mysql_fetch_row(hg_res);
if (hg_row && hg_row[0]) {
writer_hg = atoi(hg_row[0]);
diag("Discovered writer hostgroup from runtime_mysql_users 'root': %d", writer_hg);

⚠️ Potential issue | 🟠 Major

Use the connected frontend user for hostgroup discovery.

Line 372 derives writer_hg from runtime_mysql_users for root, but the traffic under test is sent as cl.username. The INSERT/UPDATE cases still depend on the connected user's default_hostgroup, so this can rewrite the expected writer hostgroup incorrectly whenever the harness runs with sbtest1 or any non-root account.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/tests/test_query_rules_routing-t.cpp` around lines 372 - 377, The
test currently discovers writer_hg by querying runtime_mysql_users for the
literal 'root', which mismatches the test traffic user; change the SELECT
against runtime_mysql_users to use the actual connected frontend username (the
test's client user variable, e.g. cl.username or whatever variable represents
the connected user in this test) instead of 'root' so writer_hg reflects the
user under test; update the MYSQL_QUERY call that builds the query string (and
any string formatting there) to substitute the connected username, and keep the
existing null/row checks (hg_res, hg_row) and atoi conversion for writer_hg.
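The substitution the prompt describes can be sketched in shell (the username value and variable names here are invented for illustration; the real fix builds the query string in the C++ test):

```shell
# Build the discovery query from the connected frontend user instead of
# the literal 'root' (FRONTEND_USER is a placeholder for cl.username).
FRONTEND_USER="sbtest1"
QUERY="SELECT default_hostgroup FROM runtime_mysql_users WHERE username='${FRONTEND_USER}' LIMIT 1"
echo "${QUERY}"
```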

Comment on lines +397 to +415
// Fallback: query runtime_mysql_servers to find hostgroups with servers in ONLINE status
MYSQL_QUERY(proxysql_admin, "SELECT DISTINCT hostgroup_id FROM runtime_mysql_servers WHERE status='ONLINE' ORDER BY hostgroup_id LIMIT 2");
hg_res = mysql_store_result(proxysql_admin);
std::vector<int> online_hostgroups;
while ((hg_row = mysql_fetch_row(hg_res))) {
    online_hostgroups.push_back(atoi(hg_row[0]));
}
mysql_free_result(hg_res);

if (online_hostgroups.size() >= 2) {
    // We have at least 2 hostgroups - use the first two for writer and reader
    writer_hg = online_hostgroups[0];
    reader_hg = online_hostgroups[1];
    diag("Discovered hostgroups from runtime_mysql_servers: writer=%d, reader=%d", writer_hg, reader_hg);
} else if (online_hostgroups.size() == 1) {
    writer_hg = online_hostgroups[0];
    reader_hg = writer_hg; // Use same hostgroup if only one exists
    diag("Only one hostgroup found: writer=%d, reader=%d (same)", writer_hg, reader_hg);
}

⚠️ Potential issue | 🟠 Major

Don't let the fallback turn this into a one-hostgroup test.

Lines 398-415 can overwrite an already discovered writer with the lowest ONLINE hostgroup, and the size() == 1 branch makes reader_hg == writer_hg. At that point the test either targets the wrong hostgroup or passes without validating writer-vs-reader routing at all. Please preserve the discovered writer and skip/fail when there is no second distinct ONLINE hostgroup.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/tests/test_query_rules_routing-t.cpp` around lines 397 - 415, The
fallback that queries runtime_mysql_servers currently overwrites an existing
writer_hg and sets reader_hg equal to writer_hg when only one ONLINE hostgroup
exists, which breaks writer/reader validation; modify the logic around the
MYSQL_QUERY + online_hostgroups handling so it does not change writer_hg if
writer_hg is already set, searches for a second distinct hostgroup for reader_hg
(skip entries equal to writer_hg), and if no second distinct ONLINE hostgroup is
found then fail or skip the test instead of assigning reader_hg = writer_hg;
update the branches that use online_hostgroups.size() to enforce "second
distinct hostgroup required" semantics and leave writer_hg intact.

test_tsdb_variables-t:
- Add verbose logging for downsampling test
- Insert test data with timestamps from 2 hours ago before downsampling
- The TSDB downsample logic only processes completed hours, so we need
  historical data for the test to succeed
- Add cleanup of test data after the test

test_tsdb_api-t:
- Use configurable host from environment (cl.host) instead of hardcoded 127.0.0.1
- Add verbose logging for configuration and test progress
- Use proper authentication credentials from command line
- Increase retries for dashboard endpoint
- Add detailed error messages

Fixes #5483
- Use cl.admin_host instead of cl.host for admin HTTP endpoints
- Don't log auth credentials (security issue)
- Fail fast when prerequisite SET/LOAD commands fail
- Assert on specific test_downsample_metric rows in hourly table
Fix TSDB TAP tests for Docker-based CI
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/tap/tests/test_tsdb_variables-t.cpp (1)

222-229: Consider adding error handling for consistency.

The loop inserts additional test data points but doesn't check the return value of mysql_query (unlike the first insert at line 215). While this is acceptable for short-lived test processes, adding error logging would help diagnose failures if insertions fail silently.

🔧 Optional: Add error checking for consistency
 	for (int i = 1; i <= 3; i++) {
 		snprintf(insert_query, sizeof(insert_query),
 			"INSERT OR REPLACE INTO stats_history.tsdb_metrics (timestamp, metric_name, labels, value) "
 			"VALUES (%ld, 'test_downsample_metric', '{\"test\":\"true\"}', %f)",
 			two_hours_ago + i * 60, 42.0 + i); // Add data points every minute
-		mysql_query(admin, insert_query);
+		rc = mysql_query(admin, insert_query);
+		if (rc != 0) {
+			diag("Failed to insert test data point %d: %s", i, mysql_error(admin));
+		}
 		drain_results(admin);
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/tests/test_tsdb_variables-t.cpp` around lines 222 - 229, The loop
that calls mysql_query(admin, insert_query) for the test inserts lacks error
checking; update the loop in test_tsdb_variables-t.cpp to check the return value
of mysql_query for each iteration (same pattern used for the earlier insert),
and on failure log the error (e.g., using fprintf or process logger) including
mysql_error(admin) and decide whether to continue or abort the test; ensure you
still call drain_results(admin) after a successful query, and reference the
variables/functions insert_query, admin, mysql_query, mysql_error, and
drain_results when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/tap/tests/test_tsdb_api-t.cpp`:
- Around line 186-190: The HTTP request loop uses http_get((base_url +
"/tsdb").c_str(), response) without passing credentials, causing it to fall back
to the hardcoded default; modify the loop to call the credential-aware overload
used elsewhere by supplying the same userpwd/auth parameter (the same argument
used in other tests where http_get is invoked), e.g. use http_get((base_url +
"/tsdb").c_str(), response, userpwd) so the request honors the env/CLI
credentials (update the call in the loop that uses base_url, response, and
http_rc).
- Around line 155-168: The runtime-variable probe currently only logs results;
change it to enforce preconditions by checking mysql_query(admin, ...) return
code and failing the test immediately on error, verifying
mysql_store_result(admin) is non-NULL, counting returned rows and ensuring each
expected variable name ('admin-restapi_enabled', 'admin-web_enabled',
'tsdb-enabled') is present and its value matches the expected state; on any
missing row, unexpected value, or SQL error call the test failure/bail function
(use the same test harness failure helper as other tests) after cleaning up
(mysql_free_result) rather than only diag()-ing the results, and keep the
existing diag output for successful verification.

---

Nitpick comments:
In `@test/tap/tests/test_tsdb_variables-t.cpp`:
- Around line 222-229: The loop that calls mysql_query(admin, insert_query) for
the test inserts lacks error checking; update the loop in
test_tsdb_variables-t.cpp to check the return value of mysql_query for each
iteration (same pattern used for the earlier insert), and on failure log the
error (e.g., using fprintf or process logger) including mysql_error(admin) and
decide whether to continue or abort the test; ensure you still call
drain_results(admin) after a successful query, and reference the
variables/functions insert_query, admin, mysql_query, mysql_error, and
drain_results when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a9460756-8b21-4083-89ca-91b15df94bf5

📥 Commits

Reviewing files that changed from the base of the PR and between 4dcabbc and 5129d55.

📒 Files selected for processing (2)
  • test/tap/tests/test_tsdb_api-t.cpp
  • test/tap/tests/test_tsdb_variables-t.cpp
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: run / trigger
  • GitHub Check: claude-review
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.
📚 Learning: 2026-01-20T09:34:19.124Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:19.124Z
Learning: In ProxySQL's TAP test suite, resource leaks (e.g., not calling mysql_close() on early return paths) are commonly tolerated because test processes are short-lived and OS frees resources on exit. This pattern applies to all C++ test files under test/tap/tests. When reviewing, recognize this as a project-wide test convention and focus on test correctness and isolation rather than insisting on fixing such leaks in these test files.

Applied to files:

  • test/tap/tests/test_tsdb_variables-t.cpp
  • test/tap/tests/test_tsdb_api-t.cpp
📚 Learning: 2026-02-13T09:29:39.713Z
Learnt from: wazir-ahmed
Repo: sysown/proxysql PR: 5372
File: test/tap/tap/mcp_client.cpp:355-385
Timestamp: 2026-02-13T09:29:39.713Z
Learning: In ProxySQL MCP implementation (test/tap/tap/mcp_client.cpp), the `check_server()` method uses the ping endpoint which is designed to work without authentication. The `ping` method at the `config` endpoint should not require the `Authorization: Bearer` header, unlike tool invocation endpoints which do require authentication when `auth_token_` is set.

Applied to files:

  • test/tap/tests/test_tsdb_api-t.cpp
📚 Learning: 2026-01-20T07:40:34.938Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).

Applied to files:

  • test/tap/tests/test_tsdb_api-t.cpp
🔇 Additional comments (4)
test/tap/tests/test_tsdb_variables-t.cpp (3)

4-4: LGTM!

Correct header inclusion for time_t and time(NULL) used in the timestamp calculations below.


232-259: LGTM!

The verification logic is thorough:

  • Logs pre-downsample state for debugging
  • Validates both total hourly rows and specific test metric rows
  • The assertion correctly confirms that downsampling produced data for the inserted test metric

This aligns well with the downsample implementation which processes completed hours (data from at least 1 hour before current hour boundary).


261-266: LGTM!

Good test hygiene—cleaning up the inserted test data prevents pollution of subsequent test runs. Based on learnings, the lack of error checking on cleanup queries is acceptable for short-lived test processes.

test/tap/tests/test_tsdb_api-t.cpp (1)

112-153: Nice fail-fast handling for the setup SQL.

Checking each SET/LOAD result before moving on should make CI failures much easier to root-cause.

Comment on lines +155 to +168
// Verify the settings were applied
MYSQL_RES* res;
rc = mysql_query(admin, "SELECT variable_name, variable_value FROM runtime_global_variables WHERE variable_name IN ('admin-restapi_enabled', 'admin-web_enabled', 'tsdb-enabled')");
if (rc == 0) {
    res = mysql_store_result(admin);
    if (res) {
        diag("Runtime variable status:");
        MYSQL_ROW row;
        while ((row = mysql_fetch_row(res))) {
            diag(" %s = %s", row[0], row[1]);
        }
        mysql_free_result(res);
    }
}

⚠️ Potential issue | 🟡 Minor

Make the runtime-variable probe enforce the preconditions.

This block is diagnostics-only right now. If one of the SET/LOAD ... TO RUNTIME steps stops taking effect but the endpoints happen to be enabled already, the test can still go green and miss the regression. Please fail fast on SELECT errors, missing rows, or unexpected values here instead of only logging them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/tap/tests/test_tsdb_api-t.cpp` around lines 155 - 168, The
runtime-variable probe currently only logs results; change it to enforce
preconditions by checking mysql_query(admin, ...) return code and failing the
test immediately on error, verifying mysql_store_result(admin) is non-NULL,
counting returned rows and ensuring each expected variable name
('admin-restapi_enabled', 'admin-web_enabled', 'tsdb-enabled') is present and
its value matches the expected state; on any missing row, unexpected value, or
SQL error call the test failure/bail function (use the same test harness failure
helper as other tests) after cleaning up (mysql_free_result) rather than only
diag()-ing the results, and keep the existing diag output for successful
verification.
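The fail-fast enforcement the prompt asks for can be sketched in shell with faked query rows (the rows here stand in for what mysql_store_result would return; variable names come from the comment above):

```shell
# Faked "variable_name=variable_value" rows standing in for the SELECT result.
ROWS="admin-restapi_enabled=true
admin-web_enabled=true
tsdb-enabled=true"

precond_ok=1
for var in admin-restapi_enabled admin-web_enabled tsdb-enabled; do
    val=$(printf '%s\n' "${ROWS}" | awk -F= -v v="${var}" '$1 == v {print $2}')
    if [ "${val}" != "true" ]; then
        echo "FAIL: ${var}=${val:-missing}"   # fail fast instead of diag-only
        precond_ok=0
    fi
done
[ "${precond_ok}" -eq 1 ] && echo "preconditions OK"
```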

- Fix code coverage race condition: Each ProxySQL instance now writes
  .gcda files to its own directory using GCOV_PREFIX, preventing
  parallel groups from overwriting each other's coverage data
- Add TAP_USE_NOISE documentation for noise injection testing
- Replace random stagger delay with sequential delay (STAGGER_DELAY)
  to guarantee proper spacing between group startups
- Update README with multi-group runner, coverage, and noise injection docs
@coderabbitai coderabbitai bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
test/infra/control/run-tests-isolated.bash (1)

131-154: ⚠️ Potential issue | 🟠 Major

Missing /gcov volume mount breaks in-container coverage collection.

The test-runner container's coverage collection trap (line 166) expects to read .gcda files from /gcov, but this directory is not mounted. The ProxySQL container mounts COVERAGE_DATA_DIR at /gcov (see start-proxysql-isolated.bash:40), but the test-runner container needs the same mount to access the coverage data.

🐛 Proposed fix - add /gcov mount
 # Execution: run the container
+COVERAGE_DATA_DIR="${WORKSPACE}/ci_infra_logs/${INFRA_ID}/gcov"
 docker run \
     --name "${TEST_CONTAINER}" \
     --network "${NETWORK_NAME}" \
     --cap-add=NET_ADMIN \
     --cap-add=SYS_ADMIN \
     -v "${WORKSPACE}:${WORKSPACE}" \
     -v "${PROXY_DATA_DIR_HOST}:/var/lib/proxysql" \
+    -v "${COVERAGE_DATA_DIR}:/gcov" \
     -e WORKSPACE="${WORKSPACE}" \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/run-tests-isolated.bash` around lines 131 - 154, The
docker run invocation that starts TEST_CONTAINER is missing the coverage data
volume mount so the in-container coverage trap cannot read .gcda files; update
the docker run block that builds TEST_CONTAINER (where environment vars like
COVERAGE_MODE and COVERAGE_REPORT_DIR are set) to mount the host
COVERAGE_DATA_DIR into the container at /gcov (same mount used by the ProxySQL
container) by adding the corresponding -v "${COVERAGE_DATA_DIR}:/gcov" volume
entry so the test-runner can access coverage data.
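The missing mount can be illustrated by assembling the docker run arguments as an array (paths are placeholders; the real values come from the isolated-run environment):

```shell
# Assemble the test-runner docker run volume arguments with the /gcov
# mount added, mirroring what the ProxySQL container already uses.
WORKSPACE=/tmp/workspace
INFRA_ID=groups-demo
COVERAGE_DATA_DIR="${WORKSPACE}/ci_infra_logs/${INFRA_ID}/gcov"
DOCKER_ARGS=(-v "${WORKSPACE}:${WORKSPACE}" -v "${COVERAGE_DATA_DIR}:/gcov")
echo "${DOCKER_ARGS[@]}"
```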
♻️ Duplicate comments (1)
test/infra/control/run-tests-isolated.bash (1)

279-315: ⚠️ Potential issue | 🟡 Minor

Host-side coverage collection is unreachable—container has already exited.

The docker run command at line 131 blocks until the container exits. By the time execution reaches line 285, the test-runner container has stopped, so docker exec will fail. Additionally, the in-container EXIT trap (lines 159-203) already collects coverage before the container stops.

This entire block should be removed since coverage is already handled by the in-container trap.

🔧 Proposed fix - remove dead code
 fi

-# Capture code coverage data if enabled
-# This runs regardless of test success/failure
-if [ "${COVERAGE_MODE}" = "1" ]; then
-    echo ">>> Collecting code coverage data..."
-
-    # Check if fastcov is available in the container
-    if docker exec "${TEST_CONTAINER}" which fastcov >/dev/null 2>&1; then
-        COVERAGE_INFO_FILE="${COVERAGE_REPORT_DIR}/${TAP_GROUP:-test}-${INFRA_ID}.info"
-
-        echo ">>> Generating coverage report: ${COVERAGE_INFO_FILE}"
-        # Extract coverage data from container
-        docker exec "${TEST_CONTAINER}" bash -c "
-            cd '${WORKSPACE}' && \
-            fastcov -b -j\$(nproc) --process-gcno -l \
-                -e /usr/include/ -e test/tap/tests \
-                -d . -i include -i lib -i src \
-                -o '${COVERAGE_INFO_FILE}' 2>&1
-        " || echo ">>> WARNING: Coverage generation failed"
-
-        # Check if report was generated
-        if [ -f "${COVERAGE_INFO_FILE}" ]; then
-            echo ">>> Coverage report generated: ${COVERAGE_INFO_FILE}"
-            # Generate HTML report
-            if command -v genhtml >/dev/null 2>&1; then
-                HTML_REPORT_DIR="${COVERAGE_REPORT_DIR}/html/${TAP_GROUP:-test}-${INFRA_ID}"
-                mkdir -p "${HTML_REPORT_DIR}"
-                genhtml --branch-coverage --ignore-errors negative,source --synthesize-missing "${COVERAGE_INFO_FILE}" --output-directory "${HTML_REPORT_DIR}" 2>/dev/null || \
-                    echo ">>> WARNING: HTML report generation failed"
-                [ -d "${HTML_REPORT_DIR}" ] && echo ">>> HTML report: ${HTML_REPORT_DIR}/index.html"
-            fi
-        else
-            echo ">>> WARNING: Coverage info file not generated"
-        fi
-    else
-        echo ">>> WARNING: fastcov not found in container, skipping coverage collection"
-    fi
-fi
-
 # Clean up only the runner container
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/run-tests-isolated.bash` around lines 279 - 315, Remove
the host-side coverage collection block that checks COVERAGE_MODE and runs
docker exec/fastcov (the entire if [ "${COVERAGE_MODE}" = "1" ] ... fi section
that generates COVERAGE_INFO_FILE and calls docker exec/genhtml), because
coverage is already collected by the in-container EXIT trap; delete that whole
block (including the docker exec fastcov call and HTML generation logic) and
ensure no remaining references rely on COVERAGE_INFO_FILE or docker exec for
post-run coverage collection.
🧹 Nitpick comments (3)
test/infra/control/run-multi-group.bash (3)

82-83: PID_FOR_GROUP is assigned but never read.

This associative array, declared at line 82, is populated at line 228 but never read. It can be removed along with its declaration.

♻️ Proposed cleanup
 declare -A GROUP_FOR_PID
-declare -A PID_FOR_GROUP

And at line 228:

     GROUP_FOR_PID["${local_pid}"]="${group}"
-    PID_FOR_GROUP["${group}"]="${local_pid}"

Also applies to: 228-228

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/run-multi-group.bash` around lines 82 - 83, Remove the
unused associative array PID_FOR_GROUP: delete its declaration (declare -A
PID_FOR_GROUP) and the code that populates it (the assignment that sets
PID_FOR_GROUP[...] = ...), and also remove any other references to PID_FOR_GROUP
elsewhere in the script; keep EXIT_CODES intact if it's used elsewhere. Ensure
no remaining references to the PID_FOR_GROUP symbol remain so the script still
runs.

84-85: Unused associative arrays can be removed.

START_TIMES and END_TIMES are declared but never populated or read. They appear to be leftover scaffolding.

♻️ Proposed cleanup
 declare -A EXIT_CODES
-declare -A START_TIMES
-declare -A END_TIMES
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/run-multi-group.bash` around lines 84 - 85, REMOVE the
unused associative arrays START_TIMES and END_TIMES: they are declared (declare
-A START_TIMES / declare -A END_TIMES) but never populated or read, so delete
those two declarations from the script (or replace with a comment if they were
intended as future scaffolding) to clean up dead code.

331-343: Space-separated COVERAGE_FILES may break with paths containing spaces.

The loop builds COVERAGE_FILES as a space-separated string, which is then passed unquoted in the container bash script (line 358). If WORKSPACE or any path component contains spaces, this will break.

Given that CI paths typically don't contain spaces, this is a minor concern. Consider using an array if robustness is needed.

♻️ Alternative using newline separation
-    COVERAGE_FILES=""
+    COVERAGE_FILES_LIST=""
     for group in ${TAP_GROUPS}; do
         infra_id="${group}-${RUN_ID}"
         group_coverage_dir="${WORKSPACE}/ci_infra_logs/${infra_id}/coverage-report"
         if [ -d "${group_coverage_dir}" ]; then
             for info_file in "${group_coverage_dir}"/*.info; do
                 if [ -f "${info_file}" ]; then
-                    COVERAGE_FILES="${COVERAGE_FILES} ${info_file}"
+                    COVERAGE_FILES_LIST="${COVERAGE_FILES_LIST}${info_file}"$'\n'
                     echo ">>> Found coverage: ${info_file}"
                 fi
             done
         fi
     done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/infra/control/run-multi-group.bash` around lines 331 - 343, The
COVERAGE_FILES accumulation builds a space-separated string which breaks on
paths with spaces; change COVERAGE_FILES from a string to a bash array (e.g.,
COVERAGE_FILES=() and append with COVERAGE_FILES+=("$info_file")) when iterating
TAP_GROUPS and group_coverage_dir/info_file, and ensure later usage (where
COVERAGE_FILES is expanded) uses the quoted array expansion
("${COVERAGE_FILES[@]}") so each path is preserved as a single item; update any
subsequent references that pass COVERAGE_FILES into the container script to use
the array expansion or join safely if a single string is required.
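A minimal demonstration of why the array form is safer (paths invented for the example):

```shell
# Collect paths into a bash array instead of a flat string.
COVERAGE_FILES=("/tmp/group a/cov.info" "/tmp/group-b/cov.info")

# Quoted array expansion keeps each path as one word.
array_count=0
for f in "${COVERAGE_FILES[@]}"; do
    array_count=$((array_count + 1))
done

# An unquoted space-separated string splits on the embedded space.
FLAT="${COVERAGE_FILES[0]} ${COVERAGE_FILES[1]}"
string_count=0
for f in ${FLAT}; do
    string_count=$((string_count + 1))
done

echo "array=${array_count} string=${string_count}"
```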
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@test/infra/control/run-tests-isolated.bash`:
- Around line 131-154: The docker run invocation that starts TEST_CONTAINER is
missing the coverage data volume mount so the in-container coverage trap cannot
read .gcda files; update the docker run block that builds TEST_CONTAINER (where
environment vars like COVERAGE_MODE and COVERAGE_REPORT_DIR are set) to mount
the host COVERAGE_DATA_DIR into the container at /gcov (same mount used by the
ProxySQL container) by adding the corresponding -v "${COVERAGE_DATA_DIR}:/gcov"
volume entry so the test-runner can access coverage data.

---

Duplicate comments:
In `@test/infra/control/run-tests-isolated.bash`:
- Around line 279-315: Remove the host-side coverage collection block that
checks COVERAGE_MODE and runs docker exec/fastcov (the entire if [
"${COVERAGE_MODE}" = "1" ] ... fi section that generates COVERAGE_INFO_FILE and
calls docker exec/genhtml), because coverage is already collected by the
in-container EXIT trap; delete that whole block (including the docker exec
fastcov call and HTML generation logic) and ensure no remaining references rely
on COVERAGE_INFO_FILE or docker exec for post-run coverage collection.

---

Nitpick comments:
In `@test/infra/control/run-multi-group.bash`:
- Around line 82-83: Remove the unused associative array PID_FOR_GROUP: delete
its declaration (declare -A PID_FOR_GROUP) and the code that populates it (the
assignment that sets PID_FOR_GROUP[...] = ...), and also remove any other
references to PID_FOR_GROUP elsewhere in the script; keep EXIT_CODES intact if
it's used elsewhere. Ensure no remaining references to the PID_FOR_GROUP symbol
remain so the script still runs.
- Around line 84-85: REMOVE the unused associative arrays START_TIMES and
END_TIMES: they are declared (declare -A START_TIMES / declare -A END_TIMES) but
never populated or read, so delete those two declarations from the script (or
replace with a comment if they were intended as future scaffolding) to clean up
dead code.
- Around line 331-343: The COVERAGE_FILES accumulation builds a space-separated
string which breaks on paths with spaces; change COVERAGE_FILES from a string to
a bash array (e.g., COVERAGE_FILES=() and append with
COVERAGE_FILES+=("$info_file")) when iterating TAP_GROUPS and
group_coverage_dir/info_file, and ensure later usage (where COVERAGE_FILES is
expanded) uses the quoted array expansion ("${COVERAGE_FILES[@]}") so each path
is preserved as a single item; update any subsequent references that pass
COVERAGE_FILES into the container script to use the array expansion or join
safely if a single string is required.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a3b40ce7-0898-46b3-a5af-78ed88a86f0f

📥 Commits

Reviewing files that changed from the base of the PR and between 5129d55 and 86e86c8.

📒 Files selected for processing (5)
  • test/infra/README.md
  • test/infra/control/env-isolated.bash
  • test/infra/control/run-multi-group.bash
  • test/infra/control/run-tests-isolated.bash
  • test/infra/control/start-proxysql-isolated.bash
✅ Files skipped from review due to trivial changes (1)
  • test/infra/README.md
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: CI-builds / builds (ubuntu22,-tap)
  • GitHub Check: CI-builds / builds (debian12,-dbg)
  • GitHub Check: run / trigger
  • GitHub Check: claude-review
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.
📚 Learning: 2026-01-20T09:34:27.165Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:39-48
Timestamp: 2026-01-20T09:34:27.165Z
Learning: In ProxySQL test files (test/tap/tests/), resource leaks (such as not calling `mysql_close()` on early return paths) are not typically fixed because test processes are short-lived and the OS frees resources on process exit. This is a common pattern across the test suite.

Applied to files:

  • test/infra/control/run-tests-isolated.bash
📚 Learning: 2026-01-20T07:40:34.938Z
Learnt from: yuji-hatakeyama
Repo: sysown/proxysql PR: 5307
File: test/tap/tests/reg_test_5306-show_warnings_with_comment-t.cpp:24-28
Timestamp: 2026-01-20T07:40:34.938Z
Learning: In ProxySQL test files, calling `mysql_error(NULL)` after `mysql_init()` failure is safe because the MariaDB client library implementation returns an empty string for NULL handles (not undefined behavior).

Applied to files:

  • test/infra/control/run-tests-isolated.bash
🪛 Shellcheck (0.11.0)
test/infra/control/run-multi-group.bash

[warning] 84-84: START_TIMES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 85-85: END_TIMES appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 228-228: PID_FOR_GROUP appears unused. Verify use (or export if used externally).

(SC2034)

🔇 Additional comments (12)
test/infra/control/start-proxysql-isolated.bash (2)

24-27: LGTM - Per-INFRA_ID coverage directory isolation.

This correctly creates a separate coverage data directory for each INFRA_ID, preventing parallel write conflicts when multiple test groups run concurrently.


40-40: Coverage mount and GCOV_PREFIX setup looks correct.

The /gcov mount with GCOV_PREFIX environment variables will redirect coverage data files to the mounted directory, enabling the test-runner container to access them later. This integrates well with the coverage collection logic in run-tests-isolated.bash.

test/infra/control/env-isolated.bash (2)

99-103: LGTM - Noise injection configuration.

The TAP_USE_NOISE variable is properly defaulted to disabled ("0") and includes helpful documentation pointing to NOISE_TESTING.md for details.


115-117: LGTM - Conditional noise injection logging.

The check correctly handles both "1" and "true" values for flexibility.

test/infra/control/run-tests-isolated.bash (4)

4-27: LGTM - Comprehensive documentation header.

The usage documentation clearly explains the optional environment variables for coverage and noise injection, along with helpful notes about requirements and output locations.


37-43: LGTM - Coverage report directory setup.

The path ${WORKSPACE}/ci_infra_logs/${INFRA_ID}/coverage-report aligns with what run-multi-group.bash expects when combining coverage reports.


158-204: Coverage trap logic is well-structured.

The EXIT trap correctly preserves the original exit code while collecting coverage. Good defensive handling with warnings for missing tools. However, this relies on the /gcov mount being present (see previous comment).


224-258: Useful diagnostic dump before test execution.

The configuration dump provides valuable debugging information. Good distinction between ERROR (MySQL) and INFO (PgSQL) fallback messages, reflecting their relative importance.

test/infra/control/run-multi-group.bash (4)

1-26: LGTM - Well-documented orchestration script.

The header clearly explains usage, all configurable environment variables, and coverage notes. Good documentation for a new complex feature.


102-196: Well-implemented group execution with proper timeout handling.

The function correctly:

  • Staggers startup to reduce resource contention
  • Handles timeout exit code (124) distinctly
  • Preserves exit codes through the heredoc
  • Only auto-cleans successful runs

207-231: LGTM - Proper parallel job control.

The wait -n with PARALLEL_JOBS limit correctly manages concurrent execution. The EXIT_ON_FIRST_FAIL logic properly skips remaining groups when a failure occurs.


350-388: LGTM - Robust coverage combination with fallback.

Good approach to run coverage tools in container where they're guaranteed to be available. The fastcov → lcov fallback chain handles tool availability gracefully.
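The fallback chain amounts to a dispatch on tool availability; this is a hypothetical sketch (function name and messages invented, and the real script's invocations differ):

```bash
combine_coverage() {
  # Prefer fastcov, fall back to lcov, and warn loudly when neither
  # coverage tool is installed in the container.
  if command -v fastcov >/dev/null 2>&1; then
    echo "using fastcov"
  elif command -v lcov >/dev/null 2>&1; then
    echo "using lcov"
  else
    echo "WARNING: no coverage tool found, skipping combine" >&2
    return 1
  fi
}
```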

- Remove redundant host-side coverage collection in run-tests-isolated.bash
  (coverage is already collected via trap inside the container)
- Initialize TAP_GROUP, group_has_tests, groups before try block in
  proxysql-tester.py to avoid UnboundLocalError on early exceptions
- Differentiate skipped tests from missing tests in TAP_GROUP validation
  so intentionally skipped tests don't count as missing

Address PR #5470 review feedback:
- max_connections_ff-t: Use cl.admin_host for admin connection
- test_query_rules_routing-t: Use cl.admin_host for admin connection
- test_tsdb_api-t: Use stats:stats credentials for web dashboard,
  admin credentials for REST API (they use separate auth)

The test runner container needs access to the /gcov directory where
ProxySQL writes its coverage data (.gcda files). Without this mount,
the coverage collection script could only see .gcno files (compile-time)
but not the runtime coverage data from ProxySQL.

This fix adds the gcov volume mount to the test runner container,
allowing it to merge coverage data from each ProxySQL instance.

Two fixes required:
1. Mount gcov directory in test runner container so it can access
   the .gcda files written by ProxySQL
2. Call 'PROXYSQL GCOV DUMP' after each TAP test to flush coverage
   counters from memory to disk

Without these fixes, coverage data was never flushed from the running
ProxySQL process, resulting in 0% coverage for ProxySQL core files.
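The flush step could look roughly like this (a sketch: the admin host, port, and credentials are the conventional test defaults and an assumption here; `PROXYSQL GCOV DUMP` is the admin command named above, and `DRY_RUN` is an illustrative switch, not a real flag):

```bash
flush_gcov() {
  # Send the admin command that makes ProxySQL write its in-memory
  # gcov counters out to .gcda files on disk.
  local cmd=(mysql -h "${ADMIN_HOST:-127.0.0.1}" -P "${ADMIN_PORT:-6032}" \
             -u "${ADMIN_USER:-admin}" -p"${ADMIN_PASS:-admin}" \
             -e 'PROXYSQL GCOV DUMP')
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "${cmd[*]}"    # show the command without needing a live ProxySQL
  else
    "${cmd[@]}"
  fi
}
```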

The heredoc in run-multi-group.bash was causing bash to exit prematurely
after ensure-infras.bash completed, without executing the test commands.
This was due to shell expansion issues with the heredoc structure.

Changes:
- Replace heredoc with temp file approach in run-multi-group.bash
- Add pre-proxysql hook call to ensure-infras.bash for cluster setup
- Create legacy/pre-proxysql.bash for cluster node startup
- Update mysql84/pre-proxysql.bash with cluster startup calls
- Fix coverage collection to work correctly in parallel execution
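The temp-file replacement can be sketched like so (a minimal illustration, not the actual runner code): the commands are written to a file with a quoted heredoc delimiter, so nothing is expanded in the parent shell, and the file is then executed as its own script.

```bash
script=$(mktemp /tmp/group-cmds.XXXXXX)
cat > "$script" <<'EOF'
echo "running group tests"
exit 0
EOF
bash "$script"            # runs in a child shell; cannot kill the parent
group_rc=$?
rm -f "$script"
echo "group_rc=$group_rc"
```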
@renecannao force-pushed the private/multi-group-runner branch from f9993d4 to 60b5aff on March 22, 2026 at 13:07
@sonarqubecloud

Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarQube Cloud
