-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Description
Problem
The CI test infrastructure orchestration (test/infra/control/*.bash, test/tap/groups/*/pre-proxysql.bash, setup-infras.bash hooks) is built entirely in bash. This has proven unreliable and difficult to debug in practice:
Error suppression patterns
Scripts routinely suppress errors, making failures invisible:
# Actual code that was in run-tests-isolated.bash:
mysql -uradmin -pradmin -hproxysql -P6032 -e 'SELECT ...' 2>/dev/null || echo 'ERROR: Failed to query mysql_servers'This hides the actual error (connection refused? auth failure? timeout?), prints a generic message, and continues execution as if nothing happened. Tests then fail later with confusing symptoms (e.g., "table doesn't exist" because the entire ProxySQL config was silently broken).
Fragile error handling
set -eis a bandaid — it doesn't propagate errors through pipes, subshells, or command substitutions without additionalset -o pipefail- Functions can't return structured errors — only exit codes (0-255)
- No distinction between "expected failure" and "bug" — both are just non-zero exit codes
- Traps (
trap ... EXIT) are the only cleanup mechanism, and they don't compose well
No structured logging
echoto stdout mixes with command output- No log levels, timestamps are inconsistent
- Debugging requires adding
set -xand wading through hundreds of lines of shell expansion
Hardcoded paths and assumptions
- Scripts assume specific directory structures relative to
${BASH_SOURCE[0]} REPO_ROOTderivation via../../..counting is error-prone (we had bugs with 3 vs 4 levels)- Environment variable passing between scripts is fragile —
exportin one script doesn't reach adocker execor a subprocess reliably
Race conditions and timing
sleep 10as synchronization primitive- Health checks in while loops with arbitrary timeouts
- No proper readiness probes — just "can I connect?" checks
- Parallel execution (run-multi-group.bash) has no proper coordination
Real bugs found during CI migration (March 2026)
- Silent config dump failures: ProxySQL admin queries failed silently due to
2>/dev/null, tests continued with broken config, failed with misleading "table doesn't exist" errors - Hardcoded
/home/rene/proxysqlpaths: 12 scripts had absolute paths that only worked on one machine - Wrong directory traversal:
../../..instead of../../../..— worked locally by accident, failed on CI runners - WITHGCOV defaulting to 1: Caused
PROXYSQL GCOV DUMPsyntax errors on non-coverage builds, hidden by error suppression - Missing feature flag propagation: Unit test Makefile relied on environment variables that weren't set, causing link failures
- Hostgroup monitor conflicts: Setup hooks silently failed to LOAD changes to runtime, monitor overwrote configuration
Proposal
Migrate the orchestration layer to Python, keeping bash only for thin wrappers where shell is genuinely the right tool (e.g., Docker commands).
What to migrate
| Current (bash) | Purpose | Complexity |
|---|---|---|
ensure-infras.bash |
Start ProxySQL + backends for a TAP group | High — Docker orchestration, health checks, hook execution |
run-tests-isolated.bash |
Launch test runner container, env setup, coverage collection | High — Docker run, env propagation, coverage traps |
run-multi-group.bash |
Parallel group execution with timeouts | High — Process management, log aggregation, coverage combining |
start-proxysql-isolated.bash |
Start ProxySQL in Docker | Medium — Docker run, health check |
stop-proxysql-isolated.bash |
Cleanup ProxySQL | Low |
destroy-infras.bash |
Cleanup backends | Low |
docker-compose-init.bash |
Start Docker Compose infra | Medium — Volume prep, health checks, user provisioning |
docker-*-post.bash |
Post-startup provisioning | Medium — SQL provisioning, replication setup |
env-isolated.bash |
Environment variable setup | Low — but critical for correctness |
*/pre-proxysql.bash |
Group-specific pre-test hooks | Medium — cluster setup, config manipulation |
*/setup-infras.bash |
Group-specific post-infra hooks | Medium — hostgroup remapping, fixture setup |
Benefits of Python
- Proper exceptions: errors propagate automatically, no silent failures
- Structured logging:
loggingmodule with levels, timestamps, formatters - Testable: orchestration logic can have unit tests
- Type hints: catch configuration errors before runtime
- subprocess management:
subprocess.run(check=True)fails loudly by default - Docker SDK:
dockerPython package instead of parsing CLI output - Parallel execution:
concurrent.futuresinstead of bash background jobs
Migration approach
- Start with a Python
CIOrchestratorclass that wraps the current bash scripts - Migrate one script at a time, keeping bash as fallback
proxysql-tester.pyalready exists as a Python test runner — extend the pattern- Keep bash only for: Docker Compose files, simple one-liner wrappers
What NOT to change
- Docker Compose YAML files — these are fine as-is
proxysql-tester.py— already Python, works well- Makefile build system — separate concern
- GitHub Actions YAML — separate concern
Priority
Medium-term. The immediate bash fixes (removing 2>/dev/null, fixing paths) are done. This issue tracks the longer-term migration to make the infrastructure actually reliable.