Skip to content

Milestone 3 Phase 3.3: Extract monitor health state decision logic for unit testing #5491

@renecannao

Description

@renecannao

Parent Issue

Part of #5479 — Unit Testing Milestone 3: Extract Testable Logic from Coupled Classes

Context

Phase 3.3 (#5491) covers MySQL_Monitor health decisions. The Monitor module decides when to shun/unshun servers based on health check results (ping, read-only status, replication lag, heartbeat failures). These decisions are currently embedded in check handler callbacks that are coupled to real MySQL connections and the HostGroups Manager.

The decision logic itself is algorithmic: given health check results and thresholds, determine the next server status. This can be extracted and tested with mock health data.

Current Code

File: `lib/MySQL_Monitor.cpp`

Key decision points:

  1. Server status determination (lines ~2855-2906): Combines `viable_candidate`, `read_only`, and check results into `MySerStatus` (ONLINE, OFFLINE_SOFT, SHUNNED)
  2. Heartbeat-based shunning (lines ~3182-3221): Consecutive ping failures → shun and kill all connections
  3. Unshunning logic (in `Base_HostGroups_Manager.cpp` ~2194-2245): Time-based recovery from SHUNNED state

Scope

Extract:

  1. `determine_server_status()` — Given health check results and thresholds:
    ```cpp
    enum MySerStatus determine_server_status(
    enum MySerStatus current_status,
    bool ping_successful,
    bool read_only_check_passed,
    bool is_read_only,
    int replication_lag_ms,
    int max_replication_lag_ms,
    unsigned int consecutive_failures,
    unsigned int max_failures_threshold
    );
    ```

  2. `should_shun_on_failures()` — Heartbeat failure shunning:
    ```cpp
    bool should_shun_on_failures(
    unsigned int consecutive_failures,
    unsigned int max_failures,
    unsigned int current_connections
    );
    ```

  3. `can_unshun_server()` — Time-based recovery:
    ```cpp
    bool can_unshun_server(
    time_t time_last_error,
    time_t current_time,
    int shun_recovery_time_sec,
    int unshun_algorithm
    );
    ```

Unit tests should cover:

  • Healthy server stays ONLINE
  • Ping failure → SHUNNED
  • Read-only detected → OFFLINE_SOFT (for writer hostgroup)
  • Replication lag exceeds threshold → excluded from routing
  • Consecutive failures exceed max → shun and kill all
  • Single failure below max → stays ONLINE
  • Shunned server recovers after recovery time elapsed
  • Shunned server stays shunned if recovery time not elapsed
  • Different `unshun_algorithm` values produce different behavior
  • Edge cases: zero thresholds, max thresholds

IMPORTANT: Implementation Instructions

Branch and PR rules

  • Create your branch from `v3.0-5473`, NOT from `v3.0`
  • Target the PR to `v3.0-5473`, NOT to `v3.0`
  • Branch name should be `v3.0-5491`

Where to place test files

  • Test file MUST go in `test/tap/tests/unit/`, NOT in `test/tap/tests/`
  • Name the test file `monitor_health_unit-t.cpp`

How to write the test file

The project has a unit test harness. Your test file MUST:

  1. Include the harness headers:
    ```cpp
    #include "tap.h"
    #include "test_globals.h"
    #include "test_init.h"
    ```

  2. Include proxysql headers for the types you need:
    ```cpp
    #include "proxysql.h"
    #include "cpp.h"
    ```

  3. Call `test_init_minimal()` at the start of `main()`

  4. Call `test_cleanup_minimal()` at the end

  5. Use TAP functions: `plan()`, `ok()`, `exit_status()`

DO NOT:

  • Do NOT reimplement the extracted functions in the test file — the test must call the REAL functions from `libproxysql.a`
  • Do NOT manually define `noise_failures`, `noise_failure_mutex`, `stop_noise_tools`, or `get_noise_tools_count` — these are already provided by `test_globals.cpp`
  • Do NOT place the test in `test/tap/tests/` — it MUST go in `test/tap/tests/unit/`

How to register the test in the Makefile

Edit `test/tap/tests/unit/Makefile`:

  1. Add the test name to the `UNIT_TESTS` variable
  2. Add a build rule following the existing pattern:
    ```makefile
    monitor_health_unit-t: monitor_health_unit-t.cpp $(TEST_HELPERS_OBJ) $(LIBPROXYSQLAR)
    $(CXX) $< $(TEST_HELPERS_OBJ) $(IDIRS) $(LDIRS) $(OPT) \
    $(LIBPROXYSQLAR_FULL) $(STATIC_LIBS) $(MYLIBS) \
    $(ALLOW_MULTI_DEF) -o $@
    ```

Where to place extracted function declarations

If you create new header files for the extracted function declarations, place them in `include/` with a proper include guard. Use the same pattern as `include/ConnectionPoolDecision.h` — a standalone header with no circular dependencies.

Reference examples

Look at these existing files for the correct pattern:

  • `test/tap/tests/unit/connection_pool_unit-t.cpp` — unit test using harness
  • `test/tap/tests/unit/rule_matching_unit-t.cpp` — unit test using harness
  • `include/ConnectionPoolDecision.h` — standalone extracted function header

Acceptance Criteria

  • Extracted functions are pure (no global state access, no I/O)
  • Existing E2E TAP tests pass unchanged (behavior-preserving refactoring)
  • Unit tests cover all decision branches
  • PR targets `v3.0-5473` (NOT `v3.0`)
  • Test file is in `test/tap/tests/unit/`
  • Test uses harness (`test_globals.h`, `test_init.h`)
  • Test does NOT reimplement extracted functions
  • Doxygen documentation on extracted functions

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions