Skip to content

test: Flushdb failed lead to jepsen test fail. #91

@yi-xmu

Description

@yi-xmu

Jepsen test on EloqKv with rocksdbcloud (single node mode):

2025-09-19 20:16:19,767 - run_and_check - INFO - Iteration 10 started.
2025-09-19 20:16:19,768 - run_and_check - INFO - Recovering cluster before iteration 10...
2025-09-19 20:16:19,768 - run_and_check - INFO - Checking status of store-2:6389 (leader)
2025-09-19 20:16:19,768 - run_and_check - INFO - run command ssh store-2 'ps -ef | grep "eloqkv.*-6389.ini" | grep -v grep'
2025-09-19 20:16:20,525 - run_and_check - INFO - Host store-2:6389 (leader) is running with PID 3659861
2025-09-19 20:16:20,525 - run_and_check - INFO - Checking status of store-1:9000 (kv)
2025-09-19 20:16:20,525 - run_and_check - INFO - run command ssh store-1 'ps -ef | grep "minio server" | grep -v grep'
2025-09-19 20:16:21,330 - run_and_check - INFO - Host store-1:9000 (kv) is running with PID 1483564
2025-09-19 20:16:21,331 - run_and_check - INFO - Handling running host: store-2:6389, PID: 3659861
2025-09-19 20:16:21,331 - run_and_check - INFO - run command ssh store-2 'sudo iptables -F'
2025-09-19 20:16:22,092 - run_and_check - INFO - Iptables reset on store-2
2025-09-19 20:16:22,093 - run_and_check - INFO - run command ssh store-2 'kill -CONT 3659861'
2025-09-19 20:16:22,767 - run_and_check - INFO - Process 3659861 on store-2:6389 resumed
2025-09-19 20:16:22,767 - run_and_check - INFO - Handling running host: store-1:9000, PID: 1483564
2025-09-19 20:16:22,767 - run_and_check - INFO - run command ssh store-1 'sudo iptables -F'
2025-09-19 20:16:23,538 - run_and_check - INFO - Iptables reset on store-1
2025-09-19 20:16:23,538 - run_and_check - INFO - run command ssh store-1 'kill -CONT 1483564'
2025-09-19 20:16:24,285 - run_and_check - WARNING - Command stderr: zsh:kill:1: kill 1483564 failed: operation not permitted

2025-09-19 20:16:24,285 - run_and_check - WARNING - Failed to resume process on store-1:9000, exit code: 1
2025-09-19 20:16:24,285 - run_and_check - INFO - No dead nodes detected. No need to start any services.
2025-09-19 20:16:24,285 - run_and_check - INFO - Cluster recovery completed successfully.
2025-09-19 20:16:24,285 - run_and_check - INFO - Flushing database before iteration 10...
2025-09-19 20:16:24,285 - run_and_check - INFO - FLUSHDB - Performing initial health check...
2025-09-19 20:16:24,285 - run_and_check - INFO - run command redis-cli -h store-2 -p 6389 ping
2025-09-19 20:16:24,295 - run_and_check - INFO - FLUSHDB - Health check: store-2:6389 (unknown) is responsive
2025-09-19 20:16:24,295 - run_and_check - INFO - run command redis-cli -h store-1 -p 9000 ping
2025-09-19 20:16:24,301 - run_and_check - WARNING - Command stderr: Error: Protocol error, got "H" as reply type byte

2025-09-19 20:16:24,302 - run_and_check - WARNING - FLUSHDB - Health check: store-1:9000 (unknown) is not responsive
2025-09-19 20:16:24,302 - run_and_check - INFO - FLUSHDB - Found 1 responsive database nodes
2025-09-19 20:16:24,302 - run_and_check - INFO - FLUSHDB - Leader nodes to try: [('store-2', 6389, 'leader')]
2025-09-19 20:16:24,302 - run_and_check - INFO - FLUSHDB - Standby nodes available: []
2025-09-19 20:16:24,303 - run_and_check - INFO - FLUSHDB - Trying LEADER node store-2:6389 (leader) with command: redis-cli -h store-2 -p 6389 flushdb
2025-09-19 20:16:24,303 - run_and_check - INFO - FLUSHDB - Testing connection to store-2:6389 with: redis-cli -h store-2 -p 6389 ping
2025-09-19 20:16:24,303 - run_and_check - INFO - run command redis-cli -h store-2 -p 6389 ping
2025-09-19 20:16:24,310 - run_and_check - INFO - FLUSHDB - Connection test successful for store-2:6389: PONG
2025-09-19 20:16:24,310 - run_and_check - INFO - FLUSHDB - Attempt 1/2 with 90s timeout
2025-09-19 20:16:24,311 - run_and_check - INFO - run command redis-cli -h store-2 -p 6389 flushdb
2025-09-19 20:17:54,382 - run_and_check - ERROR - Command timed out after 90 seconds: redis-cli -h store-2 -p 6389 flushdb
2025-09-19 20:17:54,382 - run_and_check - INFO - FLUSHDB ret_code=1
 stdout=""
2025-09-19 20:17:54,382 - run_and_check - WARNING - FLUSHDB LEADER ATTEMPT 1 FAILED - Node: store-2:6389 (leader)
2025-09-19 20:17:54,382 - run_and_check - WARNING - FLUSHDB LEADER ATTEMPT 1 FAILED - Exit code: 1
2025-09-19 20:17:54,382 - run_and_check - WARNING - FLUSHDB LEADER ATTEMPT 1 FAILED - Output: No output
2025-09-19 20:17:54,382 - run_and_check - INFO - FLUSHDB - Retrying in 5 seconds...
2025-09-19 20:17:59,387 - run_and_check - INFO - FLUSHDB - Attempt 2/2 with 120s timeout
2025-09-19 20:17:59,388 - run_and_check - INFO - run command redis-cli -h store-2 -p 6389 flushdb
2025-09-19 20:19:59,479 - run_and_check - ERROR - Command timed out after 120 seconds: redis-cli -h store-2 -p 6389 flushdb
2025-09-19 20:19:59,479 - run_and_check - INFO - FLUSHDB ret_code=1
 stdout=""
2025-09-19 20:19:59,479 - run_and_check - WARNING - FLUSHDB LEADER ATTEMPT 2 FAILED - Node: store-2:6389 (leader)
2025-09-19 20:19:59,479 - run_and_check - WARNING - FLUSHDB LEADER ATTEMPT 2 FAILED - Exit code: 1
2025-09-19 20:19:59,480 - run_and_check - WARNING - FLUSHDB LEADER ATTEMPT 2 FAILED - Output: No output
2025-09-19 20:19:59,480 - run_and_check - INFO - run command ping -c 1 -W 5 store-2
2025-09-19 20:19:59,489 - run_and_check - INFO - FLUSHDB LEADER FAILED - Node store-2 is reachable via ping
2025-09-19 20:19:59,489 - run_and_check - WARNING - FLUSHDB - All leader nodes failed, trying standby nodes...
2025-09-19 20:19:59,489 - run_and_check - ERROR - FLUSHDB FAILED - All nodes (both leader and standby) failed
2025-09-19 20:19:59,489 - run_and_check - ERROR - Error: Failed to flush database after 1 minute.
2025-09-19 20:19:59,489 - run_and_check - ERROR - Warning: Error stopping memtier benchmark: Database flush failed
2025-09-19 20:19:59,489 - run_and_check - INFO - run command ssh store-2 'grep -i "Assertion" /mnt/ssd/eloq/zhouxiao/jepsen-eloqkv-rocksdbcloud-single/EloqKV/logs/std-output/std-out-6389'
2025-09-19 20:20:00,255 - run_and_check - INFO - run command ssh store-2 'find /mnt/ssd/eloq/zhouxiao/jepsen-eloqkv-rocksdbcloud-single/EloqKV/logs/node-6389/ -name "asan.*" -type f'
2025-09-19 20:20:00,994 - run_and_check - INFO - No ASAN files found on store-2:6389
2025-09-19 20:20:00,994 - run_and_check - INFO - run command readlink -f ./store/current
2025-09-19 20:20:00,999 - run_and_check - INFO - run command rsync -azL '/home/eloq/workspace/jepsen-eloqkv-rocksdbcloud-single/store/eloqkv append pause,partition,clock,kill/20250919T200945.122+0800/' '/home/eloq/workspace/jepsen-eloqkv-rocksdbcloud-single/error_log/jepsen-eloqkv-rocksdbcloud-single/2025-09-19_20-20-00/jepsen'

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions