[improve][broker]PIP-340 Optimization of Probe Implementation for Automatic Failover by yyj8 · Pull Request #8 · yyj8/pulsar

yyj8 · 2024-02-27T11:27:24Z

Motivation

The current Java client implementation of automatic cluster failover has a flaw in how it probes cluster availability.

org.apache.pulsar.client.impl.AutoClusterFailover.java
boolean probeAvailable(String url) {
        try {
            resolver.updateServiceUrl(url);
            InetSocketAddress endpoint = resolver.resolveHost();
            Socket socket = new Socket();
            socket.connect(new InetSocketAddress(endpoint.getHostName(), endpoint.getPort()), TIMEOUT);
            socket.close();

            return true;
        } catch (Exception e) {
            log.warn("Failed to probe available, url: {}", url, e);
            return false;
        }
    }

The client only establishes a TCP connection to the cluster's exposed service address to decide whether the cluster is available. This cannot handle scenarios where the cluster is partially unavailable (half dead), because the TCP connection can still succeed while the cluster is unable to serve traffic. In this scenario, we want the client to make its failover decision by sending a cluster health status request to the cluster, and we provide an admin command within the cluster to update the cluster's health status. Without this, every business application that connects to the cluster has to manually switch its cluster connection address and restart, and because operation steps differ between business teams, the link data ends up inconsistent across teams.
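The half-dead case can be reproduced without Pulsar. The sketch below is a hypothetical, self-contained demo: the class `HalfDeadProbeDemo`, the one-line `AVAILABLE`/`UNAVAILABLE` text protocol, and the 2000 ms timeout are all assumptions for illustration and are not part of the Pulsar binary protocol or of this PR. It shows an endpoint that accepts TCP connections while reporting itself unavailable.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo, not Pulsar code: an endpoint that still accepts TCP
// connections but reports itself unavailable at the application level.
class HalfDeadProbeDemo {
    static final int TIMEOUT_MS = 2000; // assumed timeout for the demo

    // Starts a toy status server on an ephemeral loopback port. It writes one
    // status line to every client and then closes the connection.
    static ServerSocket startStatusServer(String status) throws IOException {
        ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        Thread acceptor = new Thread(() -> {
            while (!server.isClosed()) {
                try (Socket client = server.accept();
                     OutputStream out = client.getOutputStream()) {
                    out.write((status + "\n").getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    // Server closed or client went away; the loop re-checks isClosed().
                }
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    // TCP-only probe, the same idea as probeAvailable: success only means
    // that the port accepts connections.
    static boolean tcpProbe(InetSocketAddress addr) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, TIMEOUT_MS);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    // Application-level probe: connect, then read the status the endpoint reports.
    static boolean healthProbe(InetSocketAddress addr) {
        try (Socket socket = new Socket()) {
            socket.connect(addr, TIMEOUT_MS);
            socket.setSoTimeout(TIMEOUT_MS);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            return "AVAILABLE".equals(in.readLine());
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = startStatusServer("UNAVAILABLE")) {
            InetSocketAddress addr =
                    new InetSocketAddress(server.getInetAddress(), server.getLocalPort());
            System.out.println("tcpProbe=" + tcpProbe(addr));
            System.out.println("healthProbe=" + healthProbe(addr));
        }
    }
}
```

Against this toy server the TCP-only probe should succeed while the application-level probe reports the endpoint as unavailable, which is the gap the proposed health check request is meant to close.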

Modifications

Add a new cluster health status request command and its response command:

case HEALTH_CHECK:
	checkArgument(cmd.hasHealthCheck());
	handleHealthCheck(cmd.getHealthCheck());
	break;

case HEALTH_CHECK_RESPONSE:
	checkArgument(cmd.hasHealthCheckResponse());
	handleHealthCheckResponse(cmd.getHealthCheckResponse());
	break;
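To illustrate the request/response flow that these two cases imply, here is a minimal sketch with plain Java types. Only the names `HEALTH_CHECK` and `HEALTH_CHECK_RESPONSE` come from the snippet above. The `Command`, `BrokerSide`, and `ClientSide` classes and the boolean `available` field are hypothetical stand-ins for the generated protobuf types and the real handlers, which are defined in the PR code.

```java
// Hypothetical, simplified model of the health check exchange; not the PR's code.
class HealthCheckDispatchSketch {
    enum Type { HEALTH_CHECK, HEALTH_CHECK_RESPONSE }

    // Minimal command: a type plus, for responses, the reported availability.
    static final class Command {
        final Type type;
        final boolean available; // only meaningful for HEALTH_CHECK_RESPONSE

        private Command(Type type, boolean available) {
            this.type = type;
            this.available = available;
        }

        static Command healthCheck() {
            return new Command(Type.HEALTH_CHECK, false);
        }

        static Command healthCheckResponse(boolean available) {
            return new Command(Type.HEALTH_CHECK_RESPONSE, available);
        }
    }

    // Broker side: answers a health check with the current cluster status.
    static final class BrokerSide {
        private volatile boolean available = true; // assumed default: available

        void setAvailable(boolean available) {
            this.available = available;
        }

        Command handle(Command cmd) {
            switch (cmd.type) {
                case HEALTH_CHECK:
                    return Command.healthCheckResponse(available);
                default:
                    throw new IllegalArgumentException("Unexpected command on broker: " + cmd.type);
            }
        }
    }

    // Client side: records the status carried by the response.
    static final class ClientSide {
        private volatile boolean clusterAvailable = true;

        void handle(Command cmd) {
            switch (cmd.type) {
                case HEALTH_CHECK_RESPONSE:
                    clusterAvailable = cmd.available;
                    break;
                default:
                    throw new IllegalArgumentException("Unexpected command on client: " + cmd.type);
            }
        }

        boolean isClusterAvailable() {
            return clusterAvailable;
        }
    }

    public static void main(String[] args) {
        BrokerSide broker = new BrokerSide();
        ClientSide client = new ClientSide();

        client.handle(broker.handle(Command.healthCheck()));
        System.out.println("available=" + client.isClusterAvailable());

        broker.setAvailable(false); // e.g. an operator marks the cluster unavailable
        client.handle(broker.handle(Command.healthCheck()));
        System.out.println("available=" + client.isClusterAvailable());
    }
}
```

A failover client could use the value recorded by `ClientSide` in place of the TCP-only result when deciding whether to switch clusters.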

Add a new admin command to manually update the cluster health status:

// Update the cluster health status: available or unavailable (default: available)
bin/pulsar-admin clusters update-health-status --status unavailable
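One way to think about what this command changes is a single status flag that defaults to available and accepts only the two values named above. The sketch below is an assumption about the shape of that state, not the PR's implementation: `ClusterHealthStatusSketch`, its `Status` enum, and the case-insensitive parsing are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder for the cluster health status; not the PR's code.
class ClusterHealthStatusSketch {
    enum Status {
        AVAILABLE, UNAVAILABLE;

        // Parses the value passed to --status; case-insensitive by assumption.
        static Status parse(String raw) {
            if (raw == null) {
                throw new IllegalArgumentException("--status is required");
            }
            try {
                return valueOf(raw.trim().toUpperCase(Locale.ROOT));
            } catch (IllegalArgumentException e) {
                throw new IllegalArgumentException(
                        "--status must be 'available' or 'unavailable', got: " + raw);
            }
        }
    }

    // Default is available, matching the comment on the admin command.
    private final AtomicReference<Status> status = new AtomicReference<>(Status.AVAILABLE);

    Status get() {
        return status.get();
    }

    // Applies a new status and returns the previous one.
    Status update(String raw) {
        return status.getAndSet(Status.parse(raw));
    }

    public static void main(String[] args) {
        ClusterHealthStatusSketch state = new ClusterHealthStatusSketch();
        System.out.println("initial=" + state.get());
        Status previous = state.update("unavailable");
        System.out.println("previous=" + previous + ", current=" + state.get());
    }
}
```

Rejecting unknown values up front keeps a typo in the command from silently leaving the cluster marked as available.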

For other detailed information, please refer to the PR code.

Verifying this change

Make sure that the change passes the CI checks.

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end deployment with large payloads (10MB)
Extended integration test for recovery after broker failure

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

Documentation

doc
doc-required
doc-not-needed
doc-complete

Matching PR in forked repository

PR in forked repository:
apache#22133

[improve][broker]PIP-340 Optimization of Probe Implementation for Aut…

d333e48

…omatic Failover

yyj8 mentioned this pull request Feb 27, 2024

[improve][broker]PIP-340 Optimization of Probe Implementation for Automatic Failover apache/pulsar#22133

Open

15 tasks

yyj8 wants to merge 1 commit into master from auto_cluster_failover_optimize
