Skip to content

[fix][test] fix Flaky-test: ExtensibleLoadManagerImplTest.testLoadBalancerServiceUnitTableViewSyncer#3

Open
berg223 wants to merge 7 commits intomasterfrom
fix_ExtensibleLoadManagerImplTest
Open

[fix][test] fix Flaky-test: ExtensibleLoadManagerImplTest.testLoadBalancerServiceUnitTableViewSyncer#3
berg223 wants to merge 7 commits intomasterfrom
fix_ExtensibleLoadManagerImplTest

Conversation

@berg223
Copy link
Owner

@berg223 berg223 commented Jun 8, 2025

Fixes apache#24357

Main Issue: apache#24357

Motivation

The unit test failed because loadManager4.getBrokerRegistry().unregister(); is not enough to ensure a successful unregister. The root cause is that broker will register it self after unregister. This happens in two place:

  1. There is a scheduler task named monitorTask in ExtensibleLoadManagerImpl, it will periodically register self.
  2. When brokerRegistry start, it will register a listener. Once broker unregistered, it will callback handleMetadataStoreNotification and try to register itself again.

Modifications

The modifications contains three part:

  1. changes not in testLoadBalancerServiceUnitTableViewSyncer is just for formatting. Please ignore it!
  2. changes to resolve this issue is:
               // simulate pulsar4 store error which will unregister the broker
                var brokerRegistry = loadManager4.getBrokerRegistry();
                var brokerLookupDataMetadataCache = (MetadataCache<BrokerLookupData>) spy(
                        FieldUtils.readField(brokerRegistry, "brokerLookupDataMetadataCache", true));
                doReturn(CompletableFuture.failedFuture(new MetadataStoreException("store error"))).when(
                        brokerLookupDataMetadataCache).put(anyString(), any(), any());
                FieldUtils.writeDeclaredField(brokerRegistry, "brokerLookupDataMetadataCache",
                        brokerLookupDataMetadataCache, true);
                brokerRegistry.unregister();

                NamespaceName slaMonitorNamespace =
                        getSLAMonitorNamespace(pulsar4.getBrokerId(), pulsar.getConfiguration());
                String slaMonitorTopic = slaMonitorNamespace.getPersistentTopicName("test");

                Awaitility.await()
                        .atMost(10, TimeUnit.SECONDS)
                        .pollInterval(1, TimeUnit.SECONDS)
                        .untilAsserted(() -> {
                            String currentResult = pulsar.getAdminClient().lookups().lookupTopic(slaMonitorTopic);
                            assertNotNull(currentResult);
                            log.info("{} Namespace is re-owned by {}", slaMonitorTopic, currentResult);
                            assertNotEquals(currentResult, pulsar4.getBrokerServiceUrl());
                        });
  1. I have found other factor to fail the test when I debug in testLoadBalancerServiceUnitTableViewSyncer. The syncer is not started and it maybe cause exception sometime. So I changed it to:
String syncerTyp = serviceUnitStateTableViewClassName.equals(ServiceUnitStateTableViewImpl.class.getName()) ?
                "SystemTopicToMetadataStoreSyncer" : "MetadataStoreToSystemTopicSyncer";
        pulsar.getAdminClient().brokers()
                .updateDynamicConfiguration("loadBalancerServiceUnitTableViewSyncer", syncerTyp);
        Awaitility.waitAtMost(10, TimeUnit.SECONDS)
                .pollInterval(1, TimeUnit.SECONDS)
                .untilAsserted(() -> {
                    assertTrue(pulsar1.getConfiguration().isLoadBalancerServiceUnitTableViewSyncerEnabled());
                    assertTrue(pulsar2.getConfiguration().isLoadBalancerServiceUnitTableViewSyncerEnabled());
                });

        // We invoke monitor method to ensure SystemTopicToMetadataStoreSyncer to start or close because syncer will not be started or close after pulsar.getAdminClient().brokers().updateDynamicConfiguration();
.
        primaryLoadManager.monitor();
        secondaryLoadManager.monitor();

       // We invoke monitor method in makePrimaryAsLeader and makeSecondaryAsLeader because monitor can immediately trigger serviceUnitStateTableViewSyncer to start or close without wait.
        makeSecondaryAsLeader();
        makePrimaryAsLeader();
        assertTrue(primaryLoadManager.getServiceUnitStateTableViewSyncer().isActive());
        assertFalse(secondaryLoadManager.getServiceUnitStateTableViewSyncer().isActive());

Verifying this change

  • Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@berg223 berg223 changed the title [fix][test] fix Flaky-test: ExtensibleLoadManagerImplTest.testLoadBalancerServiceUnitTableViewSyncer [fix][test] fix Flaky-test: ExtensibleLoadManagerImplTest.testLoadBalancerServiceUnitTableViewSyncer Jun 8, 2025
@berg223 berg223 closed this Jun 8, 2025
@berg223 berg223 reopened this Jun 8, 2025
# Conflicts:
#	pulsar-broker/src/test/java/org/apache/pulsar/broker/loadbalance/extensions/ExtensibleLoadManagerImplTest.java
@berg223 berg223 closed this Jul 17, 2025
@berg223 berg223 reopened this Jul 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky-test: ExtensibleLoadManagerImplTest.testLoadBalancerServiceUnitTableViewSyncer

1 participant