[improve][misc] Sync commits from apache/branch-4.16 into ds-4.16#31
Merged
(cherry picked from commit e02b9c2)
(cherry picked from commit 5ee2178)
…kie is not available. (apache#4439)

When the bookie is not available, `RackawareEnsemblePlacementPolicyImpl` resolves it to the default rack `/default-region/default-rack`. For `RackawareEnsemblePlacementPolicyImpl`, the default rack should be `/default-rack`. The log below shows the problem.

```
2024-06-17T05:22:46,591+0000 [ReplicationWorker] ERROR org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy - Cannot resolve bookieId `test-bk-3:3181` to a network address, resolving as /default-region/default-rack
org.apache.bookkeeper.proto.BookieAddressResolver$BookieIdNotResolvedException: Cannot resolve bookieId test-bk-3:3181, bookie does not exist or it is not running
	at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:66) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.resolveNetworkLocation(TopologyAwareEnsemblePlacementPolicy.java:821) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.createBookieNode(TopologyAwareEnsemblePlacementPolicy.java:811) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.convertBookieToNode(TopologyAwareEnsemblePlacementPolicy.java:845) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.TopologyAwareEnsemblePlacementPolicy.convertBookiesToNodes(TopologyAwareEnsemblePlacementPolicy.java:837) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl.replaceBookie(RackawareEnsemblePlacementPolicyImpl.java:474) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicy.replaceBookie(RackawareEnsemblePlacementPolicy.java:119) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.getReplacementBookiesByIndexes(BookKeeperAdmin.java:993) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.BookKeeperAdmin.replicateLedgerFragment(BookKeeperAdmin.java:1025) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:473) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.rereplicate(ReplicationWorker.java:301) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.replication.ReplicationWorker.run(ReplicationWorker.java:249) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[io.netty-netty-common-4.1.108.Final.jar:4.1.108.Final]
	at java.lang.Thread.run(Thread.java:840) ~[?:?]
Caused by: org.apache.bookkeeper.client.BKException$BKBookieHandleNotAvailableException: Bookie handle is not available
	at org.apache.bookkeeper.discover.ZKRegistrationClient.getBookieServiceInfo(ZKRegistrationClient.java:226) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	at org.apache.bookkeeper.client.DefaultBookieAddressResolver.resolve(DefaultBookieAddressResolver.java:45) ~[io.streamnative-bookkeeper-server-4.16.5.2.jar:4.16.5.2]
	... 13 more
```

(cherry picked from commit fb71383) (cherry picked from commit 8cc815e)
…ache#4476) Signed-off-by: ZhangJian He <shoothzj@gmail.com> (cherry picked from commit 5662524) (cherry picked from commit 66e84ad)
--- ### Motivation Change the new ensemble log to the info level. Sometimes the new ensemble does not satisfy the placement policy. That should be logged at info level rather than warn level, because auto recovery will fix the placement later; the message is only informational output from ensemble selection. (cherry picked from commit a12943d) (cherry picked from commit 0ce47aa)
* fix entry location compaction * replace entryLocationCompactionEnable with entryLocationCompactionInterval * Add randomCompactionDelay to avoid all the bookies triggering compaction simultaneously * Fix the style issue * Fix the style issue * Fix test --------- Co-authored-by: houbonan <houbonan@didiglobal.com> Co-authored-by: zymap <zhangyong1025.zy@gmail.com> (cherry picked from commit ede1ba9) (cherry picked from commit fbd33b5)
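The `randomCompactionDelay` bullet above describes spreading compaction start times so that bookies do not all compact at once. A minimal sketch of that idea follows. The class `JitteredSchedule` and the method `initialDelayMillis` are hypothetical names chosen for illustration, and the sketch makes no claim about how the actual patch computes or applies the delay.

```java
import java.util.Random;

/** Hypothetical helper for illustration; not part of BookKeeper. */
final class JitteredSchedule {
    private JitteredSchedule() {}

    /**
     * Picks a pseudo-random delay in [0, maxRandomDelayMillis) to wait before
     * the first compaction run, so that nodes started together drift apart.
     * Returns 0 when no jitter is configured.
     */
    static long initialDelayMillis(long maxRandomDelayMillis, Random random) {
        if (maxRandomDelayMillis <= 0) {
            return 0L;
        }
        // floorMod keeps the result non-negative even for negative nextLong() values.
        return Math.floorMod(random.nextLong(), maxRandomDelayMillis);
    }
}
```

Each node would add this delay once before entering its regular compaction interval; the interval itself stays unchanged.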
…ache#4244) ### Motivation If the system property `readonlymode.enabled` is set to true on a ZooKeeper server, read-only mode is enabled. Data can be read from the server in read-only mode even if that server is split from the quorum. https://zookeeper.apache.org/doc/current/zookeeperAdmin.html#Experimental+Options%2FFeatures To connect to the server in read-only mode, the client must also allow read-only mode. The `ZooKeeperClient` class in the bookkeeper repository also has an option called `allowReadOnlyMode`. https://github.com/apache/bookkeeper/blob/15171e1904f7196d8e9f4116ab2aecdf582e0032/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperClient.java#L219-L222 However, even if this option is set to true, the connection to the server in read-only mode will actually fail. The cause is in the `ZooKeeperWatcherBase` class. When the `ZooKeeperWatcherBase` class receives the `SyncConnected` event, it releases `clientConnectLatch` and assumes that the connection is complete. https://github.com/apache/bookkeeper/blob/15171e1904f7196d8e9f4116ab2aecdf582e0032/bookkeeper-server/src/main/java/org/apache/bookkeeper/zookeeper/ZooKeeperWatcherBase.java#L128-L144 However, if the server is in read-only mode, it will receive `ConnectedReadOnly` instead of `SyncConnected`. This causes the connection to time out without being completed. ### Changes Modified the switch statement in the `ZooKeeperWatcherBase` class to release `clientConnectLatch` when `ConnectedReadOnly` is received if the `allowReadOnlyMode` option is true. By the way, `allowReadOnlyMode` is never set to true in BookKeeper. So this change would be useless for BookKeeper. However, it is useful for Pulsar. Because Pulsar also uses `ZooKeeperWatcherBase` and needs to be able to connect to ZooKeeper in read-only mode. 
https://github.com/apache/pulsar/blob/cba1600d0f6a82f1ea194f3214a80f283fe8dc27/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/impl/PulsarZooKeeperClient.java#L242-L244 (cherry picked from commit 4d50a44) (cherry picked from commit 6fbc3c0)
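The change described above can be sketched in isolation as follows. This is a simplified stand-in, not the real `ZooKeeperWatcherBase`: the `State` enum is defined locally and only mirrors the event names mentioned in the description, and `ConnectLatchSketch`, `onStateChange`, and `isConnected` are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;

/** Simplified, hypothetical model of the connect-latch handling. */
final class ConnectLatchSketch {
    /** Local stand-in for the connection events named in the description. */
    enum State { SYNC_CONNECTED, CONNECTED_READ_ONLY, DISCONNECTED, EXPIRED }

    private final boolean allowReadOnlyMode;
    private final CountDownLatch clientConnectLatch = new CountDownLatch(1);

    ConnectLatchSketch(boolean allowReadOnlyMode) {
        this.allowReadOnlyMode = allowReadOnlyMode;
    }

    void onStateChange(State state) {
        switch (state) {
            case SYNC_CONNECTED:
                // A normal read-write connection always completes the connect.
                clientConnectLatch.countDown();
                break;
            case CONNECTED_READ_ONLY:
                // Only treat a read-only connection as complete when the
                // client opted in; otherwise keep waiting.
                if (allowReadOnlyMode) {
                    clientConnectLatch.countDown();
                }
                break;
            default:
                break;
        }
    }

    /** True once the latch has been released. */
    boolean isConnected() {
        return clientConnectLatch.getCount() == 0;
    }
}
```

With `allowReadOnlyMode` set to false, a `CONNECTED_READ_ONLY` event leaves the latch closed, which matches the timeout behavior described in the motivation.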
…pache#4473) Signed-off-by: ZhangJian He <shoothzj@gmail.com> (cherry picked from commit 7ab29e6) (cherry picked from commit 8996ccd)
…he#4482) ### Motivation In `RackawareEnsemblePlacementPolicyImpl.java`, the `log.warn` call linked below should print the ensemble as a list instead of printing the object. The “ensemble” in line-619 should be changed into “ensemble.toList()”. https://github.com/apache/bookkeeper/blob/999cd0f2ab14404be4d6c24e388456dbe56bb1a8/bookkeeper-server/src/main/java/org/apache/bookkeeper/client/RackawareEnsemblePlacementPolicyImpl.java#L600C3-L619C47 ### Changes Changed the “ensemble” in line-619 into “ensemble.toList()”. (cherry picked from commit 7c41204) (cherry picked from commit 9bc90b4)
…e#4502) Netty release 4.1.113.Final contains some bug fixes that are relevant for BookKeeper. It is also worth keeping Netty updated: besides picking up bug fixes, we catch regressions caused by Netty changes soon after each Netty release. * [Netty 4.1.113 release notes](https://netty.io/news/2024/09/04/4-1-113-Final.html) * [Netty 4.1.112 release notes](https://netty.io/news/2024/07/19/4-1-112-Final.html) - upgrade Netty to 4.1.113.Final - upgrade netty-tcnative to 2.0.66.Final (cherry picked from commit 57f9e02) (cherry picked from commit 818a37a)
…cute concurrency (apache#4462)

### Motivation

| step | `BK client 1` | `BK client 2` |
| --- | --- | --- |
| 1 | create ledger `1` | |
| 2 | | open ledger `1` |
| 3 | | delete ledger `1` |
| 4 | write data to ledger `1` | |

At step `4`, the write should fail, but it succeeds. Users then assume the data has been written, but it cannot be read. You can reproduce the issue with `testWriteAfterDeleted`.

The following scenario leads to message loss in Pulsar:

- `broker-2` created a ledger
- `broker-2`'s ZK session expired, which causes the topic it owned to be assigned to other brokers
- `broker-0` owned the topic again
- it will delete the last empty ledger
- consumers connected to `broker-0`
- producers connected to `broker-2`
- send messages to the topic
- on `broker-2`, the ledger cannot be closed because the ledger metadata has been deleted

### Changes

Once the ledger is fenced, it cannot be written to anymore.

(cherry picked from commit 47ef48e) (cherry picked from commit 93118aa)
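The four steps in the table can be replayed against a toy in-memory model to show the intended outcome. Everything here is hypothetical: `ToyLedgerStore` and its methods are illustration names, and the assumption that the fence mark outlives deletion of the ledger's data belongs to this sketch only, not to the actual patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model for illustration; not BookKeeper's storage layer. */
final class ToyLedgerStore {
    private final Map<Long, List<String>> entries = new HashMap<>();
    private final Set<Long> fenced = new HashSet<>();

    /** Marks a ledger as fenced, as happens when another client opens it. */
    void fence(long ledgerId) {
        fenced.add(ledgerId);
    }

    /** Drops the ledger's data. Assumption of this sketch: the fence mark stays. */
    void delete(long ledgerId) {
        entries.remove(ledgerId);
    }

    /** Returns false, and stores nothing, when the ledger is fenced. */
    boolean addEntry(long ledgerId, String data) {
        if (fenced.contains(ledgerId)) {
            return false;
        }
        entries.computeIfAbsent(ledgerId, id -> new ArrayList<>()).add(data);
        return true;
    }

    int entryCount(long ledgerId) {
        List<String> list = entries.get(ledgerId);
        return list == null ? 0 : list.size();
    }
}
```

Replaying the table: client 2 calls `fence(1)` and then `delete(1)`; the later `addEntry(1, ...)` from client 1 is rejected instead of appearing to succeed.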
CVE-2024-7254 Upgrade protobuf to 3.25.5 (cherry picked from commit 0229b5d) (cherry picked from commit b9c567c)
…mble (apache#4478) (cherry picked from commit 0376bdc) (cherry picked from commit 33a1985)
* Upgrade Zookeeper to 3.9.3 to address CVE-2024-51504 * Upgrade curator to 5.7.1 (cherry picked from commit af8baa1) (cherry picked from commit 7cba2a6)
…pache#4584) (cherry picked from commit 02d7f9f) (cherry picked from commit 2182932)
While working on apache#4580, I noticed that the Commons Compress version and the Commons Codec library version aren't compatible. This causes a `NoClassDefFoundError` when running tests that use some specific methods of docker-java. While exploring this, I noticed that the Apache Commons library versions haven't been kept up-to-date for a long time, and it's better to handle that for the 4.18.0 release.

- upgrade commons-cli from 1.2 to 1.9.0
- upgrade commons-codec from 1.6 to 1.18.0
- upgrade commons-io from 2.17.0 to 2.19.0
- upgrade commons-lang3 from 3.6 to 3.17.0
- upgrade commons-compress from 1.26.0 to 1.27.0

(cherry picked from commit bcd6b52) (cherry picked from commit a01aa11)
…t after rocksdb has been closed (apache#4581) * Fix the coredump that occurs when calling KeyValueStorageRocksDB.count() (possibly triggered by Prometheus) after RocksDB has been closed (apache#4243) * Fix the race when a count operation is in progress and the db gets closed. --------- Co-authored-by: zhaizhibo <zhaizhibo@kuaishou.com> (cherry picked from commit 2831ed3) (cherry picked from commit 3795f55)
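One common way to keep a read such as `count()` from touching a store that is being closed is a read-write lock plus a closed flag. The sketch below shows that general pattern only. `GuardedStore` is a hypothetical class, the entry count stands in for the native call, and the actual patch may use a different mechanism.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical illustration of guarding reads against a concurrent close. */
final class GuardedStore {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long entryCount;
    private boolean closed; // guarded by lock

    GuardedStore(long entryCount) {
        this.entryCount = entryCount;
    }

    /** Readers share the read lock, so concurrent counts do not block each other. */
    long count() {
        lock.readLock().lock();
        try {
            if (closed) {
                // Fail in Java instead of calling into a released native handle.
                throw new IllegalStateException("store is closed");
            }
            return entryCount; // stand-in for the native count call
        } finally {
            lock.readLock().unlock();
        }
    }

    /** The write lock waits for in-flight counts to finish before closing. */
    void close() {
        lock.writeLock().lock();
        try {
            closed = true; // the native handle would be released here
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The design choice is that `close()` blocks until readers drain, and any later `count()` fails fast with an exception rather than crashing the process.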
(cherry picked from commit c9dc52a)
* Fix check read failed entry memory leak issue. * address the comments. (cherry picked from commit f0c406b)
…ache#4607) * Fix the data loss issue that is caused by the wrong entry log header --- # Motivation We observed numerous errors in the broker when it failed to read ledgers from the bookkeeper: although the ledger metadata still existed, the ledger could not be read. After checking the data, we found that the entry log containing the ledger had been deleted by the bookkeeper, so this is a data loss issue. The garbage collector deleted the entry log file because a wrong file header had been written to it. Here is an example that shows the header is wrong: ``` Failed to get ledgers map index from: 82.log : Not all ledgers were found in ledgers map index. expected: -1932430239 -- found: 0 -- entryLogId: 82 ``` * Add test (cherry picked from commit 52d779a)
…pache#4504) (cherry picked from commit 07de650) (cherry picked from commit 857959e)
…ache#4557) [fix] Write stuck due to pending add callback by multiple threads (apache#4557) (cherry picked from commit e47926b) (cherry picked from commit 804b2b6)
(cherry picked from commit 3bceb37)
…pache#4586) (cherry picked from commit 7fbb8d8) (cherry picked from commit 065b9fe)
### Motivation & Changes Upgrade Jetty to 9.4.57.v20241219 to address CVE-2024-6763 Jetty 9.4.57.v20241219 contains backported CVE-2024-6763 fix in jetty/jetty.project#12532 although it's not explicitly mentioned and most security scanners don't yet contain the information that it's been addressed in 9.4.57. More details: * jetty/jetty.project#12630 * https://github.com/jetty/jetty.project/releases/tag/jetty-9.4.57.v20241219 Note: The backport is a partial mitigation and Jetty 9.4.57 will continue to be marked as vulnerable. There's a discussion and explanation here: https://gitlab.eclipse.org/security/cve-assignement/-/issues/25#note_2968611 (cherry picked from commit 99eb63a) (cherry picked from commit cda3c6b)
(cherry picked from commit 8e0d75a)
…4-6763 (apache#4600) (apache#4631) (cherry picked from commit add6808)
### Motivation

We found that the GarbageCollectorThread was stopped by a runtime error that was not caught, which caused the GC to stop. For example: apache#3901 apache#4544

In our case, the GC stopped because of an `OutOfDirectMemoryError`; the process then stopped and the files could not be deleted, but we did not see any error logs. This PR enhances the log info when an unhandled error happens. There is already a [PR](apache#4544) that fixed that. Another fix in this PR changes `Exception` to `Throwable` in getEntryLogMetadata.

Here is the error stack:

```
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 213909504 byte(s) of direct memory (used: 645922847, max: 858783744)
	at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:880)
	at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:809)
	at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:718)
	at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:707)
	at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:224)
	at io.netty.buffer.PoolArena.allocate(PoolArena.java:142)
	at io.netty.buffer.PoolArena.reallocate(PoolArena.java:317)
	at io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:123)
	at io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:305)
	at io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:280)
	at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1103)
	at org.apache.bookkeeper.bookie.BufferedReadChannel.read(BufferedReadChannel.java:104)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109)
	at org.apache.bookkeeper.bookie.DefaultEntryLogger.getEntryLogMetadata(DefaultEntryLogger.java:1060)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.extractMetaFromEntryLogs(GarbageCollectorThread.java:678)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.runWithFlags(GarbageCollectorThread.java:365)
	at org.apache.bookkeeper.bookie.GarbageCollectorThread.lambda$triggerGC$4(GarbageCollectorThread.java:268)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.base/java.lang.Thread.run(Unknown Source)
```

You can see that much more memory is used at extractEntryLogMetadataFromIndex(DefaultEntryLogger.java:1109). The reason is that the entry log header contains wrong data, which should already be fixed by apache#4607. The read then uses a wrong map size, which can take a lot of memory.

(cherry picked from commit e80d031) (cherry picked from commit 35b8f13)
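The switch from `Exception` to `Throwable` matters because the failure in the stack trace is an `Error`, which a `catch (Exception e)` block does not catch. A minimal sketch of a wrapper that records and logs any `Throwable` from a background task follows. `SafeTask` and `logging` are hypothetical names, and the sketch is not the code from the PR.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical wrapper for illustration; not the actual GC thread code. */
final class SafeTask {
    private SafeTask() {}

    /**
     * Wraps a task so that any Throwable, including Errors such as
     * OutOfMemoryError, is logged and remembered instead of vanishing
     * inside the executor that runs the task.
     */
    static Runnable logging(Runnable task, AtomicReference<Throwable> lastError) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                // catch (Exception e) would miss Error subclasses entirely.
                lastError.set(t);
                System.err.println("Background task failed: " + t);
            }
        };
    }
}
```

A caller would pass the wrapped `Runnable` to its executor and could inspect `lastError` from a health check. Whether to keep running after an `Error` is a separate decision that this sketch does not address.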
(cherry picked from commit ecb7603)
…ndows (apache#4665) Co-authored-by: fengyubiao <fengyubiao@100tal.com> (cherry picked from commit 5154149)
(cherry picked from commit b88fb5f)
(cherry picked from commit 95277b6)
This reverts commit 4a471bd.
(cherry picked from commit dca89a7)
(cherry picked from commit 1d3c5f9)
dlg99
approved these changes
Feb 27, 2026