Large number of gaps in metric data #184

@wdtte

The rocketmq_group_diff metric collected by rocketmq-exporter has a large number of gaps; in some cases there is only one data point over several days. Grafana screenshot:
Image

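Errors in the rocketmq-exporter startup log that appear to be related:
(The error messages roughly point to failed communication with the broker. If there really were a network problem, the business side would have been affected long ago, but so far the only symptom we have found is incomplete monitoring data, so we don't know how to follow up.)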

[2025-11-03 10:39:55.140] ERROR get topic's(paas_oplog_****) consumer-stats(oplog-****-***) exception
org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.89:10911> failed
	at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
	at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
	at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumeStats(MQClientAPIImpl.java:1220)
	at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.examineConsumeStats(DefaultMQAdminExtImpl.java:315)
	at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.examineConsumeStats(DefaultMQAdminExt.java:258)
	at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.examineConsumeStats(MQAdminExtImpl.java:232)
	at org.apache.rocketmq.exporter.task.MetricsCollectTask.collectConsumerOffset(MetricsCollectTask.java:336)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84)
	at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
	at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
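
To rule out basic reachability from the exporter host, a plain TCP probe of the broker addresses named in the traces might be a first step. The following is only a minimal sketch: the helper names `can_connect` and `probe` are hypothetical, and a successful connect would only show that the port is reachable, not that RocketMQ requests succeed.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def probe(addresses, timeout=3.0):
    """Probe a list of "host:port" strings and return {address: reachable}."""
    results = {}
    for address in addresses:
        host, _, port = address.rpartition(":")
        results[address] = can_connect(host, int(port), timeout=timeout)
    return results
```

Run on the exporter host, `probe(["172.17.41.89:10911", "172.17.41.99:10911"])` would report whether the two broker addresses from the traces in this issue accept TCP connections at that moment.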

Besides this, there are many other errors:

[2025-11-03 11:09:20.003]  WARN ClientMetricTask-exception.ignore. group=paas-****-*****-consumer,client id=10.128.217.24@172.17.41.79:9876;172.17.41.80:9876, client addr=172.17.45.10:55377, language=JAVA,version=477
org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed
	at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
	at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
	at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumerRunningInfo(MQClientAPIImpl.java:1917)
	at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.getConsumerRunningInfo(DefaultMQAdminExtImpl.java:842)
	at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.getConsumerRunningInfo(DefaultMQAdminExt.java:469)
	at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.getConsumerRunningInfo(MQAdminExtImpl.java:407)
	at org.apache.rocketmq.exporter.task.ClientMetricTaskRunnable.run(ClientMetricTaskRunnable.java:64)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
[2025-11-03 11:09:20.006]  INFO closeChannel: close the connection to remote address[172.17.41.99:10911] result: true
[2025-11-03 11:09:20.007]  WARN ClientMetricTask-exception.ignore. group=oplog-***-***,client id=10.120.241.231@172.17.41.79:9876;172.17.41.80:9876, client addr=172.17.5.14:58456, language=JAVA,version=477
org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed
	at org.apache.rocketmq.remoting.netty.NettyRemotingAbstract.invokeSyncImpl(NettyRemotingAbstract.java:441)
	at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:390)
	at org.apache.rocketmq.client.impl.MQClientAPIImpl.getConsumerRunningInfo(MQClientAPIImpl.java:1917)
	at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.getConsumerRunningInfo(DefaultMQAdminExtImpl.java:842)
	at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.getConsumerRunningInfo(DefaultMQAdminExt.java:469)
	at org.apache.rocketmq.exporter.service.client.MQAdminExtImpl.getConsumerRunningInfo(MQAdminExtImpl.java:407)
	at org.apache.rocketmq.exporter.task.ClientMetricTaskRunnable.run(ClientMetricTaskRunnable.java:64)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
[2025-11-03 11:09:25.003]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:24:23.454]  INFO Completed initialization in 1 ms
[2025-11-03 10:25:15.000]  INFO broker stats collection task starting....
[2025-11-03 10:25:15.000]  INFO broker runtime stats collection task starting....
[2025-11-03 10:25:15.000]  INFO consumer offset collection task starting....
[2025-11-03 10:25:15.001]  INFO broker topic stats collection task starting....
[2025-11-03 10:25:15.001]  INFO producer metric collection task starting....
[2025-11-03 10:25:15.639]  INFO broker runtime stats collection task finished....639
[2025-11-03 10:25:15.639]  INFO topic offset collection task starting....
[2025-11-03 10:25:15.644]  INFO broker stats collection task finished....644
[2025-11-03 10:25:16.554]  WARN collectTopicOffset-getting topic(%RETRY%oplog-object-change) stats error. the namesrv address is ["172.17.41.80:9876"]
[2025-11-03 10:25:20.079]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-j, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:20.079]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-j, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:20.081]  INFO closeChannel: close the connection to remote address[172.17.41.99:10911] result: true
[2025-11-03 10:25:25.085]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:25.085]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-k, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:25.085]  INFO closeChannel: close the connection to remote address[172.17.41.101:10911] result: true
[2025-11-03 10:25:30.086]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-h, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:30.086]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-h, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:30.089]  INFO closeChannel: close the connection to remote address[172.17.41.95:10911] result: true
[2025-11-03 10:25:30.089]  INFO closeChannel: close the connection to remote address[172.17.41.95:10911] result: true
[2025-11-03 10:25:35.088]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-i, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:35.088]  INFO closeChannel: close the connection to remote address[172.17.41.97:10911] result: true
[2025-11-03 10:25:35.088]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-i, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:40.089]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-f, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:40.089]  INFO closeChannel: close the connection to remote address[172.17.41.91:10911] result: true
[2025-11-03 10:25:40.089]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-f, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:45.089]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-g, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:45.090]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-g, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:45.096]  INFO closeChannel: close the connection to remote address[172.17.41.93:10911] result: true
[2025-11-03 10:25:45.495]  INFO topic offset collection task finished....29856
[2025-11-03 10:25:50.090]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-d, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:50.090]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-d, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:50.091]  INFO closeChannel: close the connection to remote address[172.17.41.87:10911] result: true
[2025-11-03 10:25:55.092]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-e, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:55.092]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-e, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:25:55.094]  INFO closeChannel: close the connection to remote address[172.17.41.89:10911] result: true
[2025-11-03 10:26:00.092]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-b, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:00.093]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-b, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:00.093]  INFO closeChannel: close the connection to remote address[172.17.41.83:10911] result: true
[2025-11-03 10:26:05.096]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-c, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:05.096]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-c, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:05.285]  INFO closeChannel: close the connection to remote address[172.17.41.85:10911] result: true
[2025-11-03 10:26:10.096]  WARN collectProducer. should not be here. cluster=**-****, brokerName=broker-a, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:10.097]  WARN collectProducer. there are no producers in cluster=**-****, brokerName=broker-a, name srv= ["172.17.41.80:9876"]
[2025-11-03 10:26:10.097]  INFO closeChannel: close the connection to remote address[172.17.41.81:10911] result: true
[2025-11-03 10:26:15.000]  INFO topic offset collection task starting....
[2025-11-03 10:26:15.000]  INFO broker runtime stats collection task starting....
[2025-11-03 10:26:15.045]  INFO broker runtime stats collection task finished....44

How can we resolve this problem? Or in which directions should we investigate?
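
As one possible starting point, the exporter log could be summarized to see whether the send failures cluster on particular brokers and how long each collection task runs. The sketch below assumes the line formats shown in the excerpts above. The function name `summarize` is hypothetical, and the trailing number on the "task finished" lines is assumed to be elapsed milliseconds, which matches the timestamps in the excerpt (for example, the topic offset task starts at 10:25:15.639 and finishes at 10:25:45.495 with the value 29856).

```python
import re
from collections import Counter

# Exception line format as it appears in the log excerpts in this issue.
SEND_FAILED = re.compile(
    r"RemotingSendRequestException: send request to <([^>]+)> failed"
)
# Lines such as "... INFO topic offset collection task finished....29856".
TASK_FINISHED = re.compile(r"INFO (.+?) task finished\.+(\d+)$")


def summarize(lines):
    """Count send failures per broker address and collect task durations.

    Returns (failures, durations): a Counter keyed by "ip:port" and a dict
    mapping task name to a list of reported durations (assumed milliseconds).
    """
    failures = Counter()
    durations = {}
    for line in lines:
        line = line.rstrip()
        match = SEND_FAILED.search(line)
        if match:
            failures[match.group(1)] += 1
            continue
        match = TASK_FINISHED.search(line)
        if match:
            durations.setdefault(match.group(1), []).append(int(match.group(2)))
    return failures, durations


# A few lines copied from the excerpts above, used as sample input.
sample = [
    "org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.99:10911> failed",
    "org.apache.rocketmq.remoting.exception.RemotingSendRequestException: send request to <172.17.41.89:10911> failed",
    "[2025-11-03 10:25:15.639]  INFO broker runtime stats collection task finished....639",
    "[2025-11-03 10:25:45.495]  INFO topic offset collection task finished....29856",
]
failures, durations = summarize(sample)
for address, count in failures.most_common():
    print(address, count)
for task, values in durations.items():
    print(task, max(values))
```

Feeding the full log file to `summarize` (for example, by passing an open file object) would show whether the failures are spread across all brokers or concentrated on a few, and whether any task's duration approaches its scheduling interval.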
