[improve][broker] PIP-333 Add monitor metrics for the number of connections to client IPs and roles #21935
yyj8 wants to merge 3 commits into apache:master
@yyj8 Please add the following content to your PR description and select a checkbox: |
/pulsarbot run-failure-checks |
|
This is a good way to kill Prometheus or any other TSDB. |
|
Agree with @asafm |
|
Dear @asafm @dao-jun, thank you for your help. I mainly considered two scenarios:
So, can we add a parameter |
|
Can a role be enough? It should be low cardinality, I believe. |
Do you think the following implementation would be better? The above metrics may not be able to quickly locate the specific server causing the high connection count when the same role has connections from multiple servers. We have encountered this issue in our current production environment. |
|
Just thinking out loud: Given you know the role and your app using the client can expose the role so you can pinpoint the hosts running the clients this way, will the number of connections from the client side suffice? |
Our current pain point is that multiple business teams access the same Pulsar cluster. Cluster traffic is not high (within 200 MB/s), but there are a lot of topics (split across multiple tenants by customer, nearly 200,000 topic partitions). The number of connections in the cluster has reached nearly 500,000, so the connection limit configured on the proxy or broker is frequently reached and all client requests are rejected. Because there is nowhere to obtain a summary of connections by role or by IP, we cannot find which business is accessing the cluster abnormally and push it to optimize. Main scenarios:
Only with both of these pieces of information can we accurately locate which team had an abnormal access. |
|
@yyj8 Just validating your requirements - can you elaborate on why monitoring the connections count from the client side and not from the broker side would be impossible? |
|
It is really hard to make the decision. The metric might be helpful for troubleshooting, but the Pulsar broker already has too many metrics and too many configuration items, and if we add Since we can't convince each other, I suggest sending a discussion to the mailing list and letting the community decide. |
Mailing list: https://lists.apache.org/thread/6oyhv6fmzr72oc1hcxw9llwvzlw09cqp |
For detailed improvement instructions, please refer to issues:
#21934
Motivation
Currently, Pulsar does not monitor the number of connections per client IP or per role. When many business teams access the cluster, with many topics, producers, and consumers, it is difficult to quickly identify, from a cluster-wide view, which IPs or roles are accessing the cluster abnormally so that they can be optimized. We therefore want a monitoring metric that aggregates connection counts by IP and by role, making it easier to locate such problems quickly.
Modifications
Add new metrics to ServerCnx.java class:
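The metric list itself is not reproduced on this page. Purely as an illustration of the idea discussed in the conversation, and not the code in this PR, the sketch below shows one way a broker-side component could keep a live connection count per key, where the key is a role or a client IP. The class `ConnectionCounterSketch` and its methods are hypothetical names; it uses only the JDK and makes no assumption about how Pulsar exports metrics. Entries are removed when their count drops to zero so that the number of tracked keys, and therefore the number of exported series, follows the set of currently connected clients.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper, not part of Pulsar: tracks open connections per key
// (a key would be a role name or a client IP address).
final class ConnectionCounterSketch {
    private final Map<String, Long> counts = new ConcurrentHashMap<>();

    // Call when a connection for this key is established.
    void onConnect(String key) {
        // merge() is atomic per key on ConcurrentHashMap.
        counts.merge(key, 1L, Long::sum);
    }

    // Call when a connection for this key is closed.
    void onDisconnect(String key) {
        // Returning null removes the entry, so idle keys do not accumulate.
        counts.computeIfPresent(key, (k, v) -> v <= 1 ? null : v - 1);
    }

    // Current number of open connections for the key (0 if unknown).
    long get(String key) {
        return counts.getOrDefault(key, 0L);
    }

    // Number of distinct keys currently tracked (a rough cardinality gauge).
    int distinctKeys() {
        return counts.size();
    }

    public static void main(String[] args) {
        ConnectionCounterSketch byRole = new ConnectionCounterSketch();
        byRole.onConnect("team-a");
        byRole.onConnect("team-a");
        byRole.onConnect("team-b");
        byRole.onDisconnect("team-b");
        System.out.println("team-a=" + byRole.get("team-a")
                + " team-b=" + byRole.get("team-b")
                + " keys=" + byRole.distinctKeys());
    }
}
```

Two instances, one keyed by role and one keyed by IP, would cover both dimensions raised in the discussion. Keying by IP is the high-cardinality case the reviewers warn about, so a real implementation would likely need a switch or a cap on tracked keys.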
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: yyj8#7