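Current behavior
Based on the configured cmd and cmdline, the agent periodically scans /proc for matching processes. If any matching process exists under /proc at scan time, proc_num is greater than 0; otherwise it is 0.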
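As a rough illustration of this kind of scan, here is a minimal Python sketch. It is not the agent's actual code: the function name `count_matching_procs`, the use of `/proc/<pid>/comm` for the process name, and substring matching on the command line are all assumptions made for the example.

```python
import os


def count_matching_procs(name, cmdline_part="", proc_root="/proc"):
    """Count processes under proc_root whose name equals `name` and whose
    command line contains `cmdline_part` (hypothetical matching rules)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():  # only numeric directories are PIDs
            continue
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                comm = f.read().strip()
            with open(os.path.join(base, "cmdline"), "rb") as f:
                # Arguments in /proc/<pid>/cmdline are NUL-separated.
                raw = f.read().replace(b"\0", b" ")
                cmdline = raw.decode(errors="replace").strip()
        except OSError:
            continue  # the process exited between listdir() and open()
        if comm == name and cmdline_part in cmdline:
            count += 1
    return count
```

Each call returns one instantaneous count, which is all that proc_num carries; nothing about what happened between two calls is retained.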
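Problem
Consider the following case: a program has a slow memory leak, runs for a long time, and is eventually killed by the OOM killer. A process manager such as supervisor then restarts it immediately. If the whole sequence, from the crash to the restart, happens between two of the agent's /proc scans, then from the agent's point of view the program looks healthy, even though an abnormal restart actually occurred. This greatly lengthens the time it takes to discover this kind of hidden problem. Process port monitoring has the same issue.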
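The blind spot can be reproduced with a small simulation. The sketch below uses a made-up timeline: the PIDs, timestamps, and the 60-second scan interval are all hypothetical, and `sample_proc_num` is an illustrative helper rather than anything in the agent.

```python
def sample_proc_num(lifetimes, scan_times):
    """For each scan time, report which PIDs are alive and how many.

    lifetimes: list of (pid, start, end) tuples, with end exclusive.
    Returns a list of (scan_time, proc_num, alive_pids).
    """
    samples = []
    for t in scan_times:
        alive = [pid for pid, start, end in lifetimes if start <= t < end]
        samples.append((t, len(alive), alive))
    return samples


# Hypothetical timeline in seconds: PID 1200 is OOM-killed at t=75 and the
# supervisor starts its replacement (PID 1431) one second later.
lifetimes = [(1200, 0, 75), (1431, 76, 10_000)]
scan_times = [0, 60, 120, 180]  # assumed 60-second scan interval

samples = sample_proc_num(lifetimes, scan_times)
for t, proc_num, pids in samples:
    print(t, proc_num, pids)
```

In this timeline proc_num is 1 at every scan, so the metric never drops to 0, yet the PID seen at t=60 differs from the one seen at t=120. The restart is visible only in information that a plain count discards.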