Description
I would like to report an issue with the stackdriver-metadata-agent on our production GKE cluster (1.18.17-gke.700), which has Cloud Logging and Cloud Monitoring enabled. The node machine type is n1-standard-1 (1 vCPU, 3.75 GB memory).
A few days ago (2021-06-08 9:31:xx GMT+08:00), the CPU usage of the stackdriver-metadata-agent-cluster-level pod suddenly rose sharply. As a result, my production services on the same node suffered severe timeouts. See the attached CPU chart for reference.
The containers in the pod are:
metadata-agent: gcr.io/stackdriver-agents/metadata-agent-go:1.2.0
metadata-agent-nanny: gke.gcr.io/addon-resizer:1.8.11-gke.1
During that time, the containers did not report any suspicious logs.
Since I could not find a dedicated repository for the metadata agent, I would like to know whether a similar CPU load issue has been reported before and whether any resolution exists. Without a concrete root cause, I'm concerned that it might happen again. If this report belongs in a different repository, please let me know.
Thanks for your consideration!

