Sudden high cpu usage from stackdriver-metadata-agent-cluster-level pod

I would like to report an issue with the stackdriver-metadata-agent on our production GKE cluster (1.18.17-gke.700) with Cloud Logging and Cloud Monitoring enabled. The nodes use the n1-standard-1 machine type (1 vCPU, 3.75 GB memory).

A few days ago (2021-06-08 9:31:xx GMT+08:00), the CPU usage of the stackdriver-metadata-agent-cluster-level pod suddenly rose sharply. As a result, my production services on the same node suffered severe timeouts. See the attached CPU chart for reference.

![Screenshot 2021-06-11 4.31.18 PM](https://user-images.githubusercontent.com/1139696/121664941-98ca2d00-cada-11eb-8101-857516ebdbeb.png)

The containers in the pod are:
- metadata-agent: gcr.io/stackdriver-agents/metadata-agent-go:1.2.0
- metadata-agent-nanny: gke.gcr.io/addon-resizer:1.8.11-gke.1

During that time, the containers reported no suspicious logs.

metadata-agent logs:
![Screenshot 2021-06-11 5.50.48 PM](https://user-images.githubusercontent.com/1139696/121668226-acc35e00-cadd-11eb-9ad4-d606b848509d.png)

Since I could not find a dedicated repository for the metadata agent, I would like to know whether a similar CPU load issue has been reported before and whether there are any known resolutions. Without a concrete root cause, I'm concerned that it might happen again. If this report belongs in a different repository, please let me know.

Thanks for your consideration!


Sudden high cpu usage from stackdriver-metadata-agent-cluster-level pod #95
