[enhancement] Optimization of Metric Calculation in trace-etl-server #586

We have recently optimized metric calculation in trace-etl-server further. The motivation is our observation that today's microservice architectures often include applications with extremely high call volumes and many upstream and downstream dependencies. We refer to these as **"head applications"**.

Because of their high traffic, instances of these head applications typically rank among the top consumers in the entire cluster. In addition, because of their extensive dependencies, the dispersion (cardinality) of their metric labels (e.g., HTTP or Dubbo methodName) can be orders of magnitude higher than that of other applications.

Together, these two factors cause trace-etl-server to generate an enormous volume of metrics when processing head applications. This excessive metric load can cause:

- **Memory exhaustion** in trace-etl-server itself.
- **Significant delays** when Prometheus scrapes these metrics.
- **High pressure** on Prometheus during metric aggregation.
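
To make the scale of the problem concrete, here is a rough sketch of how label dispersion multiplies the number of time series for a single metric. The label names and all numbers are hypothetical and chosen only for illustration. The product is an upper bound, since not every label combination occurs in practice.

```python
from math import prod


def estimate_series(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on time series for one metric.

    Each distinct combination of label values is a separate series,
    so the bound is the product of distinct values per label.
    """
    return prod(label_cardinalities.values())


# Hypothetical numbers, not measurements from a real cluster.
ordinary_app = {"instance": 4, "methodName": 20, "upstream": 3}
head_app = {"instance": 50, "methodName": 2000, "upstream": 40}

print(estimate_series(ordinary_app))  # 240
print(estimate_series(head_app))      # 4000000
```

Under these assumed numbers, one head application produces more than four orders of magnitude more series than an ordinary one, all of which must be held in memory and then scraped.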

To address these issues, we have implemented the following improvements:

1. Configurable metric merging per application.
2. A recommendation to isolate head applications in production environments by consuming their traces in dedicated trace-etl-server instances.
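
The issue does not show how merging is configured, so the following is only a conceptual sketch of what per-application metric merging could look like, not the actual trace-etl-server implementation. The names `MERGE_CONFIG`, `MERGED_VALUE`, `series_key`, and `aggregate`, as well as the application names, are all hypothetical. The idea is that for configured applications, a high-dispersion label such as methodName is collapsed to one fixed value before counting, so many series fold into one.

```python
from collections import Counter

# Hypothetical per-application setting: labels to collapse for each app.
MERGE_CONFIG = {"order-gateway": {"methodName"}}
# Hypothetical placeholder that replaces the collapsed label values.
MERGED_VALUE = "merged"


def series_key(app: str, labels: dict[str, str]) -> tuple:
    """Build the series identity, collapsing labels configured for this app."""
    to_merge = MERGE_CONFIG.get(app, set())
    normalized = {
        name: (MERGED_VALUE if name in to_merge else value)
        for name, value in labels.items()
    }
    return (app, tuple(sorted(normalized.items())))


def aggregate(calls: list[tuple[str, dict[str, str]]]) -> Counter:
    """Count calls per series after applying the merge configuration."""
    counts = Counter()
    for app, labels in calls:
        counts[series_key(app, labels)] += 1
    return counts


calls = [
    ("order-gateway", {"methodName": "/a"}),
    ("order-gateway", {"methodName": "/b"}),
    ("order-gateway", {"methodName": "/c"}),
    ("user-service", {"methodName": "/x"}),
    ("user-service", {"methodName": "/y"}),
]
counts = aggregate(calls)
print(len(counts))  # 3: one merged series for order-gateway, two for user-service
```

The trade-off in a scheme like this is that per-method detail is lost for the merged application, which is why making it configurable per application keeps full detail everywhere else.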

These changes improve the stability and performance of both trace-etl-server and Prometheus when handling high-traffic services.
