Description
Recently, we further optimized metric calculation in trace-etl-server. The motivation is our observation that in today's microservices architectures there are often applications with extremely high call volumes and many upstream and downstream dependencies. We refer to these as "head applications".
Because of their high traffic, instances of these head applications typically rank among the top consumers in the entire cluster. In addition, because of their extensive dependencies, the dispersion (cardinality) of their metric label values (e.g., HTTP or Dubbo methodName) can be orders of magnitude higher than for other applications.
These two factors lead to trace-etl-server generating an enormous volume of metrics when processing head applications. This excessive metric load can cause:
- Memory exhaustion in trace-etl-server itself.
- Significant delays when Prometheus scrapes these metrics.
- High pressure on Prometheus during metric aggregation.
To address these issues, we have implemented the following improvements:
1. Configurable metric merging per application.
2. A recommendation to isolate head applications by consuming their traces in dedicated trace-etl-server instances in production environments.
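
To illustrate the idea behind per-application metric merging, here is a minimal sketch in Python. It is not the trace-etl-server implementation: the names `MERGE_CONFIG`, `MERGED_VALUE`, `merge_labels`, and `aggregate`, as well as the application names and label values, are all hypothetical. It only shows how collapsing a high-cardinality label such as methodName for selected applications reduces the number of distinct metric series.

```python
from collections import Counter

# Hypothetical per-application config: which label names to collapse.
# Applications not listed here keep all of their labels unchanged.
MERGE_CONFIG = {
    "head-app": {"methodName"},
}

# Placeholder value written into a collapsed label (an assumption).
MERGED_VALUE = "merged"


def merge_labels(app, labels, config=MERGE_CONFIG):
    """Return a copy of labels with the configured labels collapsed for app."""
    to_merge = config.get(app, set())
    return {k: (MERGED_VALUE if k in to_merge else v) for k, v in labels.items()}


def aggregate(spans, config=MERGE_CONFIG):
    """Count calls per distinct (app, label set) after merging."""
    counts = Counter()
    for app, labels in spans:
        merged = merge_labels(app, labels, config)
        # Sort the label pairs so the same label set always maps to one key.
        counts[(app, tuple(sorted(merged.items())))] += 1
    return counts


spans = [
    ("head-app", {"type": "http", "methodName": "/a"}),
    ("head-app", {"type": "http", "methodName": "/b"}),
    ("head-app", {"type": "dubbo", "methodName": "getUser"}),
    ("small-app", {"type": "http", "methodName": "/a"}),
    ("small-app", {"type": "http", "methodName": "/b"}),
]

series = aggregate(spans)
print(len(series))  # number of distinct series after merging
```

In this toy input, the two HTTP spans of `head-app` collapse into a single series, while `small-app` keeps its per-method series. The trade-off is that per-method detail is lost for the merged application, which is why making the merge configurable per application is useful.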
This optimization ensures better stability and performance for both trace-etl-server and Prometheus when handling high-traffic services.