-
Notifications
You must be signed in to change notification settings - Fork 53
fix: split heavy union pql #372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: split heavy union pql #372
Conversation
Reviewer's GuideIntroduces a new QueryRangeWithP9xBuilder path that splits large union PromQL percentile queries into smaller grouped queries and merges the matrix results, and refactors existing P9x SQL helpers and tests to work with a reusable UnionP9xBuilder instead of raw strings. Sequence diagram for QueryRangeWithP9xBuilder percentile query flowsequenceDiagram
actor Service
participant Repo as promRepo
participant Builder as UnionP9xBuilder
participant Prom as PrometheusAPI
Service->>Repo: QueryDbRangePercentile(ctx, startTime, endTime, step, svcs, endpoints, systems)
Repo->>Repo: tRange = v1.Range{Start, End, Step}
Repo->>Repo: qb = getDbP9xSql(promRange, tRange.Step, svcs, endpoints, systems)
Repo->>Prom: QueryRangeWithP9xBuilder(ctx, qb, tRange)
activate Repo
alt builder.count <= MAX_CONDITIONS_PER_GROUP
Repo->>Builder: ToString()
Builder-->>Repo: fullPql
Repo->>Prom: QueryRange(ctx, fullPql, tRange)
Prom-->>Repo: value, warnings, error
else builder.count > MAX_CONDITIONS_PER_GROUP
loop for i in range(0, builder.count, MAX_CONDITIONS_PER_GROUP)
Repo->>Builder: buildQueryWithCondRange(i, end)
Builder-->>Repo: partialPql
Repo->>Prom: QueryRange(ctx, partialPql, tRange)
Prom-->>Repo: subValue, subWarnings, subError
Repo->>Repo: collect warnings and errors
Repo->>Repo: assert subValue is Matrix and append to res
end
Repo->>Repo: merge res into Matrix
Repo-->>Repo: return Matrix, warnings, errors.Join(errs)
end
deactivate Repo
Repo-->>Service: getDescendantMetrics(..., tRange, Matrix)
Class diagram for promRepo, Repo, and UnionP9xBuilder changesclassDiagram
class Repo {
<<interface>>
QueryDbRangePercentile(ctx, startTime, endTime, step, svcs, endpoints, systems)
QueryExternalRangePercentile(ctx, startTime, endTime, step, svcs, endpoints, systems)
QueryMqRangePercentile(ctx, startTime, endTime, step, svcs, endpoints, systems)
QueryRangePercentile(ctx, startTime, endTime, step, svcs, endpoints)
QueryInstanceP90(ctx, startTime, endTime, step, endpoint, instance)
QueryRangeWithP9xBuilder(ctx, builder, tRange) pmodel.Value, v1.Warnings, error
}
class promRepo {
promRange string
GetApi()
GetRange()
QueryDbRangePercentile(ctx, startTime, endTime, step, svcs, endpoints, systems)
QueryExternalRangePercentile(ctx, startTime, endTime, step, svcs, endpoints, systems)
QueryMqRangePercentile(ctx, startTime, endTime, step, svcs, endpoints, systems)
QueryRangePercentile(ctx, startTime, endTime, step, svcs, endpoints)
QueryInstanceP90(ctx, startTime, endTime, step, endpoint, instance)
QueryRangeWithP9xBuilder(ctx, builder, tRange) pmodel.Value, v1.Warnings, error
}
Repo <|.. promRepo
class UnionP9xBuilder {
value string
tableName string
labels []string
duration time.Duration
count int
conditions []P9xCondition
extraConditions []string
ToString() string
buildQueryWithCondRange(start, end) string
AddCondition(key, values)
AddExtraCondition(condition)
}
class P9xCondition {
Key string
Values []string
}
class Helpers {
getSpanTraceP9xSql(promRange, step, svcs, endpoints) *UnionP9xBuilder
getSpanTraceInstanceP9xSql(promRange, step, endpoint, extraCondition) *UnionP9xBuilder
getDbP9xSql(promRange, step, svcs, endpoints, systems) *UnionP9xBuilder
getExternalP9xSql(promRange, step, svcs, endpoints, systems) *UnionP9xBuilder
getMqP9xSql(promRange, step, svcs, endpoints, systems) *UnionP9xBuilder
}
promRepo --> UnionP9xBuilder : uses
Helpers --> UnionP9xBuilder : constructs
UnionP9xBuilder --> P9xCondition : aggregates
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey - I've found 4 issues, and left some high level feedback:
- In TestSpliteSpanTraceP9x, the second and third t.Errorf calls use e1/res1 in the error message instead of e2/res2 and e3/res3, which will make failures misleading; update them to report the correct expected/actual values.
- MAX_CONDITIONS_PER_GROUP is a configuration-like limit that never changes at runtime; consider making it a lowercase const (e.g., maxConditionsPerGroup) to better reflect its usage and keep it unexported.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In TestSpliteSpanTraceP9x, the second and third t.Errorf calls use e1/res1 in the error message instead of e2/res2 and e3/res3, which will make failures misleading; update them to report the correct expected/actual values.
- MAX_CONDITIONS_PER_GROUP is a configuration-like limit that never changes at runtime; consider making it a lowercase const (e.g., maxConditionsPerGroup) to better reflect its usage and keep it unexported.
## Individual Comments
### Comment 1
<location> `backend/pkg/repository/prometheus/p9x_builder_test.go:28` </location>
<code_context>
+ }
+}
+
+func TestSpliteSpanTraceP9x(t *testing.T) {
+ svcs := []string{"ts-station-service", "ts-travel2-service", "ts-travel-service"}
+ endpoints := []string{
</code_context>
<issue_to_address>
**nitpick (typo):** Typo in test name may reduce discoverability and clarity
The function name `TestSpliteSpanTraceP9x` appears to be misspelled. Please rename it (e.g., `TestSplitSpanTraceP9x` or another clearer variant) so the test’s purpose is obvious and easy to find in tooling.
```suggestion
func TestSplitSpanTraceP9x(t *testing.T) {
```
</issue_to_address>
### Comment 2
<location> `backend/pkg/repository/prometheus/p9x_builder_test.go:52-60` </location>
<code_context>
+ if e1 != res1 {
+ t.Errorf("want=%s, got=%s", e1, res1)
+ }
+ if e2 != res2 {
+ t.Errorf("want=%s, got=%s", e1, res1)
+ }
+ if e3 != res3 {
+ t.Errorf("want=%s, got=%s", e1, res1)
}
</code_context>
<issue_to_address>
**suggestion (testing):** Failure messages in e2/e3 assertions print the wrong expected/actual values
The comparisons use `e2 != res2` and `e3 != res3`, but the error messages still log `e1`/`res1`, which will mislead debugging. Please update them to:
```go
if e2 != res2 {
t.Errorf("want=%s, got=%s", e2, res2)
}
if e3 != res3 {
t.Errorf("want=%s, got=%s", e3, res3)
}
```
```suggestion
if e1 != res1 {
t.Errorf("want=%s, got=%s", e1, res1)
}
if e2 != res2 {
t.Errorf("want=%s, got=%s", e2, res2)
}
if e3 != res3 {
t.Errorf("want=%s, got=%s", e3, res3)
}
```
</issue_to_address>
### Comment 3
<location> `backend/pkg/repository/prometheus/p9x_builder_test.go:12` </location>
<code_context>
-}
-
-func testSpanTraceP9x(t *testing.T) {
+func TestSpanTraceP9x(t *testing.T) {
svcs := []string{"ts-station-service", "ts-travel2-service"}
endpoints := []string{
</code_context>
<issue_to_address>
**suggestion (testing):** Missing tests for QueryRangeWithP9xBuilder splitting logic, result aggregation, and error paths
Given the amount of new behavior in `QueryRangeWithP9xBuilder`, it would be good to add focused tests that:
1. Exercise splitting:
- Use a `UnionP9xBuilder` with > `MAX_CONDITIONS_PER_GROUP` values and a fake `prometheus v1.API`.
- Assert the API is called the expected number of times with correctly chunked sub-queries.
2. Exercise aggregation:
- Have the fake API return different `prometheus_model.Matrix` values per sub-call and assert the combined result contains all series in the expected order.
- Return non-nil `Warnings` from multiple sub-calls and assert they are all present in the final `Warnings` slice.
3. Exercise error handling:
- Mix successful and failing sub-calls, and assert the final error is an `errors.Join` of the individual errors while still aggregating successful results.
- Include a case where one sub-call returns a non-matrix `pmodel.Value` and assert the function returns the expected `core.Error` with a clear message.
These will directly validate the new split-query behavior and ensure the heavy-union path is properly covered by tests.
Suggested implementation:
```golang
func TestSpanTraceP9x(t *testing.T) {
svcs := []string{"ts-station-service", "ts-travel2-service"}
endpoints := []string{
"POST /api/v1/stationservice/stations/idlist",
"histogram_quantile(0.9, sum by(vmrange, content_key, svc_name) (increase(kindling_span_trace_duration_nanoseconds_bucket{svc_name='ts-station-service', content_key='POST /api/v1/stationservice/stations/idlist'}[5m])))," +
"histogram_quantile(0.9, sum by(vmrange, content_key, svc_name) (increase(kindling_span_trace_duration_nanoseconds_bucket{svc_name='ts-travel2-service', content_key='POST /api/v1/travel2service/trips/left'}[5m])))" +
")"
if expect != got.ToString() {
t.Errorf("want=%s, got=%s", expect, got.ToString())
}
}
// fakePrometheusAPI is a minimal fake implementing the subset of v1.API
// needed by QueryRangeWithP9xBuilder. All non-QueryRange methods panic if used.
type fakePrometheusAPI struct {
t *testing.T
// scripted results for successive QueryRange calls
results []struct {
value pmodel.Value
warnings v1.Warnings
err error
}
// captured calls
queries []string
ranges []v1.Range
call int
}
func newFakePrometheusAPI(t *testing.T, results []struct {
value pmodel.Value
warnings v1.Warnings
err error
}) *fakePrometheusAPI {
return &fakePrometheusAPI{
t: t,
results: results,
}
}
func (f *fakePrometheusAPI) QueryRange(ctx context.Context, query string, r v1.Range, opts ...v1.Option) (pmodel.Value, v1.Warnings, error) {
f.queries = append(f.queries, query)
f.ranges = append(f.ranges, r)
if f.call >= len(f.results) {
f.t.Fatalf("unexpected QueryRange call %d (only %d scripted results)", f.call+1, len(f.results))
}
res := f.results[f.call]
f.call++
return res.value, res.warnings, res.err
}
// The remaining v1.API methods are not used by QueryRangeWithP9xBuilder in these tests.
// They panic if accidentally invoked to make misuse obvious.
func (*fakePrometheusAPI) Alerts(ctx context.Context) (v1.AlertsResult, error) {
panic("fakePrometheusAPI.Alerts not implemented in tests")
}
func (*fakePrometheusAPI) AlertManagers(ctx context.Context) (v1.AlertManagersResult, error) {
panic("fakePrometheusAPI.AlertManagers not implemented in tests")
}
func (*fakePrometheusAPI) CleanTombstones(ctx context.Context) error {
panic("fakePrometheusAPI.CleanTombstones not implemented in tests")
}
func (*fakePrometheusAPI) Config(ctx context.Context) (v1.ConfigResult, error) {
panic("fakePrometheusAPI.Config not implemented in tests")
}
func (*fakePrometheusAPI) DeleteSeries(ctx context.Context, matches []string, startTime, endTime time.Time) error {
panic("fakePrometheusAPI.DeleteSeries not implemented in tests")
}
func (*fakePrometheusAPI) Flags(ctx context.Context) (v1.FlagsResult, error) {
panic("fakePrometheusAPI.Flags not implemented in tests")
}
func (*fakePrometheusAPI) LabelNames(ctx context.Context, matches []string, startTime, endTime time.Time) ([]string, v1.Warnings, error) {
panic("fakePrometheusAPI.LabelNames not implemented in tests")
}
func (*fakePrometheusAPI) LabelValues(ctx context.Context, label string, matches []string, startTime, endTime time.Time) (pmodel.LabelValues, v1.Warnings, error) {
panic("fakePrometheusAPI.LabelValues not implemented in tests")
}
func (*fakePrometheusAPI) Query(ctx context.Context, query string, ts time.Time, opts ...v1.Option) (pmodel.Value, v1.Warnings, error) {
panic("fakePrometheusAPI.Query not implemented in tests")
}
func (*fakePrometheusAPI) Runtimeinfo(ctx context.Context) (v1.RuntimeinfoResult, error) {
panic("fakePrometheusAPI.Runtimeinfo not implemented in tests")
}
func (*fakePrometheusAPI) Series(ctx context.Context, matches []string, startTime, endTime time.Time) ([]pmodel.LabelSet, v1.Warnings, error) {
panic("fakePrometheusAPI.Series not implemented in tests")
}
func (*fakePrometheusAPI) Snapshot(ctx context.Context, skipHead bool) (v1.SnapshotResult, error) {
panic("fakePrometheusAPI.Snapshot not implemented in tests")
}
func (*fakePrometheusAPI) Rules(ctx context.Context) (v1.RulesResult, error) {
panic("fakePrometheusAPI.Rules not implemented in tests")
}
func (*fakePrometheusAPI) Targets(ctx context.Context) (v1.TargetsResult, error) {
panic("fakePrometheusAPI.Targets not implemented in tests")
}
func (*fakePrometheusAPI) TargetsMetadata(ctx context.Context, matchTarget, metric, limit string) ([]v1.MetricMetadata, error) {
panic("fakePrometheusAPI.TargetsMetadata not implemented in tests")
}
func (*fakePrometheusAPI) TSDB(ctx context.Context) (v1.TSDBResult, error) {
panic("fakePrometheusAPI.TSDB not implemented in tests")
}
func (*fakePrometheusAPI) Metadata(ctx context.Context, metric, limit string) (map[string][]v1.Metadata, error) {
panic("fakePrometheusAPI.Metadata not implemented in tests")
}
func (*fakePrometheusAPI) QueryExemplars(ctx context.Context, query string, startTime, endTime time.Time) ([]v1.ExemplarQueryResult, error) {
panic("fakePrometheusAPI.QueryExemplars not implemented in tests")
}
// buildUnionP9xWithValues is a small helper to create a UnionP9xBuilder with the
// given list of values, tailored for exercising split behavior.
func buildUnionP9xWithValues(values []string) UnionP9xBuilder {
var ub UnionP9xBuilder
for _, v := range values {
ub = ub.Or(Equals("svc_name", v))
}
return ub
}
// buildMatrix creates a simple pmodel.Matrix with a single time series with the
// given label set and one sample at the given value and timestamp.
func buildMatrix(lbls pmodel.LabelSet, ts time.Time, val float64) pmodel.Matrix {
m := pmodel.Matrix{}
series := &pmodel.SampleStream{
Metric: lbls,
Values: []pmodel.SamplePair{
{
Timestamp: pmodel.TimeFromUnixNano(ts.UnixNano()),
Value: pmodel.SampleValue(val),
},
},
}
m = append(m, series)
return m
}
// TestQueryRangeWithP9xBuilder_Splitting verifies that when a UnionP9xBuilder
// has more than MAX_CONDITIONS_PER_GROUP values, QueryRangeWithP9xBuilder
// splits the query into multiple sub-queries.
func TestQueryRangeWithP9xBuilder_Splitting(t *testing.T) {
t.Parallel()
// Arrange: create more values than MAX_CONDITIONS_PER_GROUP to force splitting.
totalValues := MAX_CONDITIONS_PER_GROUP + 3
values := make([]string, 0, totalValues)
for i := 0; i < totalValues; i++ {
values = append(values, fmt.Sprintf("svc-%d", i))
}
union := buildUnionP9xWithValues(values)
queryRange := v1.Range{
Start: time.Now().Add(-5 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
// Script responses: one empty matrix per expected sub-call.
numSubCalls := (totalValues + MAX_CONDITIONS_PER_GROUP - 1) / MAX_CONDITIONS_PER_GROUP
results := make([]struct {
value pmodel.Value
warnings v1.Warnings
err error
}, numSubCalls)
for i := range results {
results[i].value = pmodel.Matrix{}
}
api := newFakePrometheusAPI(t, results)
// Act
ctx := context.Background()
_, warnings, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
// Assert
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(warnings) != 0 {
t.Fatalf("expected no warnings, got %d", len(warnings))
}
if got, want := len(api.queries), numSubCalls; got != want {
t.Fatalf("expected %d QueryRange calls, got %d", want, got)
}
// Ensure that each sub-query contains only a subset of the values and that
// all values are present across all sub-queries.
seen := map[string]bool{}
for _, q := range api.queries {
countInQuery := 0
for _, v := range values {
if strings.Contains(q, v) {
seen[v] = true
countInQuery++
}
}
if countInQuery == 0 {
t.Fatalf("sub-query %q does not contain any expected svc_name conditions", q)
}
}
for _, v := range values {
if !seen[v] {
t.Fatalf("value %q not present in any generated sub-query", v)
}
}
}
// TestQueryRangeWithP9xBuilder_Aggregation verifies that results from multiple
// sub-calls are aggregated correctly and that warnings are merged.
func TestQueryRangeWithP9xBuilder_Aggregation(t *testing.T) {
t.Parallel()
union := buildUnionP9xWithValues([]string{"svc-a", "svc-b", "svc-c"})
queryRange := v1.Range{
Start: time.Now().Add(-10 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
now := time.Now()
m1 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-a"}, now, 1)
m2 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-b"}, now, 2)
m3 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-c"}, now, 3)
results := []struct {
value pmodel.Value
warnings v1.Warnings
err error
}{
{value: m1, warnings: v1.Warnings{"w1"}, err: nil},
{value: m2, warnings: v1.Warnings{"w2"}, err: nil},
{value: m3, warnings: v1.Warnings{"w3"}, err: nil},
}
api := newFakePrometheusAPI(t, results)
ctx := context.Background()
value, warnings, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
matrix, ok := value.(pmodel.Matrix)
if !ok {
t.Fatalf("expected Matrix result, got %T", value)
}
if got, want := len(matrix), 3; got != want {
t.Fatalf("expected %d time series, got %d", want, got)
}
// Verify that the series are present in the order in which they were returned.
if string(matrix[0].Metric["svc_name"]) != "svc-a" ||
string(matrix[1].Metric["svc_name"]) != "svc-b" ||
string(matrix[2].Metric["svc_name"]) != "svc-c" {
t.Fatalf("unexpected series order: got %v, %v, %v",
matrix[0].Metric["svc_name"],
matrix[1].Metric["svc_name"],
matrix[2].Metric["svc_name"])
}
if got, want := len(warnings), 3; got != want {
t.Fatalf("expected %d combined warnings, got %d (%v)", want, got, warnings)
}
}
// TestQueryRangeWithP9xBuilder_ErrorHandling verifies that mixed success and
// failure sub-calls result in an errors.Join error while still aggregating
// successful results.
func TestQueryRangeWithP9xBuilder_ErrorHandling(t *testing.T) {
t.Parallel()
union := buildUnionP9xWithValues([]string{"svc-ok-1", "svc-fail", "svc-ok-2"})
queryRange := v1.Range{
Start: time.Now().Add(-15 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
now := time.Now()
okMatrix1 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-ok-1"}, now, 10)
okMatrix2 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-ok-2"}, now, 20)
err1 := fmt.Errorf("first failure")
err2 := fmt.Errorf("second failure")
results := []struct {
value pmodel.Value
warnings v1.Warnings
err error
}{
{value: okMatrix1, warnings: nil, err: nil},
{value: nil, warnings: nil, err: err1},
{value: okMatrix2, warnings: nil, err: err2},
}
api := newFakePrometheusAPI(t, results)
ctx := context.Background()
value, warnings, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
if err == nil {
t.Fatalf("expected error, got nil")
}
// Errors are expected to be joined.
unwrapped := errors.Unwrap(err)
if unwrapped == nil {
// If errors.Join is used, Unwrap should be non-nil; if not, we at least
// assert the message contains both underlying messages.
msg := err.Error()
if !strings.Contains(msg, err1.Error()) || !strings.Contains(msg, err2.Error()) {
t.Fatalf("expected joined error to contain both messages, got %q", msg)
}
} else {
// Best-effort: check that the top-level error string contains both messages.
msg := err.Error()
if !strings.Contains(msg, err1.Error()) || !strings.Contains(msg, err2.Error()) {
t.Fatalf("expected joined error to contain both messages, got %q", msg)
}
}
if len(warnings) != 0 {
t.Fatalf("expected no warnings, got %d", len(warnings))
}
// Successful matrices should still be aggregated.
matrix, ok := value.(pmodel.Matrix)
if !ok {
t.Fatalf("expected Matrix result, got %T", value)
}
if got, want := len(matrix), 2; got != want {
t.Fatalf("expected %d successful time series, got %d", want, got)
}
}
// TestQueryRangeWithP9xBuilder_NonMatrixResult verifies that a non-matrix
// pmodel.Value returned from a sub-call results in a core.Error with a clear
// message.
func TestQueryRangeWithP9xBuilder_NonMatrixResult(t *testing.T) {
t.Parallel()
union := buildUnionP9xWithValues([]string{"svc-a"})
queryRange := v1.Range{
Start: time.Now().Add(-5 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
// Return a Vector instead of Matrix to simulate misuse.
vector := pmodel.Vector{
&pmodel.Sample{
Metric: pmodel.Metric{"svc_name": "svc-a"},
Value: 42,
},
}
results := []struct {
value pmodel.Value
warnings v1.Warnings
err error
}{
{value: vector, warnings: nil, err: nil},
}
api := newFakePrometheusAPI(t, results)
ctx := context.Background()
_, _, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
if err == nil {
t.Fatalf("expected error, got nil")
}
var coreErr core.Error
if !errors.As(err, &coreErr) {
t.Fatalf("expected error of type core.Error, got %T", err)
}
msg := coreErr.Error()
if !strings.Contains(strings.ToLower(msg), "matrix") {
t.Fatalf("expected core.Error message to mention matrix, got %q", msg)
}
}
```
1. **Imports:** The new tests rely on additional imports. In the import block of `backend/pkg/repository/prometheus/p9x_builder_test.go`, ensure you add (or adjust aliases to match your codebase):
- `context`
- `errors`
- `fmt`
- `strings`
- `time`
- `github.com/prometheus/common/model` imported as `pmodel` (or whatever alias your existing code uses)
- `github.com/prometheus/client_golang/api/prometheus/v1` imported as `v1`
- The correct package path for your `core.Error` type, for example `github.com/<your-org>/<your-repo>/pkg/core` imported as `core` (update the path to match your project).
2. **UnionP9xBuilder / Equals helpers:**
- The helper `buildUnionP9xWithValues` assumes:
- There is a type `UnionP9xBuilder` in this package.
- It has a method `Or(...) UnionP9xBuilder`.
- There is a function `Equals(field, value string) P9xCondition` (or similar) compatible with `UnionP9xBuilder.Or`.
- If your actual builder API differs, update `buildUnionP9xWithValues` and the call sites accordingly so that it produces a `UnionP9xBuilder` with one condition per `svc_name` value.
3. **MAX_CONDITIONS_PER_GROUP:**
- The tests assume a `const MAX_CONDITIONS_PER_GROUP` available in this package (or exported from the production code). If the constant is not exported, either:
- Move the tests into the same package (non-`_test` package name), or
- Export the constant (e.g. `MaxConditionsPerGroup`) and update the tests.
- If the name is different in your code, adjust references in the tests.
4. **QueryRangeWithP9xBuilder signature:**
- The tests assume a function:
```go
func QueryRangeWithP9xBuilder(ctx context.Context, api v1.API, builder UnionP9xBuilder, r v1.Range) (pmodel.Value, v1.Warnings, error)
```
- If the actual signature differs (e.g., additional parameters, different builder type, or different return types), update the test calls appropriately.
5. **core.Error behavior:**
- The last test checks for `errors.As(err, &coreErr)` and that the error message mentions "matrix". Ensure your implementation of the non-matrix error path uses `core.Error` and includes a clear description (e.g., "expected matrix result, got %T").
6. **API interface version:**
- The stub methods in `fakePrometheusAPI` are based on the Prometheus `v1.API` interface as of recent versions. If your version differs (extra/removed methods), update the stubbed method set so that `fakePrometheusAPI` satisfies the interface used by `QueryRangeWithP9xBuilder`.
</issue_to_address>
### Comment 4
<location> `backend/pkg/repository/prometheus/p9x_builder.go:165` </location>
<code_context>
return builder.String()
}
+func (p9x *UnionP9xBuilder) buildQueryWithCondRange(start, end int) string {
+ var builder strings.Builder
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting the shared PromQL string-building logic into small helper methods and reusing them in both ToString and buildQueryWithCondRange to avoid duplicated, index-heavy code.
You can reduce the added complexity and duplication by extracting the shared query‑construction logic into small helpers, and then using those from both `ToString` and `buildQueryWithCondRange`. This keeps the new grouping behavior while avoiding two large, nearly identical builders.
For example:
```go
func (p9x *UnionP9xBuilder) writeLabels(builder *strings.Builder) {
for i, label := range p9x.labels {
if i > 0 {
builder.WriteString(", ")
}
builder.WriteString(label)
}
}
func (p9x *UnionP9xBuilder) writeSelector(builder *strings.Builder, valueIndex int) {
builder.WriteString(p9x.tableName)
builder.WriteString("{")
for i, condition := range p9x.conditions {
if i > 0 {
builder.WriteString(", ")
}
builder.WriteString(condition.Key)
builder.WriteString("='")
builder.WriteString(condition.Values[valueIndex])
builder.WriteString("'")
}
for i, extraCondition := range p9x.extraConditions {
if len(p9x.conditions) > 0 || i > 0 {
builder.WriteString(", ")
}
builder.WriteString(extraCondition)
}
builder.WriteString("}[")
builder.WriteString(getDurationFromStep(p9x.duration))
builder.WriteString("])))")
}
func (p9x *UnionP9xBuilder) writeHistogramQuantile(builder *strings.Builder, valueIndex int) {
builder.WriteString("histogram_quantile(")
builder.WriteString(p9x.value)
builder.WriteString(", sum by(")
p9x.writeLabels(builder)
builder.WriteString(") (increase(")
p9x.writeSelector(builder, valueIndex)
}
```
Then `ToString` and `buildQueryWithCondRange` become simple loops over indices:
```go
func (p9x *UnionP9xBuilder) ToString() string {
var b strings.Builder
if p9x.count > 1 {
b.WriteString("union(")
}
for i := 0; i < p9x.count; i++ {
if i > 0 {
b.WriteString(",")
}
p9x.writeHistogramQuantile(&b, i)
}
if p9x.count > 1 {
b.WriteString(")")
}
return b.String()
}
func (p9x *UnionP9xBuilder) buildQueryWithCondRange(start, end int) string {
var b strings.Builder
if end-start > 1 {
b.WriteString("union(")
}
for idx := start; idx < end; idx++ {
if idx > start {
b.WriteString(",")
}
p9x.writeHistogramQuantile(&b, idx)
}
if end-start > 1 {
b.WriteString(")")
}
return b.String()
}
```
This removes the nested, index‑heavy duplication, keeps all existing behavior (including extraConditions), and centralizes the PromQL formatting into small, reusable helpers. It also makes `buildQueryWithCondRange` less coupled to the structure of `ToString`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| } | ||
|
|
||
| func testSpanTraceP9x(t *testing.T) { | ||
| func TestSpanTraceP9x(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion (testing): Missing tests for QueryRangeWithP9xBuilder splitting logic, result aggregation, and error paths
Given the amount of new behavior in QueryRangeWithP9xBuilder, it would be good to add focused tests that:
-
Exercise splitting:
- Use a
UnionP9xBuilderwith >MAX_CONDITIONS_PER_GROUPvalues and a fakeprometheus v1.API. - Assert the API is called the expected number of times with correctly chunked sub-queries.
- Use a
-
Exercise aggregation:
- Have the fake API return different
prometheus_model.Matrixvalues per sub-call and assert the combined result contains all series in the expected order. - Return non-nil
Warningsfrom multiple sub-calls and assert they are all present in the finalWarningsslice.
- Have the fake API return different
-
Exercise error handling:
- Mix successful and failing sub-calls, and assert the final error is an
errors.Joinof the individual errors while still aggregating successful results. - Include a case where one sub-call returns a non-matrix
pmodel.Valueand assert the function returns the expectedcore.Errorwith a clear message.
- Mix successful and failing sub-calls, and assert the final error is an
These will directly validate the new split-query behavior and ensure the heavy-union path is properly covered by tests.
Suggested implementation:
func TestSpanTraceP9x(t *testing.T) {
svcs := []string{"ts-station-service", "ts-travel2-service"}
endpoints := []string{
"POST /api/v1/stationservice/stations/idlist",
"histogram_quantile(0.9, sum by(vmrange, content_key, svc_name) (increase(kindling_span_trace_duration_nanoseconds_bucket{svc_name='ts-station-service', content_key='POST /api/v1/stationservice/stations/idlist'}[5m])))," +
"histogram_quantile(0.9, sum by(vmrange, content_key, svc_name) (increase(kindling_span_trace_duration_nanoseconds_bucket{svc_name='ts-travel2-service', content_key='POST /api/v1/travel2service/trips/left'}[5m])))" +
")"
if expect != got.ToString() {
t.Errorf("want=%s, got=%s", expect, got.ToString())
}
}
// fakePrometheusAPI is a minimal fake implementing the subset of v1.API
// needed by QueryRangeWithP9xBuilder. All non-QueryRange methods panic if used.
type fakePrometheusAPI struct {
t *testing.T
// scripted results for successive QueryRange calls
results []struct {
value pmodel.Value
warnings v1.Warnings
err error
}
// captured calls
queries []string
ranges []v1.Range
call int
}
func newFakePrometheusAPI(t *testing.T, results []struct {
value pmodel.Value
warnings v1.Warnings
err error
}) *fakePrometheusAPI {
return &fakePrometheusAPI{
t: t,
results: results,
}
}
func (f *fakePrometheusAPI) QueryRange(ctx context.Context, query string, r v1.Range, opts ...v1.Option) (pmodel.Value, v1.Warnings, error) {
f.queries = append(f.queries, query)
f.ranges = append(f.ranges, r)
if f.call >= len(f.results) {
f.t.Fatalf("unexpected QueryRange call %d (only %d scripted results)", f.call+1, len(f.results))
}
res := f.results[f.call]
f.call++
return res.value, res.warnings, res.err
}
// The remaining v1.API methods are not used by QueryRangeWithP9xBuilder in these tests.
// They panic if accidentally invoked to make misuse obvious.
func (*fakePrometheusAPI) Alerts(ctx context.Context) (v1.AlertsResult, error) {
panic("fakePrometheusAPI.Alerts not implemented in tests")
}
func (*fakePrometheusAPI) AlertManagers(ctx context.Context) (v1.AlertManagersResult, error) {
panic("fakePrometheusAPI.AlertManagers not implemented in tests")
}
func (*fakePrometheusAPI) CleanTombstones(ctx context.Context) error {
panic("fakePrometheusAPI.CleanTombstones not implemented in tests")
}
func (*fakePrometheusAPI) Config(ctx context.Context) (v1.ConfigResult, error) {
panic("fakePrometheusAPI.Config not implemented in tests")
}
func (*fakePrometheusAPI) DeleteSeries(ctx context.Context, matches []string, startTime, endTime time.Time) error {
panic("fakePrometheusAPI.DeleteSeries not implemented in tests")
}
func (*fakePrometheusAPI) Flags(ctx context.Context) (v1.FlagsResult, error) {
panic("fakePrometheusAPI.Flags not implemented in tests")
}
func (*fakePrometheusAPI) LabelNames(ctx context.Context, matches []string, startTime, endTime time.Time) ([]string, v1.Warnings, error) {
panic("fakePrometheusAPI.LabelNames not implemented in tests")
}
func (*fakePrometheusAPI) LabelValues(ctx context.Context, label string, matches []string, startTime, endTime time.Time) (pmodel.LabelValues, v1.Warnings, error) {
panic("fakePrometheusAPI.LabelValues not implemented in tests")
}
func (*fakePrometheusAPI) Query(ctx context.Context, query string, ts time.Time, opts ...v1.Option) (pmodel.Value, v1.Warnings, error) {
panic("fakePrometheusAPI.Query not implemented in tests")
}
func (*fakePrometheusAPI) Runtimeinfo(ctx context.Context) (v1.RuntimeinfoResult, error) {
panic("fakePrometheusAPI.Runtimeinfo not implemented in tests")
}
func (*fakePrometheusAPI) Series(ctx context.Context, matches []string, startTime, endTime time.Time) ([]pmodel.LabelSet, v1.Warnings, error) {
panic("fakePrometheusAPI.Series not implemented in tests")
}
func (*fakePrometheusAPI) Snapshot(ctx context.Context, skipHead bool) (v1.SnapshotResult, error) {
panic("fakePrometheusAPI.Snapshot not implemented in tests")
}
func (*fakePrometheusAPI) Rules(ctx context.Context) (v1.RulesResult, error) {
panic("fakePrometheusAPI.Rules not implemented in tests")
}
func (*fakePrometheusAPI) Targets(ctx context.Context) (v1.TargetsResult, error) {
panic("fakePrometheusAPI.Targets not implemented in tests")
}
func (*fakePrometheusAPI) TargetsMetadata(ctx context.Context, matchTarget, metric, limit string) ([]v1.MetricMetadata, error) {
panic("fakePrometheusAPI.TargetsMetadata not implemented in tests")
}
func (*fakePrometheusAPI) TSDB(ctx context.Context) (v1.TSDBResult, error) {
panic("fakePrometheusAPI.TSDB not implemented in tests")
}
func (*fakePrometheusAPI) Metadata(ctx context.Context, metric, limit string) (map[string][]v1.Metadata, error) {
panic("fakePrometheusAPI.Metadata not implemented in tests")
}
func (*fakePrometheusAPI) QueryExemplars(ctx context.Context, query string, startTime, endTime time.Time) ([]v1.ExemplarQueryResult, error) {
panic("fakePrometheusAPI.QueryExemplars not implemented in tests")
}
// buildUnionP9xWithValues is a small helper to create a UnionP9xBuilder with the
// given list of values, tailored for exercising split behavior.
func buildUnionP9xWithValues(values []string) UnionP9xBuilder {
var ub UnionP9xBuilder
for _, v := range values {
ub = ub.Or(Equals("svc_name", v))
}
return ub
}
// buildMatrix creates a simple pmodel.Matrix with a single time series with the
// given label set and one sample at the given value and timestamp.
func buildMatrix(lbls pmodel.LabelSet, ts time.Time, val float64) pmodel.Matrix {
m := pmodel.Matrix{}
series := &pmodel.SampleStream{
Metric: lbls,
Values: []pmodel.SamplePair{
{
Timestamp: pmodel.TimeFromUnixNano(ts.UnixNano()),
Value: pmodel.SampleValue(val),
},
},
}
m = append(m, series)
return m
}
// TestQueryRangeWithP9xBuilder_Splitting verifies that when a UnionP9xBuilder
// has more than MAX_CONDITIONS_PER_GROUP values, QueryRangeWithP9xBuilder
// splits the query into multiple sub-queries.
func TestQueryRangeWithP9xBuilder_Splitting(t *testing.T) {
t.Parallel()
// Arrange: create more values than MAX_CONDITIONS_PER_GROUP to force splitting.
totalValues := MAX_CONDITIONS_PER_GROUP + 3
values := make([]string, 0, totalValues)
for i := 0; i < totalValues; i++ {
values = append(values, fmt.Sprintf("svc-%d", i))
}
union := buildUnionP9xWithValues(values)
queryRange := v1.Range{
Start: time.Now().Add(-5 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
// Script responses: one empty matrix per expected sub-call.
numSubCalls := (totalValues + MAX_CONDITIONS_PER_GROUP - 1) / MAX_CONDITIONS_PER_GROUP
results := make([]struct {
value pmodel.Value
warnings v1.Warnings
err error
}, numSubCalls)
for i := range results {
results[i].value = pmodel.Matrix{}
}
api := newFakePrometheusAPI(t, results)
// Act
ctx := context.Background()
_, warnings, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
// Assert
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
if len(warnings) != 0 {
t.Fatalf("expected no warnings, got %d", len(warnings))
}
if got, want := len(api.queries), numSubCalls; got != want {
t.Fatalf("expected %d QueryRange calls, got %d", want, got)
}
// Ensure that each sub-query contains only a subset of the values and that
// all values are present across all sub-queries.
seen := map[string]bool{}
for _, q := range api.queries {
countInQuery := 0
for _, v := range values {
if strings.Contains(q, v) {
seen[v] = true
countInQuery++
}
}
if countInQuery == 0 {
t.Fatalf("sub-query %q does not contain any expected svc_name conditions", q)
}
}
for _, v := range values {
if !seen[v] {
t.Fatalf("value %q not present in any generated sub-query", v)
}
}
}
// TestQueryRangeWithP9xBuilder_Aggregation verifies that results from multiple
// sub-calls are aggregated correctly and that warnings are merged.
func TestQueryRangeWithP9xBuilder_Aggregation(t *testing.T) {
t.Parallel()
union := buildUnionP9xWithValues([]string{"svc-a", "svc-b", "svc-c"})
queryRange := v1.Range{
Start: time.Now().Add(-10 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
now := time.Now()
m1 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-a"}, now, 1)
m2 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-b"}, now, 2)
m3 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-c"}, now, 3)
results := []struct {
value pmodel.Value
warnings v1.Warnings
err error
}{
{value: m1, warnings: v1.Warnings{"w1"}, err: nil},
{value: m2, warnings: v1.Warnings{"w2"}, err: nil},
{value: m3, warnings: v1.Warnings{"w3"}, err: nil},
}
api := newFakePrometheusAPI(t, results)
ctx := context.Background()
value, warnings, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
if err != nil {
t.Fatalf("unexpected error: %v", err)
}
matrix, ok := value.(pmodel.Matrix)
if !ok {
t.Fatalf("expected Matrix result, got %T", value)
}
if got, want := len(matrix), 3; got != want {
t.Fatalf("expected %d time series, got %d", want, got)
}
// Verify that the series are present in the order in which they were returned.
if string(matrix[0].Metric["svc_name"]) != "svc-a" ||
string(matrix[1].Metric["svc_name"]) != "svc-b" ||
string(matrix[2].Metric["svc_name"]) != "svc-c" {
t.Fatalf("unexpected series order: got %v, %v, %v",
matrix[0].Metric["svc_name"],
matrix[1].Metric["svc_name"],
matrix[2].Metric["svc_name"])
}
if got, want := len(warnings), 3; got != want {
t.Fatalf("expected %d combined warnings, got %d (%v)", want, got, warnings)
}
}
// TestQueryRangeWithP9xBuilder_ErrorHandling verifies that mixed success and
// failure sub-calls result in an errors.Join error while still aggregating
// successful results.
func TestQueryRangeWithP9xBuilder_ErrorHandling(t *testing.T) {
t.Parallel()
union := buildUnionP9xWithValues([]string{"svc-ok-1", "svc-fail", "svc-ok-2"})
queryRange := v1.Range{
Start: time.Now().Add(-15 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
now := time.Now()
okMatrix1 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-ok-1"}, now, 10)
okMatrix2 := buildMatrix(pmodel.LabelSet{pmodel.LabelName("svc_name"): "svc-ok-2"}, now, 20)
err1 := fmt.Errorf("first failure")
err2 := fmt.Errorf("second failure")
results := []struct {
value pmodel.Value
warnings v1.Warnings
err error
}{
{value: okMatrix1, warnings: nil, err: nil},
{value: nil, warnings: nil, err: err1},
{value: okMatrix2, warnings: nil, err: err2},
}
api := newFakePrometheusAPI(t, results)
ctx := context.Background()
value, warnings, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
if err == nil {
t.Fatalf("expected error, got nil")
}
// Errors are expected to be joined.
unwrapped := errors.Unwrap(err)
if unwrapped == nil {
// If errors.Join is used, Unwrap should be non-nil; if not, we at least
// assert the message contains both underlying messages.
msg := err.Error()
if !strings.Contains(msg, err1.Error()) || !strings.Contains(msg, err2.Error()) {
t.Fatalf("expected joined error to contain both messages, got %q", msg)
}
} else {
// Best-effort: check that the top-level error string contains both messages.
msg := err.Error()
if !strings.Contains(msg, err1.Error()) || !strings.Contains(msg, err2.Error()) {
t.Fatalf("expected joined error to contain both messages, got %q", msg)
}
}
if len(warnings) != 0 {
t.Fatalf("expected no warnings, got %d", len(warnings))
}
// Successful matrices should still be aggregated.
matrix, ok := value.(pmodel.Matrix)
if !ok {
t.Fatalf("expected Matrix result, got %T", value)
}
if got, want := len(matrix), 2; got != want {
t.Fatalf("expected %d successful time series, got %d", want, got)
}
}
// TestQueryRangeWithP9xBuilder_NonMatrixResult verifies that a non-matrix
// pmodel.Value returned from a sub-call results in a core.Error with a clear
// message.
func TestQueryRangeWithP9xBuilder_NonMatrixResult(t *testing.T) {
t.Parallel()
union := buildUnionP9xWithValues([]string{"svc-a"})
queryRange := v1.Range{
Start: time.Now().Add(-5 * time.Minute),
End: time.Now(),
Step: time.Minute,
}
// Return a Vector instead of Matrix to simulate misuse.
vector := pmodel.Vector{
&pmodel.Sample{
Metric: pmodel.Metric{"svc_name": "svc-a"},
Value: 42,
},
}
results := []struct {
value pmodel.Value
warnings v1.Warnings
err error
}{
{value: vector, warnings: nil, err: nil},
}
api := newFakePrometheusAPI(t, results)
ctx := context.Background()
_, _, err := QueryRangeWithP9xBuilder(ctx, api, union, queryRange)
if err == nil {
t.Fatalf("expected error, got nil")
}
var coreErr core.Error
if !errors.As(err, &coreErr) {
t.Fatalf("expected error of type core.Error, got %T", err)
}
msg := coreErr.Error()
if !strings.Contains(strings.ToLower(msg), "matrix") {
t.Fatalf("expected core.Error message to mention matrix, got %q", msg)
}
}- Imports: The new tests rely on additional imports. In the import block of
backend/pkg/repository/prometheus/p9x_builder_test.go, ensure you add (or adjust aliases to match your codebase):contexterrorsfmtstringstimegithub.com/prometheus/common/modelimported aspmodel(or whatever alias your existing code uses)github.com/prometheus/client_golang/api/prometheus/v1imported asv1- The correct package path for your
core.Errortype, for examplegithub.com/<your-org>/<your-repo>/pkg/coreimported ascore(update the path to match your project).
- UnionP9xBuilder / Equals helpers:
- The helper
buildUnionP9xWithValuesassumes:- There is a type
UnionP9xBuilderin this package. - It has a method
Or(...) UnionP9xBuilder. - There is a function
Equals(field, value string) P9xCondition(or similar) compatible withUnionP9xBuilder.Or.
- There is a type
- If your actual builder API differs, update
buildUnionP9xWithValuesand the call sites accordingly so that it produces aUnionP9xBuilderwith one condition persvc_namevalue.
- The helper
- MAX_CONDITIONS_PER_GROUP:
- The tests assume a
const MAX_CONDITIONS_PER_GROUPavailable in this package (or exported from the production code). If the constant is not exported, either:- Move the tests into the same package (non-
_testpackage name), or - Export the constant (e.g.
MaxConditionsPerGroup) and update the tests.
- Move the tests into the same package (non-
- If the name is different in your code, adjust references in the tests.
- The tests assume a
- QueryRangeWithP9xBuilder signature:
- The tests assume a function:
func QueryRangeWithP9xBuilder(ctx context.Context, api v1.API, builder UnionP9xBuilder, r v1.Range) (pmodel.Value, v1.Warnings, error)
- If the actual signature differs (e.g., additional parameters, different builder type, or different return types), update the test calls appropriately.
- The tests assume a function:
- core.Error behavior:
- The last test checks for
errors.As(err, &coreErr)and that the error message mentions "matrix". Ensure your implementation of the non-matrix error path usescore.Errorand includes a clear description (e.g., "expected matrix result, got %T").
- The last test checks for
- API interface version:
- The stub methods in
fakePrometheusAPIare based on the Prometheusv1.APIinterface as of recent versions. If your version differs (extra/removed methods), update the stubbed method set so thatfakePrometheusAPIsatisfies the interface used byQueryRangeWithP9xBuilder.
- The stub methods in
Summary by Sourcery
Introduce a query builder-based Prometheus range query helper to safely execute large union percentile queries by splitting them into smaller chunks and merging results.
New Features:
Bug Fixes:
Enhancements:
Tests: