Conversation
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:

@@           Coverage Diff            @@
##             main     #345      +/-   ##
==========================================
+ Coverage   71.52%   71.77%   +0.25%
==========================================
  Files         204      205       +1
  Lines       14812    14978     +166
==========================================
+ Hits        10594    10751     +157
- Misses       3454     3459       +5
- Partials      764      768       +4

☔ View full report in Codecov by Sentry.
🔴 Performance Degradation: some benchmarks have degraded compared to the previous run.
Nice, @cheb0! The benchmark with this identifier:
Have a great time!
# Conflicts:
#	node/cmp_lid.go
#	node/node.go
#	node/node_and.go
#	node/node_or.go
#	node/node_static.go
		continue
	}

	idx, found := util.GallopSearchLeq(it.lids, nextID.Unpack())
Does it really perform better than an ordinary binary search?
Like, the additional contribution of this algorithm is calculating an upper bound for the binary search (or a lower bound in the leq case).
	if vals[0] >= x {
		return 0, true
	}
	hi := 1
I guess here you can calculate low as well.
I am not sure, though, that this function is really useful, since we have pretty small LID blocks (64k LIDs per block, and IIRC you wanted to reduce this number to 4k). Or am I tripping?
func GallopSearchGeq(vals []uint32, x uint32) (idx int, found bool) {
	n := len(vals)
	if n == 0 {
		return 0, false
	}
	if vals[0] >= x {
		return 0, true
	}
	hi := 1
	for hi < n && vals[hi] < x {
		hi *= 2
	}
	end := min(n, hi+1)
	start := end / 2
	idx, found = slices.BinarySearch(vals[start:end], x)
	idx += start
	if idx >= end {
		return 0, false
	}
	return idx, true
}

Please be aware that I did not properly test it; it's just a raw idea (however, your test cases are green).
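The diff also calls util.GallopSearchLeq for the descending iterators. Along the same lines, a leq variant could gallop from the back of the slice. The sketch below is my own guess at such a body, in the same untested spirit as the Geq sketch; it is not the PR's actual implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// GallopSearchLeq returns the index of the largest element <= x.
// It mirrors GallopSearchGeq but gallops from the back of the slice,
// so hits near the end (the common case for descending iteration) are cheap.
// NOTE: this body is a sketch, not the code from the PR.
func GallopSearchLeq(vals []uint32, x uint32) (idx int, found bool) {
	n := len(vals)
	if n == 0 {
		return 0, false
	}
	if vals[n-1] <= x {
		return n - 1, true
	}
	// Double the backwards step until we land on an element <= x (or run out).
	step := 1
	for step < n && vals[n-1-step] > x {
		step *= 2
	}
	lo := n - 1 - step
	if lo < 0 {
		lo = 0
	}
	hiEx := n - 1 - step/2 // elements at hiEx and beyond are known to be > x
	// Smallest offset in [lo, hiEx) whose value is > x; the answer sits just before it.
	i := sort.Search(hiEx-lo, func(j int) bool { return vals[lo+j] > x })
	idx = lo + i - 1
	if idx < 0 {
		return 0, false
	}
	return idx, true
}

func main() {
	fmt.Println(GallopSearchLeq([]uint32{1, 3, 5, 7, 9}, 4)) // 1 true
}
```

Galloping from the back keeps the cost proportional to the distance from the end rather than to log of the whole block.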
for {
	for len(it.lids) == 0 {
		if !it.tryNextBlock {
			return node.NewLIDOrderDesc(math.MaxUint32)
nit: Please do not forget to use node.NullLID
		it.counter.AddLIDsCount(len(it.lids))
	}

	// fast path: smallest remaining > nextID => skip entire block
I guess it's worth leaving a TODO here with something like:
We could also pass the LID into narrowLIDsRange to perform block skipping once we add something like MinLID to the LID block header.
You've mentioned this in the PR description:
It's also a base for future improvements like skipping disk reads.
It will eliminate such a "fast" path.
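To sketch what that header-based block skipping might look like (the blockHeader type and its MinLID field are hypothetical here; the real block format does not carry them yet):

```go
package main

import "fmt"

// blockHeader is a hypothetical LID block header; the MinLID field
// does not exist in the real format yet.
type blockHeader struct {
	MinLID uint32 // smallest LID stored in the block (assumed field)
}

// skipBlocks returns the index of the first block that may still contain
// a LID >= nextID, assuming blocks hold ascending, non-overlapping LID
// ranges. A block can be skipped when the next block already starts at or
// below nextID, so all of its LIDs are known to be smaller. Skipped blocks
// never need to be read from disk.
func skipBlocks(headers []blockHeader, nextID uint32) int {
	i := 0
	for i+1 < len(headers) && headers[i+1].MinLID <= nextID {
		i++
	}
	return i
}

func main() {
	headers := []blockHeader{{MinLID: 0}, {MinLID: 100}, {MinLID: 200}}
	fmt.Println(skipBlocks(headers, 150)) // prints 1: the first block is skipped entirely
}
```

Note that the last block can never be skipped this way, since its upper bound is unknown from MinLID alone; it still has to be scanned.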
@@ -0,0 +1,49 @@
package util
I've asked an LLM to generate benchmarks for different scenarios and block sizes. Here are the results:
GallopFront/4k-16 34.06n ± 2%
GallopFront/8k-16 38.97n ± 1%
GallopFront/16k-16 45.51n ± 2%
GallopFront/32k-16 49.22n ± 3%
GallopFront/64k-16 51.70n ± 4%
GallopFront/128k-16 57.81n ± 4%
BinaryFront/4k-16 34.71n ± 4%
BinaryFront/8k-16 39.49n ± 2%
BinaryFront/16k-16 42.93n ± 3%
BinaryFront/32k-16 46.55n ± 2%
BinaryFront/64k-16 47.56n ± 1%
BinaryFront/128k-16 52.17n ± 3%
GallopUniform/4k-16 49.33n ± 4%
GallopUniform/8k-16 51.72n ± 2%
GallopUniform/16k-16 58.49n ± 5%
GallopUniform/32k-16 68.36n ± 4%
GallopUniform/64k-16 76.25n ± 3%
GallopUniform/128k-16 85.15n ± 5%
BinaryUniform/4k-16 42.32n ± 2%
BinaryUniform/8k-16 47.48n ± 4%
BinaryUniform/16k-16 53.79n ± 3%
BinaryUniform/32k-16 60.30n ± 2%
BinaryUniform/64k-16 65.98n ± 2%
BinaryUniform/128k-16 73.05n ± 1%
GallopBack/4k-16 40.05n ± 1%
GallopBack/8k-16 44.22n ± 2%
GallopBack/16k-16 48.66n ± 1%
GallopBack/32k-16 55.97n ± 4%
GallopBack/64k-16 60.87n ± 6%
GallopBack/128k-16 66.48n ± 5%
BinaryBack/4k-16 34.15n ± 3%
BinaryBack/8k-16 38.01n ± 4%
BinaryBack/16k-16 39.21n ± 1%
BinaryBack/32k-16 44.06n ± 6%
BinaryBack/64k-16 50.46n ± 2%
BinaryBack/128k-16 55.85n ± 2%
It seems like the ordinary binary search performs even better. We can discuss it.
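For comparison on your own machine, a self-contained harness can be put together with testing.Benchmark from the standard library. The GallopSearchGeq body is copied verbatim from the sketch earlier in the thread; the 64k block size, the stride of 3, and the "front" target are arbitrary assumptions, so treat the numbers as illustrative only:

```go
package main

import (
	"fmt"
	"slices"
	"testing"
)

// GallopSearchGeq is copied from the sketch posted earlier in the thread.
func GallopSearchGeq(vals []uint32, x uint32) (idx int, found bool) {
	n := len(vals)
	if n == 0 {
		return 0, false
	}
	if vals[0] >= x {
		return 0, true
	}
	hi := 1
	for hi < n && vals[hi] < x {
		hi *= 2
	}
	end := min(n, hi+1)
	start := end / 2
	idx, found = slices.BinarySearch(vals[start:end], x)
	idx += start
	if idx >= end {
		return 0, false
	}
	return idx, true
}

func main() {
	vals := make([]uint32, 64<<10) // one 64k-LID block
	for i := range vals {
		vals[i] = uint32(i) * 3 // strictly increasing with gaps
	}
	target := vals[10] + 1 // "front" scenario: hit near the start of the block
	var sink int
	gallop := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			idx, _ := GallopSearchGeq(vals, target)
			sink += idx // keep the result live so the call is not optimized away
		}
	})
	binary := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			idx, _ := slices.BinarySearch(vals, target)
			sink += idx
		}
	})
	_ = sink
	fmt.Printf("gallop: %d ns/op, binary: %d ns/op\n", gallop.NsPerOp(), binary.NsPerOp())
}
```

The same harness can be pointed at uniform and back-of-block targets to reproduce the scenarios in the table above.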
	}
}

func NewLID(lid uint32, reverse bool) LID {
Honestly, I am strongly against using relative properties.
It's better to have desc bool or asc bool than reverse bool, since the latter requires understanding of the context.
// NextGeq finds next greater or equals since iteration is in ascending order
func (n *staticAsc) NextGeq(nextID LID) LID {
	if n.ptr >= len(n.data) {
		return NewLIDOrderDesc(math.MaxUint32)
nit: Please do not forget to use node.NullLID
	from := n.ptr
	idx, found := util.GallopSearchGeq(n.data[from:], nextID.Unpack())
	if !found {
		return NewLIDOrderDesc(math.MaxUint32)
nit: Please do not forget to use node.NullLID
// NextGeq finds next less or equals since iteration is in descending order
func (n *staticDesc) NextGeq(nextID LID) LID {
	if n.ptr < 0 {
		return NewLIDOrderAsc(0)
nit: Please do not forget to use node.NullLID
	}
	idx, found := util.GallopSearchLeq(n.data[:n.ptr+1], nextID.Unpack())
	if !found {
		return NewLIDOrderAsc(0)
nit: Please do not forget to use node.NullLID
for {
	for len(it.lids) == 0 {
		if !it.tryNextBlock {
			return node.NewLIDOrderDesc(math.MaxUint32)

		continue
	}

	idx, found := util.GallopSearchGeq(it.lids, nextID.Unpack())
nit: Galloping search is only more efficient when we expect the element to be located at the very beginning of the list. Otherwise, pure binary search is better and simpler.
Can we benchmark both algorithms on real data?
// BenchmarkOrTreeNextGeq checks the performance of NextGeq vs Next when no skipping occur and all node
// yield distinct values (no intersection between nodes)
func BenchmarkOrTreeNextGeq(b *testing.B) {
nit: I just want to clarify for myself. Do I understand correctly that we don't expect acceleration specifically for only-Or nodes, so this benchmark is to make sure there's no degradation?
// Fast path: if we at least left or right and there is nothing to skip, then choose lowest and return.
minID := Min(n.leftID, n.rightID)
if nextID.LessOrEq(minID) {
	if n.leftID.Less(n.rightID) {
nit: Can we just call return n.Next() here?

// Fast path: if we at least left or right and there is nothing to skip, then choose lowest and return.
if nextID.LessOrEq(Min(n.leftID, n.rightID)) {
	return n.Next()
}
func (n *nodeAnd) NextGeq(nextID LID) LID {
	for !n.leftID.IsNull() && !n.rightID.IsNull() && !n.leftID.Eq(n.rightID) {
It seems there might be incorrect behavior here.
Imagine this is the first call of NextGeq, and the left and right IDs are equal (but less than nextID). What will the method return then?
node := NewAnd(left, right)

// Currently, nodes instantiate their state on creation, which will be fixed later.
I think this behavior doesn't match the expectations for the method.
	return NullLID()
}

func (n *nodeNAnd) NextGeq(nextID LID) LID {
If this is a temporary stub for the actual implementation, wouldn't it be better to do something like this?

func (n *nodeNAnd) NextGeq(nextID LID) LID {
	lid := n.Next()
	for lid.Less(nextID) {
		lid = n.Next()
	}
	return lid
}

# Conflicts:
#	node/bench_test.go
# Conflicts:
#	frac/processor/aggregator.go
#	node/bench_test.go
#	node/node_and.go
Description
A new operation which boosts search and aggregation performance. It's implemented on top of the faster-cmp branch, because fast compares are vital for this operation. It's also a base for future improvements like skipping disk reads.
Currently, NextGeq works more like a hint: some nodes do not support it, and there NextGeq behaves the same as Next. Therefore, some operations still have loops even when calling NextGeq (i.e. scrolling past lower values), like in aggregator.go. This might be improved in the future.

Measurements
The performance improvement depends on the particular operation and search request (and even on the tokens). If we can skip a lot, then there is a lot of improvement.
For example, service:X AND level:3 speeds up by 60% when level:3 is used, but only by 6% when level:4 is used as the second predicate. That's because level:3 reduces the total number of results by a lot, while almost all service logs have level:4.

The performance improvement on a typical user aggregation is good, since we skip a lot in the agg tree.
service:xyz | group by k8s_pod count(*) (prod fracs)

master:   370-456 ms (hot-cold)
fast cmp: 320-402 ms
next geq: 73-153 ms (-50-70%)

There are also downsides. It's not zero-cost. For example, consider the case when we iterate the whole tree and there is nothing to skip (i.e. we traverse exactly the same paths, but there is an extra comparison in the NextGeq operation):

exists:service | group by k8s_pod count(*)

master:   690-821 ms
fast cmp: 600-730 ms
next geq: 662-800 ms (+10% skipping overhead)

However, thanks to the faster cmp, it will still be faster than the current main branch, i.e. there is no performance downgrade. The next steps are probably to implement some sort of cost-based planning (and cardinality estimation), where we can evaluate whether there is skipping potential and disable it altogether, or maybe even avoid constructing an aggregation tree.
This PR partially addresses #332.