This repository was archived by the owner on Apr 2, 2024. It is now read-only.
This change adds a new reservation system for metric batching. The reservation system forms a queue of metric batchers for copiers to pick up. The queue is ordered by the time a request hits Promscale; that ordering allows copiers to combine requests more effectively. This solves a few problems:

1. Previously, reservations were ordered by a channel, i.e. by when the batcher first made the reservation. While this may correlate with the time the request hit the connector, it is a less exact proxy for what we actually want. Ordering explicitly on request start time solves that.

2. It allows peeking at info associated with the first reservation, which is not possible with a channel-based approach. Peeking is used here to ensure that we don't stop batching until a minimum duration has passed since the oldest request in the batch hit the system. That operates as an alternative backpressure mechanism to the regular mechanism of copiers becoming available. While in most cases it shouldn't be necessary, because of the sharding adjustment Prometheus does, it can be useful in some cases. Note that the minimum latency is measured from the time the request hits the connector and NOT the time spent in the batcher. This better controls the latency penalty of this alternate backpressure system. Peeking can also be used for more advanced batch-sizing logic. For example, we may want to batch based on the number of samples; without peeking, you'd need to pop the queue before learning the number of samples, at which point it is too late.
Should give us better observability.
- Significantly increase the number of samples batched per metric. Heavy metrics are very likely to come from many requests, so flushing them sooner is better. Having to wait multiple times to flush is no bueno.
- To prevent copy batches from getting too large in the face of potentially more samples per metric, change the batch criteria to limit not just the number of metrics but also the number of samples.
- Change the preference from "try to flush before batching" to "try to batch before flushing". This should increase batch sizes.
- For two metrics with the same initial request time, flush the bigger one first, hoping the smaller one can become bigger before its flush.
Previously, when a new set of requests came in, we often flushed a very small first batch because we flush as soon as any data becomes available in Peek. Since small batches are inefficient, we fix this by waiting until the entire first request is batched before flushing (with a timeout).