
Conversation

@adamsaimi (Owner)

Test 9

One potential problem we have with batch processing is that any one slow
item can clog up the whole batch. This PR implements a queueing method
instead, where we keep N queues that each have their own workers.
There's still a chance of individual items backlogging a queue, but we
can increase concurrency here to reduce the chances of that happening.
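
Roughly the shape described above, as an illustrative sketch (FixedQueuePool, submit, and the handler wiring are assumed names, not necessarily this PR's API):

```python
import queue
import threading
from typing import Any, Callable


class FixedQueuePool:
    """N independent queues, each drained by its own worker thread, so
    one slow item only backlogs the queue it was routed to."""

    def __init__(self, num_queues: int, handler: Callable[[Any], None]) -> None:
        self.handler = handler
        self.queues = [queue.Queue() for _ in range(num_queues)]
        self.workers = [
            threading.Thread(target=self._worker_loop, args=(q,), daemon=True)
            for q in self.queues
        ]
        for worker in self.workers:
            worker.start()

    def submit(self, key: str, item: Any) -> None:
        # The same key always routes to the same queue, preserving
        # per-key ordering while spreading load across queues.
        self.queues[hash(key) % len(self.queues)].put(item)

    def _worker_loop(self, q: queue.Queue) -> None:
        while True:
            self.handler(q.get())
            q.task_done()
```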

"""

def __init__(self) -> None:
self.all_offsets: dict[Partition, set[int]] = defaultdict(set)

[Performance] Unbounded Offset Tracking Sets

  • Problem: The OffsetTracker stores every observed and outstanding offset in sets (all_offsets, outstanding) with no size limit, so memory grows without bound under sustained load and can eventually cause OOM errors.
  • Fix: Implement a bounded size or an eviction policy for these sets to prevent unbounded memory growth; one possible shape is sketched below.
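
A minimal sketch of one such bound, assuming a hypothetical max_outstanding cap and a str partition key (the real Partition type is not shown here). Note this version applies backpressure rather than eviction, since dropping tracked offsets would corrupt commit bookkeeping:

```python
from collections import defaultdict
from threading import Condition


class BoundedOffsetTracker:
    """Tracks outstanding offsets per partition, blocking producers
    once a partition exceeds a configurable cap (backpressure)."""

    def __init__(self, max_outstanding: int = 10_000) -> None:
        self.max_outstanding = max_outstanding
        self.all_offsets: dict[str, set[int]] = defaultdict(set)
        self._cond = Condition()

    def add(self, partition: str, offset: int) -> None:
        with self._cond:
            # Block instead of growing without bound; commit() wakes us.
            while len(self.all_offsets[partition]) >= self.max_outstanding:
                self._cond.wait()
            self.all_offsets[partition].add(offset)

    def commit(self, partition: str, offset: int) -> None:
        with self._cond:
            self.all_offsets[partition] = {
                o for o in self.all_offsets[partition] if o > offset
            }
            self._cond.notify_all()
```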

        while not self.shutdown:
            try:
                work_item = self.work_queue.get()
            except queue.ShutDown:

[Bug] Incorrect Queue Shutdown Mechanism

  • Problem: The OrderedQueueWorker catches queue.ShutDown and FixedQueuePool calls Queue.shutdown(), but both were only added in Python 3.13; on earlier interpreters the attribute lookup fails, preventing graceful worker termination and leaking threads.
  • Fix: Replace queue.ShutDown with a sentinel value, or use queue.Empty with a timeout plus the self.shutdown flag; drop the q.shutdown() call unless Python 3.13+ is guaranteed, and ensure workers are joined. A sentinel-based sketch follows below.
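
A minimal sentinel-based sketch that works on any Python version (the handler wiring and names here are illustrative, not the PR's code):

```python
import queue
import threading
from typing import Any, Callable

_SHUTDOWN: Any = object()  # sentinel; any unique object works


class OrderedQueueWorker(threading.Thread):
    """Drains work_queue until it sees the shutdown sentinel."""

    def __init__(self, work_queue: queue.Queue, handler: Callable[[Any], None]) -> None:
        super().__init__(daemon=True)
        self.work_queue = work_queue
        self.handler = handler

    def run(self) -> None:
        while True:
            work_item = self.work_queue.get()
            if work_item is _SHUTDOWN:
                break  # graceful exit; no 3.13-only shutdown API required
            self.handler(work_item)


def shutdown_pool(queues: list, workers: list) -> None:
    # One sentinel per queue guarantees every worker wakes up and exits.
    for q in queues:
        q.put(_SHUTDOWN)
    for w in workers:
        w.join()
```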

    start = max(last_committed + 1, min_offset)

    highest_committable = last_committed
    for offset in range(start, max_offset + 1):

[Performance] Inefficient Linear Scan for Committable Offsets

  • Problem: The get_committable_offsets function scans every offset between last_committed and max_offset, so a large gap turns each call into a long linear walk.
  • Fix: Track contiguous blocks of completed offsets, or keep completed offsets in a structure such as a min-heap so the committable frontier can advance without scanning the gap; see the sketch below.
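
One way to avoid the scan, sketched with a min-heap of completed offsets (CommittableTracker and its methods are assumed names, not the PR's API):

```python
import heapq


class CommittableTracker:
    """Advances the committable frontier in time proportional to the
    offsets actually committed, not the size of the gap."""

    def __init__(self, last_committed: int = -1) -> None:
        self.last_committed = last_committed
        self._completed: list[int] = []  # min-heap of completed offsets

    def mark_completed(self, offset: int) -> None:
        heapq.heappush(self._completed, offset)

    def get_committable_offset(self) -> int:
        # Pop only while the heap front extends the contiguous prefix.
        while self._completed and self._completed[0] == self.last_committed + 1:
            self.last_committed = heapq.heappop(self._completed)
        return self.last_committed
```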

        with self._get_partition_lock(partition):
            self.last_committed[partition] = offset
            # Remove all offsets <= committed offset
            self.all_offsets[partition] = {o for o in self.all_offsets[partition] if o > offset}

[Performance] Inefficient Set Reconstruction for Offset Cleanup

  • Problem: Rebuilding self.all_offsets[partition] by filtering the entire set on every commit is O(n) in the number of tracked offsets, which slows the commit loop when the set is large.
  • Fix: If all_offsets is expected to be large, consider a data structure that supports efficient range-based deletion instead of a full rebuild; a sketch follows below.
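
For example, assuming the third-party sortedcontainers package is acceptable, a SortedList turns the cleanup into a bisect plus one range delete (PartitionOffsets is an illustrative wrapper, not the PR's class):

```python
from sortedcontainers import SortedList  # pip install sortedcontainers


class PartitionOffsets:
    """Keeps offsets sorted so commit-time cleanup is a range delete
    rather than a full-set rebuild."""

    def __init__(self) -> None:
        self.offsets = SortedList()

    def add(self, offset: int) -> None:
        self.offsets.add(offset)  # O(log n) insertion

    def drop_through(self, committed: int) -> None:
        # Delete every offset <= committed in one slice operation.
        idx = self.offsets.bisect_right(committed)
        del self.offsets[:idx]
```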

        self.workers: list[OrderedQueueWorker[T]] = []

        for i in range(num_queues):
            work_queue: queue.Queue[WorkItem[T]] = queue.Queue()

[Performance] Unbounded Queues Cause Resource Exhaustion

  • Problem: queue.Queue() instances created without a maxsize can grow indefinitely, leading to excessive memory consumption and OutOfMemory errors.
  • Fix: Define a reasonable maxsize for the queues to prevent memory exhaustion and provide natural backpressure:

```python
work_queue: queue.Queue[WorkItem[T]] = queue.Queue(maxsize=some_configurable_limit)
```
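
With a bounded queue, Queue.put() blocks once maxsize is reached, so a slow worker applies backpressure to its producers instead of letting memory grow; if blocking is undesirable, put(item, block=False) or a timeout surfaces queue.Full and lets the caller shed or reroute load.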
