
Conversation

@google-labs-jules

πŸ’‘ What:
Refactored `push_rules` in `main.py` to upload rule batches concurrently using ThreadPoolExecutor instead of sequentially.
Increased `max_workers` in `get_all_existing_rules` from 5 to 10.

🎯 Why:
Large blocklists can have thousands of rules, resulting in hundreds of sequential HTTP POST requests. Parallelizing this significantly reduces the total sync time.

πŸ“Š Impact:

  • Reduces rule upload time by ~5x (with 5 workers).
  • Speeds up initial rule fetching.

πŸ”¬ Measurement:
Run `uv run --python 3.13 python -m unittest tests/test_performance.py` to see the benchmark comparison.
(Sequential: ~1.0s vs Parallel: ~0.2s for 10 batches with 0.1s latency).
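
For context, the concurrency pattern looks roughly like the sketch below. It is illustrative only: `push_rules_sketch`, `push_batch`, and the hard-coded `BATCH_SIZE` are placeholder names, not the exact code in `main.py`.

```python
import concurrent.futures

BATCH_SIZE = 100  # placeholder; the real value comes from main.py


def push_rules_sketch(hostnames, push_batch, max_workers=5):
    """Upload rule batches concurrently and report overall success."""
    batches = [hostnames[i:i + BATCH_SIZE]
               for i in range(0, len(hostnames), BATCH_SIZE)]
    successful_batches = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map each future back to its batch index so failures can be attributed.
        futures = {executor.submit(push_batch, idx, batch): idx
                   for idx, batch in enumerate(batches)}
        for future in concurrent.futures.as_completed(futures):
            if future.result():
                successful_batches += 1
    return successful_batches == len(batches)
```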


PR created automatically by Jules for task 14774676592968160940 started by @abhimehro

- Parallelized `push_rules` batch processing using ThreadPoolExecutor (5 workers)
- Increased `get_all_existing_rules` workers from 5 to 10
- Added `tests/test_performance.py` to verify optimization and correctness
- Measured ~5x speedup in batch processing (simulated)
@google-labs-jules
Author

πŸ‘‹ Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a πŸ‘€ emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.


For security, I will only act on instructions from the user who triggered this task.

@abhimehro abhimehro marked this pull request as ready for review January 9, 2026 00:10
Copilot AI review requested due to automatic review settings January 9, 2026 00:10
@github-actions github-actions bot added the python label Jan 9, 2026
@abhimehro abhimehro self-requested a review January 9, 2026 00:10
@abhimehro abhimehro removed their request for review January 9, 2026 00:11
return True
except httpx.HTTPError as e:
log.error(f"Failed to push batch {i} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")
log.error(f"Failed to push batch {batch_idx} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")

Check warning: Code scanning / Prospector (reported by Codacy)
Use lazy % formatting in logging functions (logging-fstring-interpolation)
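
For reference, the change this linter rule asks for looks roughly like the following (a sketch based on the annotated line; `log`, `sanitize_for_log`, `batch_idx`, and `e` come from `main.py`):

```python
# Eager: the f-string is built even when ERROR records are filtered out.
log.error(f"Failed to push batch {batch_idx} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")

# Lazy: arguments are interpolated only if the record is actually emitted.
log.error(
    "Failed to push batch %s for folder %s: %s",
    batch_idx,
    sanitize_for_log(folder_name),
    sanitize_for_log(e),
)
```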
from unittest.mock import MagicMock, patch
import time
import threading
import main

Check warning: Code scanning / Prospector (reported by Codacy)
Unused import main (unused-import)
hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

# Mock delay to simulate network latency
def delayed_post(*args, **kwargs):

Check warning: Code scanning / Prospector (reported by Codacy)
Unused argument 'kwargs' (unused-argument)
hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

# Mock delay to simulate network latency
def delayed_post(*args, **kwargs):

Check warning: Code scanning / Prospector (reported by Codacy)
Unused argument 'args' (unused-argument)

print(f"\n[Performance Test] Duration for {num_batches} batches with 0.1s latency: {duration:.4f}s")

if __name__ == '__main__':

Check warning: Code scanning / Prospector (reported by Codacy)
expected 2 blank lines after class or function definition, found 1 (E305)
return True
except httpx.HTTPError as e:
log.error(f"Failed to push batch {i} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")
log.error(f"Failed to push batch {batch_idx} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")

Check warning: Code scanning / Pylint (reported by Codacy)
Line too long (124/100)
@@ -0,0 +1,81 @@
import unittest

Check warning: Code scanning / Pylint (reported by Codacy)
Missing module docstring
import main
from main import push_rules, BATCH_SIZE

class TestPushRulesPerformance(unittest.TestCase):

Check warning: Code scanning / Pylint (reported by Codacy)
Missing class docstring
self.profile_id = "test-profile"
self.folder_name = "test-folder"
self.folder_id = "test-folder-id"
self.do = 1

Check warning: Code scanning / Pylint (reported by Codacy)
Attribute name "do" doesn't conform to snake_case naming style
self.existing_rules = set()

@patch('main._api_post_form')
def test_push_rules_correctness_with_lock(self, mock_post):

Check warning: Code scanning / Pylint (reported by Codacy)
Missing method docstring
hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

# Mock delay to simulate network latency
def delayed_post(*args, **kwargs):

Check warning: Code scanning / Pylint (reported by Codacy)
Missing function docstring
self.assertTrue(success)
self.assertEqual(mock_post.call_count, num_batches)

print(f"\n[Performance Test] Duration for {num_batches} batches with 0.1s latency: {duration:.4f}s")

Check warning: Code scanning / Pylint (reported by Codacy)
Line too long (108/100)
from unittest.mock import MagicMock, patch
import time
import threading
import main

Check notice: Code scanning / Pylint (reported by Codacy)
Unused import main
hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

# Mock delay to simulate network latency
def delayed_post(*args, **kwargs):

Check notice: Code scanning / Pylint (reported by Codacy)
Unused argument 'kwargs'
hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

# Mock delay to simulate network latency
def delayed_post(*args, **kwargs):

Check notice: Code scanning / Pylint (reported by Codacy)
Unused argument 'args'
return True
except httpx.HTTPError as e:
log.error(f"Failed to push batch {i} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")
log.error(f"Failed to push batch {batch_idx} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Line too long (124/100)
@@ -0,0 +1,81 @@
import unittest

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Missing module docstring
import main
from main import push_rules, BATCH_SIZE

class TestPushRulesPerformance(unittest.TestCase):

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Missing class docstring
self.profile_id = "test-profile"
self.folder_name = "test-folder"
self.folder_id = "test-folder-id"
self.do = 1

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Attribute name "do" doesn't conform to snake_case naming style
self.existing_rules = set()

@patch('main._api_post_form')
def test_push_rules_correctness_with_lock(self, mock_post):

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Missing function or method docstring
self.assertTrue(success)
self.assertEqual(mock_post.call_count, num_batches)

print(f"\n[Performance Test] Duration for {num_batches} batches with 0.1s latency: {duration:.4f}s")

Check warning: Code scanning / Pylintpython3 (reported by Codacy)
Line too long (108/100)
return True
except httpx.HTTPError as e:
log.error(f"Failed to push batch {i} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")
log.error(f"Failed to push batch {batch_idx} for folder {sanitize_for_log(folder_name)}: {sanitize_for_log(e)}")

Check notice: Code scanning / Pylintpython3 (reported by Codacy)
Use lazy % formatting in logging functions
from unittest.mock import MagicMock, patch
import time
import threading
import main

Check notice: Code scanning / Pylintpython3 (reported by Codacy)
Unused import main
hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

# Mock delay to simulate network latency
def delayed_post(*args, **kwargs):

Check notice: Code scanning / Pylintpython3 (reported by Codacy)
Unused argument 'args'
hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

# Mock delay to simulate network latency
def delayed_post(*args, **kwargs):

Check notice: Code scanning / Pylintpython3 (reported by Codacy)
Unused argument 'kwargs'
Copilot AI left a comment

Pull request overview

This PR parallelizes rule batch uploads to significantly improve sync performance for large blocklists. The main change refactors the push_rules function to upload batches concurrently using ThreadPoolExecutor instead of sequentially, reducing upload time by approximately 5x.

Key changes:

  • Parallelized batch uploads within push_rules using ThreadPoolExecutor with 5 workers
  • Increased max_workers in get_all_existing_rules from 5 to 10 for faster rule fetching
  • Added performance tests to verify correctness and measure improvements

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

Files changed:

  • main.py: Refactored push_rules to process batches in parallel using ThreadPoolExecutor; increased max_workers for rule fetching from 5 to 10
  • tests/test_performance.py: Added new performance tests to verify parallel batch upload correctness and measure timing improvements
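
On the fetch side, the bump from 5 to 10 workers presumably applies to an executor along the lines of the sketch below; the helper names and the per-folder fetch shape are assumptions, not the actual `get_all_existing_rules` code.

```python
import concurrent.futures


def get_all_existing_rules_sketch(folder_ids, fetch_folder_rules, max_workers=10):
    """Fetch each folder's rules in parallel and merge them into one set."""
    existing_rules = set()
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map preserves input order but runs the calls concurrently.
        for folder_rules in executor.map(fetch_folder_rules, folder_ids):
            existing_rules.update(folder_rules)
    return existing_rules
```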



    @patch('main._api_post_form')
    def test_push_rules_concurrency(self, mock_post):
        # Create enough hostnames for 5 batches
Copilot AI Jan 9, 2026

The comment says "Create enough hostnames for 5 batches" but the code sets num_batches = 10. Update the comment to match the actual number of batches being tested.

Suggested change
# Create enough hostnames for 5 batches
# Create enough hostnames for 10 batches

Comment on lines +19 to +47
    def test_push_rules_correctness_with_lock(self, mock_post):
        # Create enough hostnames for 5 batches
        num_batches = 5
        hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

        # Mock success
        mock_post.return_value = MagicMock(status_code=200)

        lock = threading.Lock()

        start_time = time.time()
        success = push_rules(
            self.profile_id,
            self.folder_name,
            self.folder_id,
            self.do,
            self.status,
            hostnames,
            self.existing_rules,
            self.client,
            existing_rules_lock=lock
        )
        duration = time.time() - start_time

        self.assertTrue(success)
        self.assertEqual(mock_post.call_count, num_batches)
        self.assertEqual(len(self.existing_rules), len(hostnames))

        print(f"\n[Sequential Baseline (Lock)] Duration: {duration:.4f}s")
Copilot AI Jan 9, 2026

The test name test_push_rules_correctness_with_lock and its print output "Sequential Baseline (Lock)" are misleading. With the parallelization changes, this test now exercises the parallel code path (with a lock), not sequential execution. Consider renaming this test to something like test_push_rules_parallel_with_lock and updating the print message to reflect that it's testing parallel execution with thread-safety.

Comment on lines +50 to +72
    def test_push_rules_concurrency(self, mock_post):
        # Create enough hostnames for 5 batches
        num_batches = 10
        hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

        # Mock delay to simulate network latency
        def delayed_post(*args, **kwargs):
            time.sleep(0.1)
            return MagicMock(status_code=200)

        mock_post.side_effect = delayed_post

        start_time = time.time()
        success = push_rules(
            self.profile_id,
            self.folder_name,
            self.folder_id,
            self.do,
            self.status,
            hostnames,
            self.existing_rules,
            self.client
        )
Copilot AI Jan 9, 2026

The test test_push_rules_concurrency doesn't verify thread-safety. Without passing an existing_rules_lock, the test relies on the fact that set.update() happens to be atomic in CPython for simple cases, but this is an implementation detail. Consider adding a test case that explicitly passes a lock (like the first test does) but with simulated latency to verify that thread-safety works correctly under concurrent load, or alternatively test the no-lock scenario with assertions that verify the set is properly updated despite potential race conditions.
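
A sketch of the kind of test Copilot is describing, reusing the fixtures already in tests/test_performance.py; the method name and latency value are illustrative:

```python
    # Add inside TestPushRulesPerformance in tests/test_performance.py.
    @patch('main._api_post_form')
    def test_push_rules_concurrency_with_lock(self, mock_post):
        """Concurrent batches plus a shared lock: the set must still end up complete."""
        num_batches = 10
        hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

        def delayed_post(*_args, **_kwargs):
            time.sleep(0.1)  # simulate network latency so batches overlap
            return MagicMock(status_code=200)

        mock_post.side_effect = delayed_post
        lock = threading.Lock()

        success = push_rules(
            self.profile_id, self.folder_name, self.folder_id,
            self.do, self.status, hostnames,
            self.existing_rules, self.client,
            existing_rules_lock=lock,
        )

        self.assertTrue(success)
        self.assertEqual(mock_post.call_count, num_batches)
        # Every hostname should be recorded despite concurrent updates to the set.
        self.assertEqual(len(self.existing_rules), len(hostnames))
```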

Comment on lines +1 to +81
import unittest
from unittest.mock import MagicMock, patch
import time
import threading
import main
from main import push_rules, BATCH_SIZE

class TestPushRulesPerformance(unittest.TestCase):
    def setUp(self):
        self.client = MagicMock()
        self.profile_id = "test-profile"
        self.folder_name = "test-folder"
        self.folder_id = "test-folder-id"
        self.do = 1
        self.status = 1
        self.existing_rules = set()

    @patch('main._api_post_form')
    def test_push_rules_correctness_with_lock(self, mock_post):
        # Create enough hostnames for 5 batches
        num_batches = 5
        hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

        # Mock success
        mock_post.return_value = MagicMock(status_code=200)

        lock = threading.Lock()

        start_time = time.time()
        success = push_rules(
            self.profile_id,
            self.folder_name,
            self.folder_id,
            self.do,
            self.status,
            hostnames,
            self.existing_rules,
            self.client,
            existing_rules_lock=lock
        )
        duration = time.time() - start_time

        self.assertTrue(success)
        self.assertEqual(mock_post.call_count, num_batches)
        self.assertEqual(len(self.existing_rules), len(hostnames))

        print(f"\n[Sequential Baseline (Lock)] Duration: {duration:.4f}s")

    @patch('main._api_post_form')
    def test_push_rules_concurrency(self, mock_post):
        # Create enough hostnames for 5 batches
        num_batches = 10
        hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

        # Mock delay to simulate network latency
        def delayed_post(*args, **kwargs):
            time.sleep(0.1)
            return MagicMock(status_code=200)

        mock_post.side_effect = delayed_post

        start_time = time.time()
        success = push_rules(
            self.profile_id,
            self.folder_name,
            self.folder_id,
            self.do,
            self.status,
            hostnames,
            self.existing_rules,
            self.client
        )
        duration = time.time() - start_time

        self.assertTrue(success)
        self.assertEqual(mock_post.call_count, num_batches)

        print(f"\n[Performance Test] Duration for {num_batches} batches with 0.1s latency: {duration:.4f}s")

if __name__ == '__main__':
    unittest.main()
Copilot AI Jan 9, 2026

The performance tests only verify successful execution paths. Consider adding a test case that verifies behavior when some batches fail (e.g., by making some mock_post calls raise HTTPError). This would ensure that the parallel error handling works correctly and that partial failures are properly reported with the correct successful_batches count.
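
One possible shape for that test, assuming push_rules reports failure when any batch raises (the exact return-value semantics depend on main.py); httpx would need to be imported at module level in the test file:

```python
    # Add inside TestPushRulesPerformance; requires `import httpx` at module level.
    @patch('main._api_post_form')
    def test_push_rules_partial_failure(self, mock_post):
        """One batch raises an HTTP error; the remaining batches should still be attempted."""
        num_batches = 5
        hostnames = [f"host-{i}.com" for i in range(BATCH_SIZE * num_batches)]

        counter_lock = threading.Lock()
        call_count = {"n": 0}

        def flaky_post(*_args, **_kwargs):
            with counter_lock:
                call_count["n"] += 1
                fail_this_call = call_count["n"] == 3  # fail exactly one batch
            if fail_this_call:
                raise httpx.HTTPError("simulated failure")
            return MagicMock(status_code=200)

        mock_post.side_effect = flaky_post

        success = push_rules(
            self.profile_id, self.folder_name, self.folder_id,
            self.do, self.status, hostnames,
            self.existing_rules, self.client,
        )

        # Assumption: push_rules returns False when not every batch succeeds.
        self.assertFalse(success)
        self.assertEqual(mock_post.call_count, num_batches)
```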

}

for future in concurrent.futures.as_completed(futures):
    if future.result():
Copilot AI Jan 9, 2026

The call to future.result() can raise exceptions if the thread execution encounters an unexpected error beyond the handled HTTPError. If an exception occurs in _push_batch that's not caught (e.g., a programming error), it will propagate here and crash the entire upload process. Consider wrapping future.result() in a try-except block to handle unexpected exceptions and log them appropriately, ensuring other batches can continue processing.

Suggested change
if future.result():
batch_idx = futures[future]
try:
    result = future.result()
except Exception as e:
    log.error(
        "Unexpected error while processing batch %d for folder %s: %s",
        batch_idx,
        sanitize_for_log(folder_name),
        sanitize_for_log(e),
    )
    log.debug("Unexpected exception details", exc_info=True)
    continue
if result:

return True

successful_batches = 0
total_batches = len(range(0, len(filtered_hostnames), BATCH_SIZE))
Copilot AI Jan 9, 2026

The calculation len(range(0, len(filtered_hostnames), BATCH_SIZE)) is inefficient and can be simplified. Consider using integer division with ceiling: (len(filtered_hostnames) + BATCH_SIZE - 1) // BATCH_SIZE or import math; math.ceil(len(filtered_hostnames) / BATCH_SIZE). This avoids creating an unnecessary range object just to count it.

Suggested change
total_batches = len(range(0, len(filtered_hostnames), BATCH_SIZE))
total_batches = (len(filtered_hostnames) + BATCH_SIZE - 1) // BATCH_SIZE

from unittest.mock import MagicMock, patch
import time
import threading
import main
Copilot AI Jan 9, 2026

Import of 'main' is not used.

Suggested change
import main

@abhimehro
Owner

Hi @copilot, please open a new pull request to apply changes based on the comments in this thread. Thank you!


Copilot AI commented Jan 9, 2026

@abhimehro I've opened a new pull request, #86, to work on those changes. Once the pull request is ready, I'll request review from you.

@google-labs-jules
Author

> Hi @copilot, please open a new pull request to apply changes based on the comments in this thread. Thank you!

I've implemented the requested changes in this PR. Specifically:

  • Added tests/test_performance.py with comprehensive test cases including concurrency with failures.
  • Refactored main.py to use safe future.result() handling and efficient batch calculation.
  • Fixed logging f-string warnings.
