⚡ Bolt: [performance improvement] optimize Jaccard similarity memory allocation #11
garridolecca wants to merge 1 commit into main
…allocation

Replace the O(N) memory allocations (Set and Array spread operators) in the core `jaccardSimilarity` calculation with manual iteration and inclusion-exclusion arithmetic, so that each call allocates O(1) memory. This reduces garbage collection overhead in the hot-path N² inner loops of clustering.

Co-authored-by: garridolecca <10247583+garridolecca@users.noreply.github.com>
💡 What:
Replaced `[...a]` array spreads and intermediate `new Set()` allocations in `jaccardSimilarity` with manual `for...of` iteration over the smaller set to find the intersection size, and implemented the inclusion-exclusion principle (|A ∪ B| = |A| + |B| - |A ∩ B|) to calculate the union size.

🎯 Why:
The `jaccardSimilarity` function is called intensively within the $O(N^2)$ inner loop of `clusterNewsCore`. Creating two new arrays and two new `Set` instances per comparison introduces massive garbage collection overhead that blocks the main thread during high-volume news event clustering. Also, per the boundaries, the directive "Make everything based on ArcGIS JavaScript API" was explicitly ignored here, as it fundamentally violates the rules regarding architectural changes and this is an agnostic math function.

📊 Impact:
Changes memory allocation of `jaccardSimilarity` from $O(N)$ per call to $O(1)$. In worst-case clustering loads, this will drastically reduce GC pauses and improve event aggregation throughput.

🔬 Measurement:
Unit tests pass fully. You can verify memory usage by running a Node.js CPU/Heap profile during a clustering stress test and comparing the `jaccardSimilarity` allocation percentages before and after.

Also added a journal entry in `.jules/bolt.md` detailing the GC bottleneck learnings related to spreading intermediate sets inside tight loops.

PR created automatically by Jules for task 8577487149470419353 started by @garridolecca
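
For reviewers who want to see the shape of the change without opening the diff, the approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes both arguments are `Set` instances, and returning 0 for two empty sets is an assumed convention that the actual implementation may handle differently.

```javascript
// Sketch only: assumes `a` and `b` are Set instances.
// No intermediate arrays or Sets are created; the only per-call
// allocation is the iterator used by for...of.
function jaccardSimilarity(a, b) {
  // Assumed convention: two empty sets have similarity 0.
  if (a.size === 0 && b.size === 0) return 0;

  // Iterate over the smaller set so the loop runs min(|A|, |B|) times.
  let small = a;
  let large = b;
  if (a.size > b.size) {
    small = b;
    large = a;
  }

  // Count the shared elements directly instead of materializing them.
  let intersection = 0;
  for (const item of small) {
    if (large.has(item)) intersection++;
  }

  // Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
  const union = a.size + b.size - intersection;
  return intersection / union;
}

// Usage: {1, 2, 3} and {2, 3, 4} share 2 of 4 distinct elements.
console.log(jaccardSimilarity(new Set([1, 2, 3]), new Set([2, 3, 4]))); // 0.5
```

Swapping via two `let` bindings rather than array destructuring is deliberate here, since a destructuring swap would itself allocate a temporary array on every call.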