refactor: hub-aware decompose grouping to prevent mega-clusters #956
Open
The call graph clustering used undirected union-find, which collapsed everything reachable from a high-fan-out orchestrator into a single mega-group. For contract_testgen.rs (3k lines), this produced 3 groups with one containing 15 functions.

Changes:
- Identify hub functions (>=4 callees) and exclude their edges from union-find, preventing transitive mega-clusters
- Add dominant prefix detection for semantic cluster labels (e.g., resolve_*, infer_*) instead of naming after the most-called function
- Expand stop word list to avoid generic cluster names
- Hub functions fall through to name-based clustering where they can form focused groups with similarly-named functions

Results on contract_testgen.rs:
  Before: 3 groups (one with 15 functions named 'infer_hint_for_param')
  After: 8 focused groups (build, generate_test, helpers, types, etc.)

Results on extension/mod.rs:
  Before: monolithic groups
  After: 9 groups (capability, find_extension, resolve, types, etc.)

Results on rename/mod.rs:
  Before: monolithic groups
  After: 9 groups (case_utilities, reference_finding, rename_generation, types, etc.)
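The dominant prefix detection listed in the changes above might look roughly like the following. This is a minimal sketch and not the code from this PR: the function name `dominant_prefix`, the `STOP_WORDS` contents, and the `min_share` threshold are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stop words: prefixes too generic to serve as a cluster label.
/// The actual list used by the PR is not shown in this description.
const STOP_WORDS: &[&str] = &["get", "set", "is", "do", "new", "run"];

/// Returns the most common leading snake_case word among `names`, provided
/// that at least `min_share` of the names start with it and it is not a
/// stop word. `min_share` is an assumed tuning parameter.
fn dominant_prefix(names: &[&str], min_share: f64) -> Option<String> {
    let mut counts: BTreeMap<&str, usize> = BTreeMap::new();
    for name in names {
        // Only names containing an underscore have a prefix,
        // e.g. `resolve_path` -> `resolve`.
        if let Some((prefix, _rest)) = name.split_once('_') {
            if !prefix.is_empty() && !STOP_WORDS.contains(&prefix) {
                *counts.entry(prefix).or_insert(0) += 1;
            }
        }
    }
    // Pick the prefix with the highest count; `?` returns None if no
    // usable prefix was found at all.
    let (prefix, count) = counts.into_iter().max_by_key(|&(_, c)| c)?;
    if (count as f64) >= min_share * names.len() as f64 {
        Some(prefix.to_string())
    } else {
        None
    }
}
```

With a share of 0.5, a group such as `resolve_path`, `resolve_alias`, `resolve_import`, `emit_header` would be labeled `resolve`, while a group with no shared prefix gets no label and would need some other naming fallback.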
Summary
- The refactor decompose algorithm used undirected union-find for call graph clustering, which collapsed everything reachable from a high-fan-out orchestrator function into a single mega-group
- Dominant prefix detection for semantic cluster labels (e.g. resolve_* functions → resolve.rs) instead of naming after the most-called function

Before vs After
contract_testgen.rs (2,998 lines)
Before (3 groups, one mega-group of 15):
After (8 focused groups):
extension/mod.rs (1,130 lines) → 9 groups
rename/mod.rs (1,912 lines) → 9 groups

How It Works
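The key insight is that orchestrator functions (those that call 4+ other functions in the same file) create transitive closure in union-find: every call edge merges the caller's set with the callee's set, so one hub links all of its callees, and everything they call, into a single group. Excluding hub edges from union-find prevents these transitive mega-clusters, and the hubs themselves fall through to name-based clustering, where they can form focused groups with similarly-named functions.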
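The hub exclusion described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the PR's implementation: `cluster_without_hubs`, `UnionFind`, and `HUB_THRESHOLD` are hypothetical names, and the input is assumed to be a map from each caller to its de-duplicated same-file callees.

```rust
use std::collections::BTreeMap;

/// Assumed threshold, taken from the ">=4 callees" rule in the description.
const HUB_THRESHOLD: usize = 4;

/// Minimal union-find over indices 0..n with path compression.
struct UnionFind {
    parent: Vec<usize>,
}

impl UnionFind {
    fn new(n: usize) -> Self {
        UnionFind { parent: (0..n).collect() }
    }

    fn find(&mut self, x: usize) -> usize {
        let mut root = x;
        while self.parent[root] != root {
            root = self.parent[root];
        }
        // Path compression: point every node on the path at the root.
        let mut cur = x;
        while self.parent[cur] != root {
            let next = self.parent[cur];
            self.parent[cur] = root;
            cur = next;
        }
        root
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb {
            self.parent[ra] = rb;
        }
    }
}

/// Groups functions by call-graph connectivity, ignoring edges that
/// originate from a hub (a caller with HUB_THRESHOLD or more callees).
fn cluster_without_hubs(calls: &BTreeMap<&str, Vec<&str>>) -> Vec<Vec<String>> {
    // Assign an index to every function name, callers and callees alike.
    let mut index: BTreeMap<&str, usize> = BTreeMap::new();
    for (caller, callees) in calls {
        for name in std::iter::once(caller).chain(callees.iter()) {
            let next = index.len();
            index.entry(*name).or_insert(next);
        }
    }

    let mut uf = UnionFind::new(index.len());
    for (caller, callees) in calls {
        // Skip hub edges so an orchestrator cannot glue its callees together.
        if callees.len() >= HUB_THRESHOLD {
            continue;
        }
        for callee in callees {
            uf.union(index[*caller], index[*callee]);
        }
    }

    // Collect names by their union-find root.
    let mut groups: BTreeMap<usize, Vec<String>> = BTreeMap::new();
    for (name, &i) in &index {
        groups.entry(uf.find(i)).or_default().push(name.to_string());
    }
    groups.into_values().collect()
}
```

In this sketch a hub and any callees not linked by other edges end up as singleton groups; a later name-based pass, as the description mentions, would be responsible for placing them.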
Note
The Rust grammar parser currently only matches pub and pub(crate) function visibility; pub(super) and private functions with complex signatures may not be parsed. This limits decompose to ~20 of ~30 items in contract_testgen.rs. That's a separate grammar issue (#818 scope).

Related