Description
Summary
TableDetector._detect_text_based_table has cyclomatic complexity grade E (score 32), making it the highest-complexity function in the codebase. It is temporarily excluded from the Xenon CI gate (introduced in #88) to unblock the gate without forcing an unsafe blind refactor.
This issue tracks the work needed to remove that exclusion.
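For context on the grade quoted above, the sketch below maps a cyclomatic complexity score to its letter grade using the bands documented by Radon (A 1-5, B 6-10, C 11-20, D 21-30, E 31-40, F 41+). The helper name `complexity_grade` is illustrative and is not part of Radon or Xenon.

```python
# Grade bands as documented by Radon; each tuple is (inclusive upper bound, grade).
_BANDS = [(5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E")]


def complexity_grade(score: int) -> str:
    """Map a cyclomatic complexity score to its letter grade."""
    if score < 1:
        raise ValueError("cyclomatic complexity is at least 1")
    for upper, grade in _BANDS:
        if score <= upper:
            return grade
    return "F"  # anything above 40


# The score reported for _detect_text_based_table falls in the E band.
print(complexity_grade(32))
```

A score of 32 sits just inside the E band, so the decomposition has to bring the function down into whichever band the gate accepts; that threshold is configured in the CI step and is not restated in this issue.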
Exit condition
The --exclude flag for packages/parser-core/src/bankstatements_core/analysis/table_detector.py must be removed from the Xenon CI step in .github/workflows/ci.yml once this issue is resolved.
Why not now
The function is a PDF heuristic hotspot tightly coupled to the pdfplumber word-coordinate API. Decomposing it without targeted characterisation tests carries high regression risk: its observed behaviour must be pinned by tests before it can be changed safely.
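One possible way to pin observed behaviour is a snapshot-style harness over synthetic word fixtures, sketched below. It assumes words are plain dicts carrying the `text`, `x0`, `x1`, `top` and `bottom` keys that pdfplumber's `extract_words()` produces. The helpers `make_word`, `make_grid`, `characterise` and `assert_unchanged` are hypothetical, and `stand_in_detect` is only a placeholder for the real method, whose signature is not given in this issue.

```python
from typing import Callable

Word = dict  # pdfplumber represents each extracted word as a dict


def make_word(text: str, x0: float, top: float,
              width: float = 30.0, height: float = 10.0) -> Word:
    """Build a synthetic word with pdfplumber-style coordinate keys."""
    return {"text": text, "x0": x0, "x1": x0 + width,
            "top": top, "bottom": top + height}


def make_grid(rows: int, column_x0s: list[float],
              row_height: float = 12.0) -> list[Word]:
    """Lay out one word per (row, column) cell, row by row."""
    return [make_word(f"r{r}c{c}", x0, top=r * row_height)
            for r in range(rows)
            for c, x0 in enumerate(column_x0s)]


# Named inputs meant to exercise different branches (illustrative only).
FIXTURES: dict[str, list[Word]] = {
    "empty": [],
    "single_word": [make_word("Balance", 50.0, 100.0)],
    "three_column_grid": make_grid(5, [50.0, 200.0, 350.0]),
}


def characterise(detect: Callable[[list[Word]], object]) -> dict[str, object]:
    """Run detect over every fixture and collect the results as a snapshot."""
    return {name: detect(words) for name, words in FIXTURES.items()}


def assert_unchanged(detect: Callable[[list[Word]], object],
                     pinned: dict[str, object]) -> None:
    """Fail if detect no longer reproduces the pinned snapshot."""
    observed = characterise(detect)
    assert observed == pinned, f"behaviour drifted: {observed!r} != {pinned!r}"


def stand_in_detect(words: list[Word]) -> bool:
    # Placeholder only; the real call would go through TableDetector.
    return len(words) >= 6


# Record once against the current code, then re-check after every change.
PINNED = characterise(stand_in_detect)
assert_unchanged(stand_in_detect, PINNED)
```

The point of the pattern is that the pinned snapshot is recorded from the existing implementation rather than written from a specification, so any refactor that changes an output for a fixture fails immediately.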
Required sequence
- Write characterisation tests for _detect_text_based_table
  - Cover the main branching paths: empty words, column-coverage threshold, text-density threshold, word-gap detection
  - Use real or synthetic pdfplumber word objects as fixtures
  - Tests must pass before any structural changes are made
- Decompose the function into cohesive sub-functions
  - Candidate extractions: column-coverage check, density check, word-gap scan, boundary decision
  - Each sub-function should have a single responsibility and be independently testable
- Remove the Xenon exclusion
  - Delete --exclude packages/parser-core/src/bankstatements_core/analysis/table_detector.py from the CI step
  - Confirm gate passes with the new structure
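The decomposition step could take roughly the following shape. This is a sketch under stated assumptions, not the existing implementation: the function names, the `Thresholds` values, and the definitions of coverage, density and gap are all hypothetical, chosen only to show how each candidate extraction becomes a small, separately testable unit. Words are assumed to be pdfplumber-style dicts with `x0`, `x1`, `top` and `bottom` keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical values; the real ones live in table_detector.py.
    min_column_coverage: float = 0.5
    min_text_density: float = 0.1
    min_word_gap: float = 15.0


def column_coverage(words, columns):
    """Fraction of (lo, hi) column ranges containing some word's left edge."""
    if not columns:
        return 0.0
    hit = sum(1 for lo, hi in columns
              if any(lo <= w["x0"] < hi for w in words))
    return hit / len(columns)


def text_density(words, area):
    """Summed word bounding-box area divided by the region's area."""
    if area <= 0:
        return 0.0
    covered = sum((w["x1"] - w["x0"]) * (w["bottom"] - w["top"])
                  for w in words)
    return covered / area


def has_wide_gap(words, min_gap):
    """True if two consecutive words, ordered by left edge, are min_gap apart."""
    ordered = sorted(words, key=lambda w: w["x0"])
    return any(b["x0"] - a["x1"] >= min_gap
               for a, b in zip(ordered, ordered[1:]))


def looks_like_table(words, columns, area, t=Thresholds()):
    """Boundary decision: combine the three checks into one verdict."""
    if not words:
        return False
    return (column_coverage(words, columns) >= t.min_column_coverage
            and text_density(words, area) >= t.min_text_density
            and has_wide_gap(words, t.min_word_gap))


row = [
    {"x0": 50.0, "x1": 80.0, "top": 0.0, "bottom": 10.0},
    {"x0": 200.0, "x1": 230.0, "top": 0.0, "bottom": 10.0},
]
print(looks_like_table(row, [(0.0, 100.0), (150.0, 250.0)], area=1000.0))
```

With this shape the top-level function reduces to an early return plus one boolean expression, and each helper can be covered by its own tests alongside the characterisation suite.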
Related
- Introduced by: CI-02: wire Xenon complexity gate into CI using installed Radon baseline #88 (Xenon complexity gate)