Skip to content

tooling: add Pylint design checks + Radon/Xenon complexity budgets to CI #85

@longieirl

Description

@longieirl

Background

This issue evaluates three tools proposed for code quality: Pylint (design smell checks), Radon/Xenon (complexity budgets), and mutmut (mutation testing). Based on the current state of the codebase, only the first two are recommended now. mutmut is explicitly deferred — see rationale below.

Depends on / complements #84 (ruff migration).


Current State

Coverage

  • parser-core: 92% (gate: 91%) — solid but not near-complete
  • Known weak spot: exclusion_detector.py ~59% (pdfplumber API dependency)

Existing lint tooling

Complexity hotspots already visible in codebase

A branch-count scan of packages/parser-core/src surfaced these functions exceeding 8 branches:

Branches Args Function File
27 2 _detect_text_based_table() analysis/table_detector.py
22 3 init_directories() commands/init.py
17 3 get_detection_explanation() templates/template_detector.py
16 2 _detect_recurring_charges() services/expense_analysis.py
15 1 analyze() commands/analyze_pdf.py
14 3 detect_template() templates/template_detector.py
14 4 detect() templates/detectors/iban_detector.py
14 3 merge_continuation_lines() services/row_merger.py
13 3 _assign_column_names() analysis/column_analyzer.py
13 2 from_multiple_directories() templates/template_registry.py

These are real refactor candidates — not hypothetical. A complexity gate in CI would prevent new functions from silently joining this list.


Tool 1 — Pylint (design checks only)

Recommendation: Add, scoped to R-category codes only

Pylint has broad overlap with Ruff for style and lint. Running the full suite alongside Ruff creates redundancy and noise. The non-overlapping value is its design metric checks (R-category / refactor codes):

Pylint check Default threshold What it catches
too-many-branches >12 Functions like _detect_text_based_table (27 branches)
too-many-statements >50 Overly long functions
too-many-locals >15 Functions juggling too much state
too-many-arguments >5 Interfaces with too many parameters
too-many-public-methods >20 Classes doing too much
too-many-instance-attributes >7 Data classes masquerading as services

Run mode: pylint src --disable=all --enable=R — disables all style/convention/warning codes that Ruff already covers; enables only design smell detection.

What is NOT being replaced

Pylint runs alongside Ruff for a different purpose:

  • Ruff: style, formatting, imports, common lint patterns (fast, per-line)
  • Pylint: structural design metrics (slower, per-function/class)

Impact

  • No change to formatting or import behaviour
  • Flags the 20+ functions already identified as complexity hotspots
  • CI fails on new functions exceeding thresholds (prevents regressions)
  • Existing violations need a one-time baseline suppression pass or refactor

Tool 2 — Radon + Xenon (complexity budgets)

Recommendation: Add both

Radon computes:

  • Cyclomatic complexity (CC): number of linearly independent paths through a function. Grade A (1-5) to F (>25)
  • Halstead metrics: volume, effort, estimated bug count based on operators/operands
  • Maintainability Index (MI): composite score 0-100; below 25 = unmaintainable

Xenon wraps Radon with CI thresholds — exits non-zero when complexity exceeds configured budgets.

Proposed Xenon thresholds for CI

xenon --max-absolute C --max-modules B --max-average A packages/parser-core/src

Meaning:

  • No single function worse than grade C (CC 11-15) — lenient enough to not retroactively fail
  • No module average worse than grade B (CC 6-10)
  • No project average worse than grade A (CC 1-5)

The threshold is intentionally set to catch new regressions, not fail on existing hotspots. Existing violations (grade D-F) are tracked separately for refactoring.

Radon in CI (reporting only, non-blocking)

- name: Radon complexity report (C-grade and worse)
  run: radon cc packages/parser-core/src -s -n C

Produces a human-readable report as a CI step summary — useful for tracking debt trend over time without blocking the build.


Tool 3 — mutmut (mutation testing)

Recommendation: Defer — not yet appropriate

Mutation testing works by injecting small code changes (mutations) and verifying the test suite detects them. It is most valuable when:

  1. Coverage is near-complete (98%+) — at 92%, untested lines produce trivially surviving mutants with no signal value
  2. The test suite is stable and fast — mutmut reruns the full suite for every mutation; 1,420 tests against 16,560 lines takes 30-90 minutes
  3. Known weak spots are closed — exclusion_detector.py at ~59% would generate large volumes of false-signal survivors

Revisit when:

This is not a rejection — it is the right tool at the wrong time. A comment will be added to this issue when the preconditions are met.


What Is Not Being Replaced

Tool Status Reason
mypy Kept Static type checker — different domain from design metrics; already in CI
bandit / safety Kept Security scanning — separate concern
stubtest Kept pdfplumber stub validation — no Radon/Pylint equivalent
ruff Kept Style/lint/format — see #84

Files to Change

File Change
packages/parser-core/pyproject.toml Add [tool.pylint] config (R-codes only, custom thresholds); add pylint, radon, xenon to [dev] deps
.github/workflows/ci.yml Add complexity job (radon report + xenon gate); extend lint-core with scoped pylint step; add complexity to ci-pass needs
Makefile Add make complexity (radon + xenon) and make design-check (pylint R-codes) targets
CONTRIBUTING.md Document complexity thresholds, how to run locally, and how to suppress legitimate violations

Proposed CI additions

New complexity job:

complexity:
  name: Complexity — radon + xenon
  runs-on: ubuntu-latest
  timeout-minutes: 5
  needs: changes
  if: needs.changes.outputs.core == 'true'
  steps:
    - uses: actions/checkout@v6
    - uses: actions/setup-python@v6
      with:
        python-version: '3.12'
    - run: pip install radon xenon
    - name: Radon complexity report (C-grade and worse)
      run: radon cc packages/parser-core/src -s -n C
    - name: Xenon complexity gate
      run: xenon --max-absolute C --max-modules B --max-average A packages/parser-core/src

Extension to lint-core:

- name: Pylint design checks (R-codes only)
  run: pylint packages/parser-core/src --disable=all --enable=R --fail-under=8.0

Test Plan

  • Run radon cc src -s -n C locally — review all C-grade-and-worse functions, document known violations
  • Run xenon --max-absolute C --max-modules B --max-average A src locally — confirm passes with the proposed lenient initial thresholds
  • Run pylint src --disable=all --enable=R locally — suppress legitimate exceptions with # pylint: disable=too-many-branches where justified (e.g. dispatch tables), fix the rest
  • Run full test suite (pytest) — confirm 1420+ tests still pass (no production code changes required for this issue)
  • Push to PR branch — confirm new CI jobs (complexity, updated lint-core) go green
  • Confirm ci-pass required status check still passes after adding complexity to its needs list
  • Update CONTRIBUTING.md with complexity threshold documentation and make complexity instructions
  • Update MEMORY.md to document complexity tooling and the known hotspot files

Out of Scope

  • Refactoring existing high-complexity functions — separate work; this issue only adds the gate
  • mutmut — deferred; will be revisited when coverage exceeds 95% consistently
  • Applying complexity checks to parser-free — minimal logic, low value

Metadata

Metadata

Assignees

No one assigned

    Labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions