Skip to content

Improve area weight solver: robustness, memory, and testing#470

Merged
martinholmer merged 10 commits intomasterfrom
pr1-solver-robustness
Mar 25, 2026
Merged

Improve area weight solver: robustness, memory, and testing#470
martinholmer merged 10 commits intomasterfrom
pr1-solver-robustness

Conversation

@donboyd5
Copy link
Collaborator

@donboyd5 donboyd5 commented Mar 25, 2026

Improve area weight solver: robustness, memory, and testing

This is PR 1 of 4 adding congressional district (CD) weighting to TMD. The
PRs are stacked and should be reviewed/merged in order. None of the PRs changes
national code or results. There are changes to Makefile to exclude certain new
tests that relate to areas and may be best run on an as-needed basis.

  1. Solver robustness (this PR) — This improves the area weight solution
    infrastructure so that it is more flexible and efficient. This provides
    modest benefits for state target creation and weight solution, and will
    provide substantial benefits when we prepare targets and solve weights for
    436 Congressional Districts (including DC). The changes include memory
    reduction, feasibility checks, per-constraint penalties, and new tests.
  2. Spec-based target pipeline — CSV-driven target specification, SOI CD
    data ingestion, geographic shares. New files, no state impact.
  3. Quality report enhancements — CD-aware reporting, violation summaries,
    human-readable labels.
  4. Congressional district pipeline — CD solver integration, developer
    mode, override YAML, 436-CD batch solving.

Summary of this PR

Improve the area weight QP solver with robustness enhancements, major memory
reductions, and new test coverage. These changes benefit states and, in the
future, congressional districts.

Solver robustness (3 improvements):

  • Range-based target filtering: _drop_impossible_targets() now checks
    whether each target is achievable within multiplier bounds, not just whether
    the constraint matrix row is all zeros. Catches geometrically unreachable
    targets that the old check missed.
  • LP feasibility pre-check: New _check_feasibility() runs a fast linear
    program (scipy HiGHS) before the QP to identify which constraints will need
    slack. Runs on every area solve (not just development). Diagnostic only —
    logs which constraints are tight but does not change solutions.
  • Per-constraint slack penalties: New _assign_slack_penalties() gives
    reduced penalty (1e3 vs 1e6) to inherently noisy targets: e02400/e00300/e26270
    amounts in low-AGI bins, and filing-status counts in the lowest bins. The
    solver relaxes these targets in preference to distorting weights globally to
    meet targets.

Memory reductions (3 changes, net -36% vs master):

The PR reduces memory usage, especially per-worker usage, to make it practical
to use more workers on multi-processor systems:

  • Build constraint matrix B directly in sparse COO format, eliminating two dense
    intermediates (~620 MB saved per worker).
  • Use sparse matrices in LP feasibility check (~1.2 GB saved per worker).
  • Trim unused TMD DataFrame columns (109 to ~30) and preload TMD in the parent
    process before forking workers (shared via copy-on-write).

Peak memory per worker: 1,244 MB (master) reduced to 798 MB (this PR). With 16
workers: ~20 GB reduced to ~13 GB.

New infrastructure:

  • solver_overrides.py: YAML-based per-area solver parameter management.
    Provides infrastructure for customizing solver settings (tolerance, multiplier
    bounds, etc.) per area. No override files are included in this PR; actual
    per-area overrides will be generated and committed in a later PR when the
    congressional district pipeline is added.

New tests:

  • test_state_weight_results.py: post-solve validation of state weight files
    (existence, nonnegativity, no NaN, correct columns, target accuracy within
    tolerance). Run as part of the test suite if weight files exist.
  • test_fingerprint.py: on-demand reproducibility test. Rounds weights to
    integers, sums per area, and hashes. Detects any change in results across runs
    or machines. Not part of make test; run manually with pytest tests/test_fingerprint.py -v.

Impact on state weights

State weight results will change numerically due to per-constraint slack
penalties. This is expected and is an improvement — noisy low-AGI targets that
previously forced weight distortion across all records are now relaxed
preferentially. The constraint tolerance (0.5%) and multiplier bounds (0-25x)
are unchanged.

Files changed (9 files, +998 / -61)

File Change
tmd/areas/create_area_weights.py Sparse matrix construction, enhanced feasibility check, LP pre-check, per-constraint penalties
tmd/areas/batch_weights.py Parent-process TMD preloading, override support, per-constraint penalties wiring
tmd/areas/solver_overrides.py NEW — YAML-based per-area override management
tmd/areas/solve_weights.py Minor CLI scope fix
tmd/areas/quality_report.py Fix --scope states CLI parsing
tests/test_state_weight_results.py NEW — post-solve state weight validation
tests/test_fingerprint.py NEW — on-demand reproducibility test
tests/conftest.py Add --update-fingerprint CLI option
Makefile Exclude fingerprint test from make test

Test plan

make format                                                    # no changes
make lint                                                      # passes clean
make clean && make data                                        # build TMD + run all tests
python -m tmd.areas.prepare_targets --scope states             # generate state target files
python -m pytest tests/test_prepare_targets.py -v              # verify targets
python -m tmd.areas.solve_weights --scope states --workers 16  # solve state weights
python -m pytest tests/test_state_weight_results.py -v         # verify weights
python -m tmd.areas.quality_report --scope states              # quality report
pytest tests/test_fingerprint.py -v                            # verify reproducibility

Reproducibility

Verified: 8-worker and 16-worker solves produce identical fingerprints (hash
8b36ae1c2ee0c384, integer weight sums match per area exactly).

Prepared by @donboyd5 and Claude Code

donboyd5 and others added 6 commits March 24, 2026 11:46
…enalties, LP pre-check

Three robustness enhancements for the area weight QP solver:

- _drop_impossible_targets() checks achievable range within multiplier
  bounds, not just all-zeros rows
- _check_feasibility() LP pre-check (scipy HiGHS) before QP identifies
  constraints needing slack
- _assign_slack_penalties() gives reduced penalty to inherently noisy
  targets in low-AGI bins

Also adds solver_overrides.py for YAML-based per-area parameter management.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_state_weight_results.py: validates weight file existence,
  nonnegativity, no NaN/inf, solver status, and target accuracy
  for 5 representative states (AL, CA, MN, NY, TX)
- Tests skip gracefully if weight files not yet generated
- Target accuracy allows 0.05% margin above solver tolerance for
  floating-point differences in weight-file roundtrip
- Remove "Pass 1:" label from solve_weights print (only one pass)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The --scope states argument was being split into ["STATES"] instead
of being treated as the default all-states scope. Now recognized
as a keyword that maps to None (all states).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Build constraint matrix B directly in sparse COO format instead of
constructing dense A (310 MB) and B (310 MB) intermediates. Use sparse
row iteration in _drop_impossible_targets() and sparse LP construction
in _check_feasibility().

Peak memory per worker: 1,244 MB (master) → 798 MB (this commit).
With 16 workers: ~20 GB → ~13 GB, preventing OOM on WSL2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two complementary memory optimizations:

1. Column trimming in _load_taxcalc_data(): drop unused columns from
   TMD DataFrame (109 → ~30 columns), saving ~150 MB per worker.
   Uses pattern matching (e*, c*, p* prefixes) so new target variables
   are automatically retained.

2. Parent-process preloading in batch_weights.py: load TMD once before
   forking workers instead of once per worker. On Linux, fork shares
   memory pages copy-on-write, saving ~150 MB × (num_workers - 1).

Combined with the previous sparse matrix commit, peak memory per worker
drops from ~1.8 GB (PR1 before fixes) to ~0.8 GB. With 16 workers
this is ~13 GB vs ~29 GB, preventing OOM on WSL2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Simple hash-based test: for each area, rounds weights to integers and
sums them. Hash of per-area sums catches any change in results.

Run manually (not part of make test):
  pytest tests/test_fingerprint.py -v --update-fingerprint  # save reference
  pytest tests/test_fingerprint.py -v                        # compare

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@donboyd5 donboyd5 requested a review from martinholmer March 25, 2026 13:12
Commit the states_fingerprint.json so reviewers can verify their
solve results match: pytest tests/test_fingerprint.py -v

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@donboyd5
Copy link
Collaborator Author

donboyd5 commented Mar 25, 2026

@martinholmer, I would much appreciate it if you review this PR and run the test plan to make sure all works and passes fingerprint reproducibility test:

make format                                                    # no changes
make lint                                                      # passes clean
make clean && make data                                        # build TMD + run all tests
python -m tmd.areas.prepare_targets --scope states             # generate state target files
python -m pytest tests/test_prepare_targets.py -v              # verify targets
python -m tmd.areas.solve_weights --scope states --workers 16  # solve state weights
python -m pytest tests/test_state_weight_results.py -v         # verify weights
python -m tmd.areas.quality_report --scope states              # quality report
pytest tests/test_fingerprint.py -v                            # verify reproducibility

@martinholmer
Copy link
Collaborator

@donboyd5, Thanks for all the improvements in PR #470.
I have commitments this morning and early afternoon, but hope to test this PR later this (Wed) afternoon.

@donboyd5
Copy link
Collaborator Author

@donboyd5, Thanks for all the improvements in PR #470. I have commitments this morning and early afternoon, but hope to test this PR later this (Wed) afternoon.

Thanks, @martinholmer!

…ipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@martinholmer
Copy link
Collaborator

@donboyd5, Why is the PR_MESSAGE.md file included in TMD PR #470?

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@donboyd5
Copy link
Collaborator Author

@donboyd5, Why is the PR_MESSAGE.md file included in TMD PR #470?

I'm sorry about that, @martinholmer. It's removed now, by commit eacea23.

@martinholmer
Copy link
Collaborator

@donboyd5, All the tests pass on my computer, but the state quality report results seem worse than they used to be. Maybe I don't understand what is expected by the differences below seem to be worse, not better.

(base) TMD> cat states.sh
#!/bin/zsh
make format
make lint
make clean && make test
python -m tmd.areas.prepare_targets --scope states
python -m pytest tests/test_prepare_targets.py -v
python -m tmd.areas.solve_weights --scope states --workers 8
python -m pytest tests/test_state_weight_results.py -v  # verify weights
python -m tmd.areas.quality_report --scope states > states.act.            # SAVE QUALITY REPORT
pytest tests/test_fingerprint.py -v

(base) TMD> ./states.sh
---[SNIP ALL TESTS PASSING]---

(base) TMD> diff states.act states.exp  # WHERE states.exp FROM PRE-PR470 CODE
8c8
< States with violated targets: 29/51
---
> States with violated targets: 17/51
10c10
< Total violated targets: 126
---
> Total violated targets: 35
15c15
<   Hit rate:  avg=98.6%, min=94.4% (out of 178 targets, tolerance: +/-0.5% + eps)
---
>   Hit rate:  avg=99.6%, min=97.8% (out of 178 targets, tolerance: +/-0.5% + eps)
33,35c33,35
< AK   Solved           169   178     9   0.0048   0.0050   0.683   0.000   0.933   1.741     25.0  13.4%
< AL   Solved           178   179     1   0.0047   0.0050   0.401   0.000   0.911   1.376      9.2   5.0%
< AR   Solved           177   179     2   0.0048   0.0050   0.548   0.000   0.897   1.406     25.0   8.0%
---
> AK   Solved           175   178     3   0.0048   0.0050   0.683   0.000   0.933   1.741     25.0  13.4%
> AL   Solved           179   179     0   0.0047   0.0050   0.401   0.000   0.911   1.376      9.2   5.0%
> AR   Solved           179   179     0   0.0048   0.0050   0.548   0.000   0.897   1.406     25.0   8.0%
39,41c39,41
< CT   Solved           178   179     1   0.0047   0.0050   0.954   0.000   0.998   2.459     25.0   5.9%
< DC   Solved           168   178    10   0.0048   0.0050   1.219   0.000   1.012   3.237     25.0   9.9%
< DE   Solved           174   178     4   0.0047   0.0050   0.560   0.000   0.972   1.730     20.2  10.0%
---
> CT   Solved           179   179     0   0.0047   0.0050   0.954   0.000   0.998   2.459     25.0   5.9%
> DC   Solved           174   178     4   0.0048   0.0050   1.219   0.000   1.012   3.237     25.0   9.9%
> DE   Solved           177   178     1   0.0047   0.0050   0.560   0.000   0.972   1.730     20.2  10.0%
44,46c44,46
< HI   Solved           173   179     6   0.0046   0.0050   0.741   0.000   0.958   1.553     25.0  13.2%
< IA   Solved           177   179     2   0.0048   0.0050   0.614   0.000   0.946   1.485     25.0  11.3%
< ID   Solved           174   179     5   0.0048   0.0050   0.393   0.156   0.943   1.454     10.6   3.3%
---
> HI   Solved           177   179     2   0.0046   0.0050   0.741   0.000   0.958   1.553     25.0  13.2%
> IA   Solved           179   179     0   0.0048   0.0050   0.614   0.000   0.946   1.485     25.0  11.3%
> ID   Solved           179   179     0   0.0048   0.0050   0.393   0.156   0.943   1.454     10.6   3.3%
49,51c49,51
< KS   Solved           177   179     2   0.0048   0.0050   0.546   0.000   0.969   1.417     25.0   7.7%
< KY   Solved           178   179     1   0.0048   0.0050   0.385   0.000   0.933   1.279     17.4   5.4%
< LA   Solved           178   179     1   0.0046   0.0050   0.554   0.000   0.881   1.552     25.0   8.3%
---
> KS   Solved           178   179     1   0.0048   0.0050   0.546   0.000   0.969   1.417     25.0   7.7%
> KY   Solved           179   179     0   0.0048   0.0050   0.385   0.000   0.933   1.279     17.4   5.4%
> LA   Solved           179   179     0   0.0046   0.0050   0.554   0.000   0.881   1.552     25.0   8.3%
54c54
< ME   Solved           173   179     6   0.0047   0.0050   0.757   0.000   0.968   1.502     25.0  15.4%
---
> ME   Solved           177   179     2   0.0047   0.0050   0.757   0.000   0.968   1.502     25.0  15.4%
56c56
< MN   Solved           178   179     1   0.0047   0.0050   0.438   0.258   0.986   1.568     22.4   2.8%
---
> MN   Solved           179   179     0   0.0047   0.0050   0.438   0.258   0.986   1.568     22.4   2.8%
58,59c58,59
< MS   Solved           174   179     5   0.0047   0.0050   0.669   0.000   0.824   1.659     25.0  15.3%
< MT   Solved           172   179     7   0.0048   0.0050   0.545   0.000   1.001   1.624     25.0   7.9%
---
> MS   Solved           177   179     2   0.0047   0.0050   0.669   0.000   0.824   1.659     25.0  15.3%
> MT   Solved           177   179     2   0.0048   0.0050   0.545   0.000   1.001   1.624     25.0   7.9%
61,63c61,63
< ND   Solved           171   178     7   0.0047   0.0050   1.045   0.000   0.899   2.097     25.0  23.6%
< NE   Solved           174   179     5   0.0048   0.0050   0.626   0.000   0.961   1.553     25.0   9.2%
< NH   Solved           174   179     5   0.0047   0.0050   0.842   0.000   0.977   2.244     25.0   9.0%
---
> ND   Solved           176   178     2   0.0047   0.0050   1.045   0.000   0.899   2.097     25.0  23.6%
> NE   Solved           177   179     2   0.0048   0.0050   0.626   0.000   0.961   1.553     25.0   9.2%
> NH   Solved           178   179     1   0.0047   0.0050   0.842   0.000   0.977   2.244     25.0   9.0%
65,66c65,66
< NM   Solved           173   179     6   0.0048   0.0050   0.565   0.000   0.939   1.331     25.0  11.5%
< NV   Solved           178   179     1   0.0046   0.0050   0.592   0.000   1.006   1.652     25.0   5.7%
---
> NM   Solved           177   179     2   0.0048   0.0050   0.565   0.000   0.939   1.331     25.0  11.5%
> NV   Solved           179   179     0   0.0046   0.0050   0.592   0.000   1.006   1.652     25.0   5.7%
69,70c69,70
< OK   Solved           178   179     1   0.0047   0.0050   0.400   0.000   0.923   1.320     13.3   5.2%
< OR   Solved           177   179     2   0.0046   0.0050   0.359   0.246   0.982   1.400      8.3   2.4%
---
> OK   Solved           179   179     0   0.0047   0.0050   0.400   0.000   0.923   1.320     13.3   5.2%
> OR   Solved           179   179     0   0.0046   0.0050   0.359   0.246   0.982   1.400      8.3   2.4%
72c72
< RI   Solved           173   179     6   0.0048   0.0050   0.488   0.014   1.000   1.478     25.0   4.8%
---
> RI   Solved           178   179     1   0.0048   0.0050   0.488   0.014   1.000   1.478     25.0   4.8%
74c74
< SD   Solved           173   179     6   0.0048   0.0050   0.857   0.000   0.961   1.751     25.0  15.7%
---
> SD   Solved           178   179     1   0.0048   0.0050   0.857   0.000   0.961   1.751     25.0  15.7%
77c77
< UT   Solved           177   179     2   0.0047   0.0050   0.520   0.020   0.968   1.574     25.0   4.6%
---
> UT   Solved           179   179     0   0.0047   0.0050   0.520   0.020   0.968   1.574     25.0   4.6%
79c79
< VT   Solved           170   178     8   0.0047   0.0050   1.012   0.000   0.948   1.728     25.0  21.9%
---
> VT   Solved           175   178     3   0.0047   0.0050   1.012   0.000   0.948   1.728     25.0  21.9%
82,83c82,83
< WV   Solved           174   179     5   0.0048   0.0050   0.570   0.000   0.876   1.362     25.0  15.0%
< WY   Solved           170   179     9   0.0048   0.0050   2.014   0.000   0.913   2.156     25.0  27.7%
---
> WV   Solved           177   179     2   0.0048   0.0050   0.570   0.000   0.876   1.362     25.0  15.0%
> WY   Solved           175   179     4   0.0048   0.0050   2.014   0.000   0.913   2.156     25.0  27.7%
86c86
<   c00100: 126 violations across 29 states
---
>   c00100: 35 violations across 17 states
89,98c89,98
<   DC: 10 violated
<   AK: 9 violated
<   WY: 9 violated
<   VT: 8 violated
<   ND: 7 violated
<   MT: 7 violated
<   HI: 6 violated
<   SD: 6 violated
<   RI: 6 violated
<   NM: 6 violated
---
>   WY: 4 violated
>   DC: 4 violated
>   VT: 3 violated
>   AK: 3 violated
>   ME: 2 violated
>   MS: 2 violated
>   MT: 2 violated
>   HI: 2 violated
>   NE: 2 violated
>   NM: 2 violated
104,108c104,108
<   VT   0.500% target=      34,591  achieved=      34,418  miss=     173  c00100 returns single $0K-$10K
<   AK   0.500% target=      33,139  achieved=      32,973  miss=     166  c00100 returns single $0K-$10K
<   LA   0.500% target=      25,514  achieved=      25,387  miss=     127  c00100 returns HoH $0K-$10K
<   DC   0.500% target=      23,050  achieved=      23,165  miss=     115  c00100 returns single $0K-$10K
<   MS   0.500% target=      16,459  achieved=      16,377  miss=      82  c00100 returns HoH $0K-$10K
---
>   DC   0.500% target=       3,392  achieved=       3,409  miss=      17  c00100 returns $1000K+
>   MT   0.500% target=       2,370  achieved=       2,358  miss=      12  c00100 returns $1000K+
>   HI   0.500% target=       2,013  achieved=       2,003  miss=      10  c00100 returns $1000K+
>   ME   0.500% target=       2,013  achieved=       2,003  miss=      10  c00100 returns $1000K+
>   NM   0.500% target=       2,125  achieved=       2,115  miss=      10  c00100 returns $1000K+

I don't understand why the number of "violated targets" is up from 35 to 126.

States are well-conditioned — the solver hits all targets with uniform
penalties. Per-constraint penalties were designed for CDs where extreme
areas need relaxation. For states, they cause more targets to land on
the wrong side of the 0.50% tolerance boundary without changing the
actual weights (RMSE, objective, multipliers all identical to 6+ digits).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@donboyd5
Copy link
Collaborator Author

Good catch, @martinholmer. The differences are superficial — the actual weights are identical to within 2e-8 and the RMSE, objective function, and multiplier distributions are the same to 6+ significant figures. The "extra" violations occur because more targets land just above the 0.50% tolerance boundary rather than just below it.

The cause was a new per-constraint slack penalty feature designed for congressional districts, where extreme areas (e.g., Manhattan) need the solver to preferentially relax certain noisy targets. We applied it to states unnecessarily. The fix (just pushed) restricts per-constraint penalties to CDs only. State results now match the pre-PR baseline exactly: 17 areas with violations, 35 violated targets.

Note that in PR 2 we will make minor changes to state targets.

@martinholmer martinholmer marked this pull request as draft March 25, 2026 19:36
@martinholmer martinholmer marked this pull request as ready for review March 25, 2026 19:54
@martinholmer
Copy link
Collaborator

@donboyd5 said in PR #470

The cause was a new per-constraint slack penalty feature designed for congressional districts, where extreme areas (e.g., Manhattan) need the solver to preferentially relax certain noisy targets. We applied it to states unnecessarily. The fix (just pushed) restricts per-constraint penalties to CDs only. State results now match the pre-PR baseline exactly: 17 areas with violations, 35 violated targets.

OK, now I get the 35 violated targets. Thanks.

@martinholmer martinholmer merged commit 1192ffd into master Mar 25, 2026
1 check passed
@donboyd5 donboyd5 deleted the pr1-solver-robustness branch March 25, 2026 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants