Improve area weight solver: robustness, memory, and testing#470
martinholmer merged 10 commits into master
Conversation
…enalties, LP pre-check

Three robustness enhancements for the area weight QP solver:
- `_drop_impossible_targets()` checks the achievable range within multiplier bounds, not just all-zero rows
- `_check_feasibility()` runs an LP pre-check (scipy HiGHS) before the QP to identify constraints needing slack
- `_assign_slack_penalties()` gives a reduced penalty to inherently noisy targets in low-AGI bins

Also adds `solver_overrides.py` for YAML-based per-area parameter management.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- `test_state_weight_results.py`: validates weight file existence, nonnegativity, no NaN/inf, solver status, and target accuracy for 5 representative states (AL, CA, MN, NY, TX)
- Tests skip gracefully if weight files have not yet been generated
- Target accuracy allows a 0.05% margin above solver tolerance for floating-point differences in the weight-file roundtrip
- Remove the "Pass 1:" label from the `solve_weights` print (there is only one pass)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The `--scope states` argument was being split into `["STATES"]` instead of being treated as the default all-states scope. It is now recognized as a keyword that maps to `None` (all states). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
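The fix described above can be sketched as follows; the function name and argument handling here are illustrative assumptions, not the repo's actual code:

```python
# Hypothetical sketch: treat the literal keyword "states" as the
# all-states default (None) instead of splitting it into ["STATES"].
def parse_scope(scope_arg):
    """Map a --scope value to a list of area codes, or None for all states."""
    if scope_arg is None or scope_arg.lower() == "states":
        return None  # None means: solve every state
    # otherwise treat the value as a comma-separated list of area codes
    return [code.strip().upper() for code in scope_arg.split(",")]
```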
Build the constraint matrix B directly in sparse COO format instead of constructing dense A (310 MB) and B (310 MB) intermediates. Use sparse row iteration in `_drop_impossible_targets()` and sparse LP construction in `_check_feasibility()`. Peak memory per worker: 1,244 MB (master) → 798 MB (this commit). With 16 workers: ~20 GB → ~13 GB, preventing OOM on WSL2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
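A minimal sketch of the technique (toy data, not the repo's actual matrix): assembling the matrix from (row, col, value) triplets in COO format means no dense intermediate is ever allocated, and converting to CSR gives the row slicing needed for per-constraint checks.

```python
import numpy as np
from scipy import sparse

# (row, col, value) triplets for the nonzero entries only
rows = np.array([0, 0, 1, 2])
cols = np.array([0, 3, 1, 2])
vals = np.array([1.0, 2.5, -1.0, 4.0])

# Build directly in COO; convert to CSR for efficient row iteration.
B = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 5)).tocsr()
```

Memory scales with the number of nonzeros rather than rows × columns, which is where the dense-intermediate savings come from.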
Two complementary memory optimizations:

1. Column trimming in `_load_taxcalc_data()`: drop unused columns from the TMD DataFrame (109 → ~30 columns), saving ~150 MB per worker. Uses pattern matching (`e*`, `c*`, `p*` prefixes) so new target variables are automatically retained.
2. Parent-process preloading in `batch_weights.py`: load TMD once before forking workers instead of once per worker. On Linux, fork shares memory pages copy-on-write, saving ~150 MB × (num_workers - 1).

Combined with the previous sparse matrix commit, peak memory per worker drops from ~1.8 GB (PR1 before fixes) to ~0.8 GB. With 16 workers this is ~13 GB vs ~29 GB, preventing OOM on WSL2.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
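The prefix-based trimming could look like the sketch below; the kept-column set and helper name are assumptions for illustration (the real code keeps whatever identifier/weight columns the solver needs):

```python
import pandas as pd

KEEP_EXACT = {"RECID", "s006"}      # hypothetical always-kept columns
KEEP_PREFIXES = ("e", "c", "p")     # target-variable name prefixes

def trim_columns(df):
    """Drop columns not needed for targeting; new e*/c*/p* variables survive."""
    keep = [c for c in df.columns
            if c in KEEP_EXACT or c.startswith(KEEP_PREFIXES)]
    return df[keep]

df = pd.DataFrame({"RECID": [1], "e00300": [10.0], "XTOT": [2], "c00100": [5.0]})
trimmed = trim_columns(df)
```

Because matching is by prefix rather than an explicit allow-list, adding a new `e*` target variable requires no change to the loader.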
Simple hash-based test: for each area, round weights to integers and sum them. A hash of the per-area sums catches any change in results. Run manually (not part of `make test`):

    pytest tests/test_fingerprint.py -v --update-fingerprint   # save reference
    pytest tests/test_fingerprint.py -v                        # compare

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
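The fingerprint idea reduces to a few lines; this sketch is illustrative (the function name, data shapes, and choice of SHA-256 are assumptions, not the repo's actual API):

```python
import hashlib
import json

def fingerprint(area_weights):
    """area_weights: dict mapping area code -> iterable of float weights."""
    # Round each weight to an integer and sum per area; integer sums are
    # insensitive to sub-half-unit float noise but catch real changes.
    sums = {area: int(sum(round(w) for w in weights))
            for area, weights in area_weights.items()}
    # Hash a canonical (key-sorted) encoding so dict order does not matter.
    blob = json.dumps(sums, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()[:16]
```

Any change large enough to move an area's rounded-weight sum flips the hash, which is what makes it useful as a cross-machine reproducibility check.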
Commit `states_fingerprint.json` so reviewers can verify that their solve results match:

    pytest tests/test_fingerprint.py -v

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@martinholmer, I would much appreciate it if you would review this PR and run the test plan to make sure everything works and passes the fingerprint reproducibility test:
Thanks, @martinholmer!
…ipeline

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
I'm sorry about that, @martinholmer. It's removed now, by commit eacea23.
@donboyd5, all the tests pass on my computer, but the state quality report results seem worse than they used to be. Maybe I don't understand what is expected, but the differences below seem to be worse, not better. I don't understand why the number of "violated targets" is up from 35 to 126.
States are well-conditioned — the solver hits all targets with uniform penalties. Per-constraint penalties were designed for CDs where extreme areas need relaxation. For states, they cause more targets to land on the wrong side of the 0.50% tolerance boundary without changing the actual weights (RMSE, objective, multipliers all identical to 6+ digits). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Good catch, @martinholmer. The differences are superficial — the actual weights are identical to within 2e-8 and the RMSE, objective function, and multiplier distributions are the same to 6+ significant figures. The "extra" violations occur because more targets land just above the 0.50% tolerance boundary rather than just below it. The cause was a new per-constraint slack penalty feature designed for congressional districts, where extreme areas (e.g., Manhattan) need the solver to preferentially relax certain noisy targets. We applied it to states unnecessarily. The fix (just pushed) restricts per-constraint penalties to CDs only. State results now match the pre-PR baseline exactly: 17 areas with violations, 35 violated targets. Note that in PR 2 we will make minor changes to state targets.
OK, now I get the 35 violated targets. Thanks.
Improve area weight solver: robustness, memory, and testing
Summary of this PR
Improve the area weight QP solver with robustness enhancements, major memory
reductions, and new test coverage. These changes benefit states and, in the
future, congressional districts.
Solver robustness (3 improvements):
- `_drop_impossible_targets()` now checks whether each target is achievable within multiplier bounds, not just whether the constraint matrix row is all zeros. Catches geometrically unreachable targets that the old check missed.
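The achievable-range check can be sketched as follows; this is a hedged illustration assuming nonnegative base weights w and per-record multipliers x bounded in [lo, hi], with the 0-25x bounds mentioned later in this PR as defaults (the function name and tolerance handling are assumptions):

```python
import numpy as np

def target_reachable(a, w, target, lo=0.0, hi=25.0, rtol=0.005):
    """Can a @ (x * w) hit `target` for some x with lo <= x <= hi, w >= 0?"""
    contrib = a * w
    pos = contrib[contrib > 0].sum()
    neg = contrib[contrib < 0].sum()
    # Extremes are reached by pushing each multiplier to the bound
    # matching the sign of its contribution.
    vmax = hi * pos + lo * neg
    vmin = lo * pos + hi * neg
    slack = rtol * abs(target)
    return (vmin - slack) <= target <= (vmax + slack)
```

A target outside [vmin, vmax] is geometrically unreachable even though its constraint row is nonzero, which is exactly the case the old all-zeros check missed.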
- `_check_feasibility()` runs a fast linear program (scipy HiGHS) before the QP to identify which constraints will need slack. Runs on every area solve (not just development). Diagnostic only — logs which constraints are tight but does not change solutions.
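An LP feasibility pre-check of this kind might look like the sketch below, which asks HiGHS whether any multiplier vector x in [lo, hi] puts every constraint row within its tolerance band; the function name, band construction, and default bounds are assumptions, not the repo's actual code:

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog

def lp_feasible(B, targets, lo=0.0, hi=25.0, rtol=0.005):
    """True if some x in [lo, hi] satisfies B @ x within rtol of each target."""
    m, n = B.shape
    band = rtol * np.abs(targets)
    # Encode  targets - band <= B @ x <= targets + band  as  A_ub @ x <= b_ub.
    A_ub = sparse.vstack([B, -B]).tocsc()
    b_ub = np.concatenate([targets + band, -(targets - band)])
    res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(lo, hi)] * n, method="highs")
    return res.status == 0   # 0 = solved (feasible); 2 = infeasible
```

With a zero objective the LP only answers the feasibility question, so it is much cheaper than the QP it precedes; an infeasible result flags the areas whose constraints will need slack.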
- `_assign_slack_penalties()` gives a reduced penalty (1e3 vs 1e6) to inherently noisy targets: e02400/e00300/e26270 amounts in low-AGI bins, and filing-status counts in the lowest bins. The solver relaxes these targets in preference to distorting weights globally to meet targets.
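The penalty assignment reduces to a per-constraint lookup; in this sketch the variable list and penalty values come from the PR text, while the function name, bin codes, and filing-status handling are illustrative assumptions:

```python
NOISY_AMOUNT_VARS = {"e02400", "e00300", "e26270"}

def slack_penalty(varname, agi_bin, low_agi_bins=(1, 2),
                  default=1e6, reduced=1e3):
    """Return the slack penalty weight for one constraint (target)."""
    if varname in NOISY_AMOUNT_VARS and agi_bin in low_agi_bins:
        return reduced   # let the solver relax these noisy targets first
    return default
```

Because the reduced penalty is three orders of magnitude smaller, the QP prefers slack on these constraints over distorting weights across all records.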
Memory reductions (3 changes, net -36% vs master):
The PR reduces memory usage, especially per-worker usage, making it practical to use more workers on multi-processor systems:
- Build the constraint matrix B directly in sparse COO format, avoiding dense intermediates (~620 MB saved per worker).
- Trim unused columns from the TMD DataFrame (~150 MB saved per worker).
- Preload TMD data once in the parent process before forking workers (shared via copy-on-write).
Peak memory per worker: 1,244 MB (master) reduced to 798 MB (this PR). With 16 workers: ~20 GB reduced to ~13 GB.
New infrastructure:
- `solver_overrides.py`: YAML-based per-area solver parameter management. Provides infrastructure for customizing solver settings (tolerance, multiplier bounds, etc.) per area. No override files are included in this PR; actual per-area overrides will be generated and committed in a later PR when the congressional district pipeline is added.
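No override file format is shown in this thread; a per-area YAML override might look like the following hypothetical example (every key and value here is an illustrative assumption, echoing the 0.5% tolerance and 0-25x multiplier-bound defaults stated elsewhere in this PR):

```yaml
# Hypothetical per-area override file; no such file ships with this PR.
tolerance: 0.01                   # loosen the default 0.5% target tolerance
multiplier_bounds: [0.0, 50.0]    # widen the default 0-25x bounds
```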
New tests:
- `test_state_weight_results.py`: post-solve validation of state weight files (existence, nonnegativity, no NaN, correct columns, target accuracy within tolerance). Run as part of the test suite if weight files exist.
- `test_fingerprint.py`: on-demand reproducibility test. Rounds weights to integers, sums per area, and hashes. Detects any change in results across runs or machines. Not part of `make test`; run manually with `pytest tests/test_fingerprint.py -v`.
Impact on state weights
State weight results will change numerically due to per-constraint slack
penalties. This is expected and is an improvement — noisy low-AGI targets that
previously forced weight distortion across all records are now relaxed
preferentially. The constraint tolerance (0.5%) and multiplier bounds (0-25x)
are unchanged.
Files changed (9 files, +998 / -61)
tmd/areas/create_area_weights.py
tmd/areas/batch_weights.py
tmd/areas/solver_overrides.py
tmd/areas/solve_weights.py
tmd/areas/quality_report.py
tests/test_state_weight_results.py
tests/test_fingerprint.py
tests/conftest.py
Makefile (`make test`)
Test plan
Reproducibility
Verified: 8-worker and 16-worker solves produce identical fingerprints (hash `8b36ae1c2ee0c384`, integer weight sums match per area exactly).
Prepared by @donboyd5 and Claude Code