Conversation
|
@martinholmer, I am away tomorrow and the next day but can address any comments or questions you have after that. The documentation for this draft PR still needs work. |
|
@donboyd5, Here are the results on my computer after pulling the branch for PR #465 and then the branch for PR #466: |
|
@donboyd5, Looks like PR #466 is still a work in progress. One example of its incomplete nature is that the Can you do this:
As in the national |
Port state weight solver pipeline from state-weights-clarabel branch:
- Clarabel constrained QP solver with elastic slack (0.5% tolerance)
- Parallel batch runner with worker-cached TMD data
- Cross-state quality report (log parsing, weight exhaustion, aggregation)
- Standalone CLI: python -m tmd.areas.solve_weights --scope states --workers 8
- 11 tests covering solver, log parser, scope parsing, and area filtering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add weight_penalty parameter to QP solver for controlling multiplier-vs-constraint tradeoff
- Add --max-exhaustion flag to solve_weights for iterative two-pass exhaustion limiting with per-record multiplier caps
- Enhance quality report with exhaustion record profiles (top 5 most exhausted records with taxpayer characteristics)
- Add sweep_params.py for grid search over multiplier_max and weight_penalty combinations
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lower AREA_MULTIPLIER_MAX from 100 to 25 based on parameter sweep: virtually identical target accuracy (35 vs 33 violations) but 34% lower max exhaustion (16.6x vs 25.2x). Single-pass, no complexity.
Add AREA_WEIGHTING_LESSONS.md documenting parameter tuning findings, weight exhaustion mechanics, dual variable analysis, SALT targeting, and guidance for future Congressional district work.
Update README.md with solver usage, quality report, and link to lessons document.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Quality report now checks ~19 untargeted variables for cross-state aggregation distortion, sorted by severity, with >2% flagged. Key bystanders: student loan interest (-10.5%), AMT (-10.3%), tax-exempt interest (+7.4%), qualified dividends (-4.1%). Add corresponding section to AREA_WEIGHTING_LESSONS.md explaining what drives bystander distortion and when to worry about it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Remove create_area_weights_clarabel.py and replace the old create_area_weights.py (scipy L-BFGS-B + JAX) with the Clarabel constrained QP solver. Update all imports and function references.
- Renamed create_area_weights_file_clarabel() to create_area_weights_file()
- Removed valid_area() dependency (areas validated by target file existence)
- Updated imports in solve_weights, batch_weights, quality_report, sweep_params, make_all, and test_solve_weights
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Definitely, thanks.
…On Thu, Mar 19, 2026 at 6:13 PM Martin Holmer ***@***.***> wrote:
martinholmer left a comment (PSLmodels/tax-microdata-benchmarking#466):
@donboyd5, Don't merge #466 until you do the cleanup of the old code.
|
test_area_weights.py tested the old scipy L-BFGS-B solver which is now replaced by Clarabel. The Clarabel solver is tested by test_solve_weights.py (test_clarabel_solver_xx and 10 other tests). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rename test_clarabel_solver_xx to test_solver_xx since the module name no longer contains "clarabel". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fcc479c to d16a9a9
|
@martinholmer, thanks for the speedy review and helpful comments. The state weights quality report on your machine was identical to the report on my machine, down to every last number. It's nice to see that reproducibility held up with subnational weights as well (as we would expect, given that it's the same solver). |
Summary
Add Clarabel QP solver pipeline for state weight optimization — the solver counterpart to the target preparation in PR #465, which this depends upon.
python -m tmd.areas.solve_weights --scope states --workers 8
Running the optimizer
From scratch:
To re-solve weights only (if TMD data and targets already exist):
Key design decisions
multiplier_max=25: The "multiplier" is the ratio of a record's chosen (optimal) weight for an area to the weight it would have if the area's weights were simply proportionate to the area's share of the national population. For example, if a record has a national weight of 200 and we reweight it to reflect state A, which has 10% of the nation's population, its proportionate weight would be 20 (10% of 200). If this record type is very common in state A, its optimal weight might be 40, giving a multiplier of 2.0 (optimal weight divided by proportionate weight). We cap this multiplier at 25, so the maximum optimal weight for this record would be 500 (25 × 20). We based the cap of 25 on a 12-combination parameter sweep (4 × 3 grid) over the multiplier maximum and the relative importance placed on weight changes vs. target violations. With the cap at 25, the extent to which weights were "overused" (a record's sum of weights across areas exceeding its national weight) was relatively low and target misses were minimal.
weight_penalty has no effect: The solver balances two goals: keeping weights close to proportional and hitting targets. The weight_penalty parameter controls the relative importance of these goals. Our sweep showed that it only increases violations without changing weight structure or exhaustion. The solver reaches the same solution regardless of penalty weight.
Single-pass preferred: Two-pass iterative exhaustion limiting was tested but found too aggressive: proportional cap scaling made targets infeasible (8,979 violations). A tighter multiplier_max in a single pass is simpler and more robust.
Filing-status counts excluded from $1M+ bin: Dual variable analysis showed these are the only expensive constraints (dual costs 6–8 orders of magnitude above all others). Removing them eliminated virtually all constraint cost.
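The multiplier arithmetic in the multiplier_max item above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical and are not taken from the tmd package, and "exhaustion" is assumed here to mean the sum of a record's weights across areas divided by its national weight.

```python
# Illustrative sketch only: these names are hypothetical, not the tmd API.
MULTIPLIER_MAX = 25.0  # the cap discussed in the PR description


def proportionate_weight(national_weight, pop_share):
    """Weight a record would get if the area simply received its population share."""
    return national_weight * pop_share


def multiplier(area_weight, national_weight, pop_share):
    """Ratio of the solved area weight to the proportionate weight."""
    return area_weight / proportionate_weight(national_weight, pop_share)


def exhaustion(area_weights, national_weight):
    """Assumed definition: sum of a record's area weights over its national weight."""
    return sum(area_weights) / national_weight


# Worked example from the text: national weight 200, state A has 10% of population.
national = 200.0
share_a = 0.10
prop = proportionate_weight(national, share_a)  # proportionate weight
mult = multiplier(40.0, national, share_a)      # multiplier for a solved weight of 40
max_weight = MULTIPLIER_MAX * prop              # largest weight the cap allows

print(prop, mult, max_weight)
```

A record whose weights across all areas sum to more than its national weight has an exhaustion above 1.0 under this assumed definition, which is the "overuse" the cap is meant to limit.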
Bystander variable check
Untargeted variables are checked for cross-state aggregation distortion. Most are well-behaved (<1%), but a few show >2% distortion: student loan interest (-10.5%), AMT (-10.3%), tax-exempt interest (+7.4%), qualified dividends (-4.1%). These are driven by rare high-income PUF records being over- or under-weighted. Documented in AREA_WEIGHTING_LESSONS.md with guidance on when to worry.
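A check of this kind might look like the sketch below. It assumes that "distortion" means the percent gap between the sum of a variable's state-weighted totals and its national total; the function names and the input numbers are made up for illustration and are not taken from tmd/areas/quality_report.py.

```python
# Hypothetical sketch: names and data are illustrative, not the quality_report API.
FLAG_THRESHOLD_PCT = 2.0  # the >2% flag level mentioned above


def aggregation_distortion(national_total, area_totals):
    """Percent gap between the sum over areas and the national total (assumed definition)."""
    return 100.0 * (sum(area_totals) / national_total - 1.0)


def bystander_report(national, by_area):
    """Return (name, pct, flagged) rows sorted by absolute distortion, largest first."""
    rows = []
    for name, nat_total in national.items():
        pct = aggregation_distortion(nat_total, by_area[name])
        rows.append((name, pct, abs(pct) > FLAG_THRESHOLD_PCT))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


# Made-up totals for two untargeted variables across three areas.
national = {"var_a": 1000.0, "var_b": 500.0}
by_area = {
    "var_a": [300.0, 320.0, 300.0],  # sums to 920, below the national total
    "var_b": [200.0, 200.0, 105.0],  # sums to 505, slightly above
}
report = bystander_report(national, by_area)

for name, pct, flagged in report:
    print(f"{name}: {pct:+.1f}%{'  FLAG' if flagged else ''}")
```

Sorting by absolute distortion puts the variables most in need of attention first, matching the "sorted by severity" behavior described in the commit message.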
New files
tmd/areas/create_area_weights_clarabel.py
tmd/areas/batch_weights.py
tmd/areas/quality_report.py
tmd/areas/solve_weights.py
tmd/areas/sweep_params.py
tmd/areas/AREA_WEIGHTING_LESSONS.md
tests/test_solve_weights.py
Test plan