Add Python state target preparation pipeline #465
Merged
Replace the R/Quarto state target preparation with a Python pipeline that derives per-state target files from IRS SOI geographic shares and TMD national totals.

Pipeline:
- prepare_targets.py: CLI entry point (python -m tmd.areas.prepare_targets)
- prepare/soi_state_data.py: read and process raw SOI state CSVs
- prepare/target_sharing.py: compute TMD × SOI shares with OA rescaling
- prepare/target_file_writer.py: expand JSON recipe into per-state CSVs
- prepare/extended_targets.py: Census SALT, SOI credit, and additional variable targets using external geographic distribution data
- prepare/constants.py: AGI bins, variable mappings, state metadata
- prepare/census_population.py: embedded Census state population data

Directory restructure:
- SOI state data moved to prepare/data/soi_states/
- Recipes moved to prepare/recipes/
- Old prepare_states/ infrastructure removed

Key design choices:
- SOI shares rescaled so 51 states sum to 1.0 (excludes "Other Areas")
- Filing-status count targets excluded from $1M+ AGI bin (dual variable analysis showed these are the dominant source of weight distortion)
- Extended targets use Census S&L finance data for SALT distribution and SOI credit data for EITC/CTC
- CD support deferred to a future PR

Target files are not committed; they are fast to regenerate (~4 seconds).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
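For readers unfamiliar with the share rescaling mentioned under the key design choices, here is a minimal sketch of the idea. It is not the code in prepare/target_sharing.py. The function names, the "OA" key, and the dollar amounts are hypothetical and chosen only for illustration. It drops the "Other Areas" row, renormalizes the remaining SOI amounts so the shares sum to 1.0, and applies them to a national total.

```python
# Illustrative sketch only: names and numbers are assumptions, not the PR's code.

def rescale_shares(soi_amounts, exclude=("OA",)):
    """Return per-area shares that sum to 1.0 after dropping excluded areas."""
    kept = {area: amt for area, amt in soi_amounts.items() if area not in exclude}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("cannot compute shares from a zero total")
    return {area: amt / total for area, amt in kept.items()}


def state_targets(national_total, shares):
    """Allocate a national total to areas in proportion to their shares."""
    return {area: national_total * share for area, share in shares.items()}


# Made-up SOI amounts for three states plus the "Other Areas" (OA) row.
soi_amounts = {"CA": 300.0, "NY": 200.0, "TX": 500.0, "OA": 50.0}
shares = rescale_shares(soi_amounts)
targets = state_targets(2000.0, shares)
```

Because the shares are renormalized after OA is dropped, the per-state targets add up to the national total rather than falling short by the OA fraction.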
@martinholmer said:
Would it work to fetch both #465 and #466 and then run the full pipeline? I am away for the next hour but after that could provide more info if you have questions. I'm sorry, the documentation isn't quite up to snuff yet.
Implement a Python pipeline that derives per-state target files from IRS SOI and other geographic shares, applied to TMD national totals.
Pipeline:
python -m tmd.areas.prepare_targets --scope states
Directory restructure:
Key design choices:
Target files are not committed — they are fast to regenerate (~4 seconds).