Add Python state target preparation pipeline #465

Merged
donboyd5 merged 1 commit into master from prepare-state-targets
Mar 19, 2026

@donboyd5
Collaborator

@donboyd5 donboyd5 commented Mar 19, 2026

Implement a Python pipeline that derives per-state target files by applying geographic shares, from IRS SOI and other sources, to TMD national totals.

Pipeline:

  • prepare_targets.py: CLI entry point
    python -m tmd.areas.prepare_targets --scope states
  • prepare/soi_state_data.py: read and process raw SOI state CSVs
  • prepare/target_sharing.py: compute TMD × SOI shares with OA rescaling
  • prepare/target_file_writer.py: expand JSON recipe into per-state CSVs
  • prepare/extended_targets.py: Census SALT, SOI credit, and additional variable targets using external geographic distribution data
  • prepare/constants.py: AGI bins, variable mappings, state metadata
  • prepare/census_population.py: embedded Census state population data
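A minimal sketch of the recipe-expansion step, assuming a recipe lists variables and AGI bins and that each state's target is the national total times that state's share. The JSON schema, the column names, and all numbers are hypothetical illustrations, not the actual format read or written by prepare/target_file_writer.py.

```python
import csv
import io
import json

# Hypothetical recipe: the schema and variable choices are assumptions.
RECIPE_JSON = """
{"targets": [
  {"varname": "e00200", "agi_bins": [[-9e99, 50000], [50000, 9e99]]},
  {"varname": "XTOT", "agi_bins": [[-9e99, 9e99]]}
]}
"""
FIELDS = ["varname", "agilo", "agihi", "target"]  # assumed output columns


def expand_recipe(recipe, national_totals, shares):
    """Expand recipe specs into target rows: national total x state share."""
    rows = []
    for spec in recipe["targets"]:
        for agilo, agihi in spec["agi_bins"]:
            key = (spec["varname"], agilo, agihi)
            rows.append({
                "varname": spec["varname"],
                "agilo": agilo,
                "agihi": agihi,
                "target": national_totals[key] * shares[key],
            })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text (a real writer would write one file per state)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


recipe = json.loads(RECIPE_JSON)
keys = [(s["varname"], lo, hi) for s in recipe["targets"] for lo, hi in s["agi_bins"]]
national_totals = {key: 1_000_000.0 for key in keys}  # made-up national totals
state_shares = {  # made-up per-state shares
    "AK": {key: 0.002 for key in keys},
    "CA": {key: 0.12 for key in keys},
}
state_csvs = {st: to_csv(expand_recipe(recipe, national_totals, sh))
              for st, sh in state_shares.items()}
```

Keeping the recipe as data means adding or dropping a target is a JSON edit rather than a code change; the same expansion runs once per state.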

Directory restructure:

  • SOI state data moved to prepare/data/soi_states/
  • Recipes moved to prepare/recipes/
  • Old prepare_states/ infrastructure removed

Key design choices:

  • SOI shares rescaled so 51 states sum to 1.0 (excludes "Other Areas")
  • Filing-status count targets excluded from $1M+ AGI bin (dual variable analysis showed these are the dominant source of weight distortion)
  • Extended targets use Census state and local (S&L) finance data for SALT distribution and SOI credit data for EITC/CTC
  • Congressional district (CD) support deferred to a future PR
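The share rescaling and the $1M+ exclusion can be illustrated with a small sketch. It assumes a state's target is the TMD national total times that state's SOI share, with shares computed over the kept areas only after dropping the "Other Areas" row, so they sum to 1.0. The toy data (three states instead of 51), the `OA` key, and the target spec fields (`count`, `fstatus`, `agilo`) are hypothetical, not the actual structures in prepare/target_sharing.py.

```python
# Toy SOI values for one variable/AGI bin; "OA" stands in for SOI's
# "Other Areas" row. The numbers are made up for illustration.
SOI_VALUES = {"AK": 10.0, "DE": 30.0, "VT": 60.0, "OA": 5.0}


def state_shares(soi_values, excluded=("OA",)):
    """Shares over the kept areas only, so they sum to 1.0."""
    kept = {st: v for st, v in soi_values.items() if st not in excluded}
    total = sum(kept.values())
    return {st: v / total for st, v in kept.items()}


def state_targets(tmd_national_total, soi_values):
    """Allocate a TMD national total across states by rescaled SOI share."""
    return {st: tmd_national_total * share
            for st, share in state_shares(soi_values).items()}


def keep_target(spec, million_plus_lo=1_000_000):
    """Drop filing-status count targets in the $1M+ AGI bin.

    The spec fields (count, fstatus, agilo) are hypothetical; fstatus == 0
    is assumed to mean "all filers", so only filing-status-specific
    count targets are dropped.
    """
    is_fstatus_count = spec["count"] != 0 and spec["fstatus"] != 0
    return not (is_fstatus_count and spec["agilo"] >= million_plus_lo)


shares = state_shares(SOI_VALUES)
targets = state_targets(1000.0, SOI_VALUES)
# Without rescaling, the kept states' shares would sum to 100/105, not 1.0.
unscaled_sum = (sum(v for st, v in SOI_VALUES.items() if st != "OA")
                / sum(SOI_VALUES.values()))
```

Rescaling guarantees that the state targets for each variable add back to the TMD national total, which would not hold if part of the total were implicitly allocated to "Other Areas".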

Target files are not committed because they are fast to regenerate (~4 seconds).

Replace the R/Quarto state target preparation with a Python pipeline
that derives per-state target files from IRS SOI geographic shares
and TMD national totals.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@donboyd5 donboyd5 requested a review from martinholmer March 19, 2026 14:34
@donboyd5 donboyd5 mentioned this pull request Mar 19, 2026
@martinholmer
Collaborator

@donboyd5, How would you suggest reviewing PR #465? It seems incomplete: that is, no use of the targets.

@donboyd5
Collaborator Author

@martinholmer said:

@donboyd5, How would you suggest reviewing PR #465? It seems incomplete: that is, no use of the targets.

Would it work to fetch both #465 and #466 and then run the full pipeline?

make clean
make data
python -m tmd.areas.prepare_targets --scope states
python -m tmd.areas.solve_weights --scope states --workers 8  # adjust workers as appropriate
python -m tmd.areas.quality_report
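For repeated runs, the same sequence can be scripted. This is a minimal sketch, assuming it is run from the repository root with the code from both PRs available and `make` on the PATH; it uses the current interpreter (`sys.executable`) in place of a bare `python`, and the `build_steps`/`run_pipeline` helpers are hypothetical, not part of the repository.

```python
import subprocess
import sys


def build_steps(workers=8):
    """Return the pipeline commands from the comment above as argument lists."""
    py = sys.executable
    return [
        ["make", "clean"],
        ["make", "data"],
        [py, "-m", "tmd.areas.prepare_targets", "--scope", "states"],
        [py, "-m", "tmd.areas.solve_weights", "--scope", "states",
         "--workers", str(workers)],
        [py, "-m", "tmd.areas.quality_report"],
    ]


def run_pipeline(workers=8):
    """Run each step in order; check=True stops at the first failing step."""
    for cmd in build_steps(workers):
        print("running:", " ".join(cmd), flush=True)
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


if __name__ == "__main__":
    run_pipeline(workers=int(sys.argv[1]) if len(sys.argv) > 1 else 8)
```

Passing argument lists rather than a shell string avoids quoting issues, and the worker count stays the only knob to adjust per machine.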

I am away for the next hour, but after that I can provide more information if you have questions. Sorry that the documentation isn't quite up to snuff yet.

@martinholmer martinholmer marked this pull request as draft March 19, 2026 21:58
@martinholmer martinholmer marked this pull request as ready for review March 19, 2026 21:59
@donboyd5 donboyd5 merged commit a5090ad into master Mar 19, 2026
1 check passed
@donboyd5 donboyd5 deleted the prepare-state-targets branch March 19, 2026 22:10