PolicyEngine.py is a Python package for tax-benefit microsimulation analysis. It provides a unified interface for running policy simulations, analysing distributional impacts, and visualising results across different countries.
The package is organised around several core concepts:
- Tax-benefit models: Country-specific implementations (UK, US) that define tax and benefit rules
- Datasets: Microdata representing populations at entity level (person, household, etc.)
- Simulations: Execution environments that apply tax-benefit models to datasets
- Outputs: Analysis tools for extracting insights from simulation results
- Policies: Parametric reforms that modify tax-benefit system parameters
Tax-benefit models define the rules and calculations for a country's tax and benefit system. Each model version contains:
- Variables: Calculated values (e.g., income tax, universal credit)
- Parameters: System settings (e.g., personal allowance, benefit rates)
- Parameter values: Time-bound values for parameters
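To make "time-bound values" concrete, here is a minimal sketch of how a value could be looked up for a given date. `TimedValue` and `value_at` are hypothetical names invented for this illustration and are not part of the package; the dates and amounts are illustrative only, and the "latest start date wins" rule is an assumption.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class TimedValue:
    """Hypothetical stand-in for a time-bound parameter value."""
    start_date: date
    end_date: Optional[date]  # None means open-ended
    value: float


def value_at(values: list, when: date) -> Optional[float]:
    """Return the value whose [start_date, end_date] interval contains `when`.

    If several intervals match, the one with the latest start_date wins.
    """
    matches = [
        v for v in values
        if v.start_date <= when and (v.end_date is None or when <= v.end_date)
    ]
    if not matches:
        return None
    return max(matches, key=lambda v: v.start_date).value


# Illustrative figures only, not taken from either country model.
personal_allowance = [
    TimedValue(date(2021, 4, 6), None, 12_570.0),
    TimedValue(date(2026, 1, 1), date(2026, 12, 31), 15_000.0),
]

in_2025 = value_at(personal_allowance, date(2025, 6, 1))
in_2026 = value_at(personal_allowance, date(2026, 6, 1))
in_2027 = value_at(personal_allowance, date(2027, 6, 1))
```

In this sketch the 2026 override applies only within its own interval, so dates before and after it fall back to the open-ended value.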
from policyengine.tax_benefit_models.uk import uk_latest
from policyengine.tax_benefit_models.us import us_latest
# UK model includes variables like:
# - income_tax, national_insurance, universal_credit
# - Parameters like personal allowance, NI thresholds
# US model includes variables like:
# - income_tax, payroll_tax, eitc, ctc, snap
# - Parameters like standard deduction, EITC rates
Datasets contain microdata representing a population. Each dataset has:
- Entity-level data: Separate dataframes for person, household, and other entities
- Weights: Survey weights for population representation
- Join keys: Relationships between entities (e.g., which household each person belongs to)
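As a rough illustration of why weights and join keys matter, the sketch below uses plain Python dictionaries rather than the package's MicroDataFrame. The rows, the weights and the helper names `weighted_sum` and `weighted_mean` are all hypothetical.

```python
# Hypothetical three-person sample; each row stands for `weight` real people.
people = [
    {"person_id": 0, "household_id": 0, "employment_income": 30_000.0, "weight": 1_200.0},
    {"person_id": 1, "household_id": 0, "employment_income": 0.0, "weight": 1_200.0},
    {"person_id": 2, "household_id": 1, "employment_income": 50_000.0, "weight": 800.0},
]


def weighted_sum(rows, column, weight="weight"):
    """Population total: each sampled value counts `weight` times."""
    return sum(r[column] * r[weight] for r in rows)


def weighted_mean(rows, column, weight="weight"):
    """Population mean: weighted total divided by total weight."""
    total_weight = sum(r[weight] for r in rows)
    return weighted_sum(rows, column, weight) / total_weight


population = sum(r["weight"] for r in people)
total_income = weighted_sum(people, "employment_income")
mean_income = weighted_mean(people, "employment_income")

# The join key (household_id) tells us which persons share a household.
members_by_household = {}
for r in people:
    members_by_household.setdefault(r["household_id"], []).append(r["person_id"])
```

An unweighted mean of the three incomes would be about 26,667, whereas the weighted mean here is 23,750, which is why population statistics should always go through the weights.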
from policyengine.tax_benefit_models.uk import PolicyEngineUKDataset
dataset = PolicyEngineUKDataset(
name="FRS 2023-24",
description="Family Resources Survey microdata",
filepath="./data/frs_2023_24_year_2026.h5",
year=2026,
)
# Access entity-level data
person_data = dataset.data.person # MicroDataFrame
household_data = dataset.data.household
benunit_data = dataset.data.benunit # Benefit unit (UK only)
You can create custom datasets for scenario analysis:
import pandas as pd
from microdf import MicroDataFrame
from policyengine.tax_benefit_models.uk import PolicyEngineUKDataset, UKYearData
# Create person data
person_df = MicroDataFrame(
pd.DataFrame({
"person_id": [0, 1, 2],
"person_household_id": [0, 0, 1],
"person_benunit_id": [0, 0, 1],
"age": [35, 8, 40],
"employment_income": [30000, 0, 50000],
"person_weight": [1.0, 1.0, 1.0],
}),
weights="person_weight"
)
# Create household data
household_df = MicroDataFrame(
pd.DataFrame({
"household_id": [0, 1],
"region": ["LONDON", "SOUTH_EAST"],
"rent": [15000, 12000],
"household_weight": [1.0, 1.0],
}),
weights="household_weight"
)
# Create benunit data
benunit_df = MicroDataFrame(
pd.DataFrame({
"benunit_id": [0, 1],
"would_claim_uc": [True, True],
"benunit_weight": [1.0, 1.0],
}),
weights="benunit_weight"
)
dataset = PolicyEngineUKDataset(
name="Custom scenario",
description="Single parent vs single adult",
filepath="./custom.h5",
year=2026,
data=UKYearData(
person=person_df,
household=household_df,
benunit=benunit_df,
)
)
Before running simulations, you need representative microdata. The package provides three functions for managing datasets:
- ensure_datasets(): Load from disk if available, otherwise download and compute (recommended)
- create_datasets(): Always download from HuggingFace and compute from scratch
- load_datasets(): Load previously saved HDF5 files from disk
from policyengine.tax_benefit_models.us import ensure_datasets
# First run: downloads from HuggingFace, computes variables, saves to ./data/
# Subsequent runs: loads from disk instantly
datasets = ensure_datasets(
datasets=["hf://policyengine/policyengine-us-data/enhanced_cps_2024.h5"],
years=[2026],
data_folder="./data",
)
dataset = datasets["enhanced_cps_2024_2026"]
from policyengine.tax_benefit_models.uk import ensure_datasets
datasets = ensure_datasets(
datasets=["hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"],
years=[2026],
data_folder="./data",
)
dataset = datasets["enhanced_frs_2023_24_2026"]
All datasets are stored as HDF5 files on disk. No database server is required.
Simulations apply tax-benefit models to datasets, calculating all variables for the specified year.
from policyengine.core import Simulation
from policyengine.tax_benefit_models.uk import uk_latest
simulation = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
)
simulation.run()
# Access output data
output_person = simulation.output_dataset.data.person
output_household = simulation.output_dataset.data.household
# Check calculated variables
print(output_household[["household_id", "household_net_income", "household_tax"]])
The Simulation class provides two methods for computing results:

| Method | Behaviour |
|---|---|
| simulation.run() | Always recomputes from scratch. No caching. |
| simulation.ensure() | Checks in-memory LRU cache, then tries loading from disk, then falls back to run() + save(). |
# One-off computation (no caching)
simulation.run()
# Cache-or-compute (preferred for production use)
simulation.ensure()
ensure() uses a module-level LRU cache (max 100 simulations) and saves output datasets as HDF5 files alongside the input dataset. On repeated calls, it returns cached results instantly. For baseline-vs-reform comparisons, economic_impact_analysis() calls ensure() internally, so you rarely need to call it yourself.
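The lookup order described for ensure() (memory, then disk, then compute and save) can be sketched generically as follows. This is not the package's implementation: `ensure_result` is a hypothetical helper, JSON files stand in for the HDF5 outputs, and the cache key is an arbitrary string.

```python
import json
import tempfile
from collections import OrderedDict
from pathlib import Path

MAX_CACHED = 100  # mirrors the "max 100 simulations" figure quoted above
_memory_cache = OrderedDict()


def ensure_result(key: str, folder: Path, compute) -> dict:
    """Memory cache, then disk, then compute-and-save (hypothetical helper)."""
    if key in _memory_cache:
        _memory_cache.move_to_end(key)  # mark as most recently used
        return _memory_cache[key]

    path = folder / f"{key}.json"  # JSON stands in for the HDF5 output file
    if path.exists():
        result = json.loads(path.read_text())
    else:
        result = compute()
        path.write_text(json.dumps(result))

    _memory_cache[key] = result
    if len(_memory_cache) > MAX_CACHED:
        _memory_cache.popitem(last=False)  # evict the least recently used entry
    return result


calls = []


def expensive_run() -> dict:
    """Placeholder for a full simulation run; records each invocation."""
    calls.append(1)
    return {"household_net_income": [25_000, 41_000]}


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    first = ensure_result("baseline_2026", folder, expensive_run)   # computes and saves
    second = ensure_result("baseline_2026", folder, expensive_run)  # served from memory
    _memory_cache.clear()                                           # simulate a new process
    third = ensure_result("baseline_2026", folder, expensive_run)   # served from disk
```

In this sketch the expensive function runs only once across the three calls, which is the behaviour that makes a cache-or-compute entry point preferable to always recomputing.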
After running a simulation, you can access the calculated variables from the output dataset:
simulation = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
)
simulation.run()
# Access specific variables
output = simulation.output_dataset.data
person_data = output.person[["person_id", "age", "employment_income", "income_tax"]]
household_data = output.household[["household_id", "household_net_income"]]
benunit_data = output.benunit[["benunit_id", "universal_credit", "child_benefit"]]
Policies modify tax-benefit system parameters through parametric reforms.
from policyengine.core import Policy, Parameter, ParameterValue
import datetime
# Define parameter to modify
parameter = Parameter(
name="gov.hmrc.income_tax.allowances.personal_allowance.amount",
tax_benefit_model_version=uk_latest,
description="Personal allowance for income tax",
data_type=float,
)
# Set new value
parameter_value = ParameterValue(
parameter=parameter,
start_date=datetime.date(2026, 1, 1),
end_date=datetime.date(2026, 12, 31),
value=15000, # Increase from ~£12,570 to £15,000
)
policy = Policy(
name="Increased personal allowance",
description="Raises personal allowance to £15,000",
parameter_values=[parameter_value],
)
# Baseline simulation
baseline = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
)
baseline.run()
# Reform simulation
reform = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
policy=policy,
)
reform.run()
Policies can be combined using the + operator:
combined = policy_a + policy_b
# Concatenates parameter_values and chains simulation_modifiers
For reforms that cannot be expressed as parameter value changes, Policy accepts a simulation_modifier callable that directly manipulates the underlying policyengine_core simulation:
def my_modifier(sim):
    """Custom reform logic applied to the core simulation object."""
    p = sim.tax_benefit_system.parameters
    # Modify parameters programmatically
    return sim
policy = Policy(
    name="Custom reform",
    simulation_modifier=my_modifier,
)
Note: the UK model supports simulation_modifier. The US model currently only uses the parameter_values path.
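The combination rule described for the + operator (concatenate parameter_values, chain simulation_modifiers) can be pictured with a small stand-in class. `SketchPolicy` is hypothetical and is not the package's Policy; a plain dict stands in for the core simulation object, and the order in which modifiers are chained is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchPolicy:
    """Hypothetical stand-in for Policy; not the package's class."""
    name: str
    parameter_values: list = field(default_factory=list)
    simulation_modifier: Optional[Callable] = None

    def __add__(self, other: "SketchPolicy") -> "SketchPolicy":
        modifiers = [m for m in (self.simulation_modifier, other.simulation_modifier) if m]

        def chained(sim):
            # Apply each modifier in order, feeding the result forward.
            for modifier in modifiers:
                sim = modifier(sim)
            return sim

        return SketchPolicy(
            name=f"{self.name} + {other.name}",
            parameter_values=self.parameter_values + other.parameter_values,
            simulation_modifier=chained if modifiers else None,
        )


def raise_allowance(sim):
    """Toy modifier: returns a copy of the dict with a new allowance."""
    return {**sim, "personal_allowance": 15_000}


def tag_reform(sim):
    """Toy modifier: appends a tag so the chaining order is visible."""
    return {**sim, "tags": sim.get("tags", []) + ["custom"]}


policy_a = SketchPolicy("A", parameter_values=["pa_value"], simulation_modifier=raise_allowance)
policy_b = SketchPolicy("B", parameter_values=["ni_value"], simulation_modifier=tag_reform)
combined = policy_a + policy_b

result = combined.simulation_modifier({"personal_allowance": 12_570})
```

Here the left-hand policy's modifier runs first, so if two reforms touch the same setting the right-hand one would win in this sketch.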
The Dynamic class is structurally identical to Policy and represents behavioural responses to policy changes (e.g., labour supply elasticities). It is applied after the policy in the simulation pipeline.
from policyengine.core.dynamic import Dynamic
dynamic = Dynamic(
name="Labour supply response",
parameter_values=[...], # Same format as Policy
)
simulation = Simulation(
dataset=dataset,
tax_benefit_model_version=uk_latest,
policy=policy,
dynamic=dynamic,
)
Dynamic responses can also be combined using the + operator and support simulation_modifier callables.
Output classes provide structured analysis of simulation results.
Calculate aggregate statistics (sum, mean, count) for any variable:
from policyengine.outputs.aggregate import Aggregate, AggregateType
# Total universal credit spending
agg = Aggregate(
simulation=simulation,
variable="universal_credit",
aggregate_type=AggregateType.SUM,
entity="benunit", # Map to benunit level
)
agg.run()
print(f"Total UC spending: £{agg.result / 1e9:.1f}bn")
# Mean household income in top decile
agg = Aggregate(
simulation=simulation,
variable="household_net_income",
aggregate_type=AggregateType.MEAN,
filter_variable="household_net_income",
quantile=10,
quantile_eq=10, # 10th decile
)
agg.run()
print(f"Mean income in top decile: £{agg.result:,.0f}")
Analyse impacts of policy reforms:
from policyengine.outputs.change_aggregate import ChangeAggregate, ChangeAggregateType
# Count winners and losers
winners = ChangeAggregate(
baseline_simulation=baseline,
reform_simulation=reform,
variable="household_net_income",
aggregate_type=ChangeAggregateType.COUNT,
change_geq=1, # Gain at least £1
)
winners.run()
print(f"Winners: {winners.result / 1e6:.1f}m households")
losers = ChangeAggregate(
baseline_simulation=baseline,
reform_simulation=reform,
variable="household_net_income",
aggregate_type=ChangeAggregateType.COUNT,
change_leq=-1, # Lose at least £1
)
losers.run()
print(f"Losers: {losers.result / 1e6:.1f}m households")
# Revenue impact
revenue = ChangeAggregate(
baseline_simulation=baseline,
reform_simulation=reform,
variable="household_tax",
aggregate_type=ChangeAggregateType.SUM,
)
revenue.run()
print(f"Revenue change: £{revenue.result / 1e9:.1f}bn")
The package automatically handles entity mapping when variables are defined at different entity levels.
UK:
household
└── benunit (benefit unit)
└── person
US:
household
├── tax_unit
├── spm_unit
├── family
└── marital_unit
└── person
When you request a person-level variable (like ssi) at household level, the package:
- Sums person-level values within each household (aggregation)
- Returns household-level data with proper weights
# SSI is defined at person level, but we want household-level totals
agg = Aggregate(
simulation=simulation,
variable="ssi", # Person-level variable
entity="household", # Target household level
aggregate_type=AggregateType.SUM,
)
# Internally maps person → household by summing SSI for all persons in each household
When you request a household-level variable at person level:
- Replicates household values to all persons in that household (expansion)
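The two directions described above, aggregation and expansion, come down to a group-by sum and a lookup through the join key. The following sketch uses plain lists and dictionaries with made-up values; `sum_to_group` and `project_to_persons` are hypothetical helpers, not package functions.

```python
from collections import defaultdict

# Join key: the household each person belongs to (one entry per person).
person_household_id = [0, 0, 1]
person_ssi = [100.0, 50.0, 0.0]                 # hypothetical person-level values
household_rent = {0: 15_000.0, 1: 12_000.0}     # hypothetical household-level values


def sum_to_group(values, group_ids):
    """Aggregation: add up person values within each group."""
    totals = defaultdict(float)
    for value, group in zip(values, group_ids):
        totals[group] += value
    return dict(totals)


def project_to_persons(group_values, group_ids):
    """Expansion: give every person their group's value."""
    return [group_values[group] for group in group_ids]


household_ssi = sum_to_group(person_ssi, person_household_id)
person_rent = project_to_persons(household_rent, person_household_id)
```

Note that projected values repeat the household figure for every member, so summing `person_rent` back up would count the rent of household 0 twice.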
You can also map data between entities directly using the map_to_entity method:
# Map person income to household level (sum)
household_income = dataset.data.map_to_entity(
source_entity="person",
target_entity="household",
columns=["employment_income"],
how="sum"
)
# Map household rent to person level (project/broadcast)
person_rent = dataset.data.map_to_entity(
source_entity="household",
target_entity="person",
columns=["rent"],
how="project"
)
You can map custom value arrays instead of existing columns:
# Map custom per-person values to household level
import numpy as np
# Create custom values (e.g., imputed data)
custom_values = np.array([100, 200, 150, 300])
household_totals = dataset.data.map_to_entity(
source_entity="person",
target_entity="household",
values=custom_values,
how="sum"
)
The how parameter controls how values are mapped:
Person → Group (aggregation):
- how='sum' (default): Sum values within each group
- how='first': Take first person's value in each group
# Sum person incomes to household level
household_income = data.map_to_entity(
source_entity="person",
target_entity="household",
columns=["employment_income"],
how="sum"
)
# Take first person's age as household reference
household_age = data.map_to_entity(
source_entity="person",
target_entity="household",
columns=["age"],
how="first"
)
Group → Person (expansion):
- how='project' (default): Broadcast group value to all members
- how='divide': Split group value equally among members
# Broadcast household rent to each person
person_rent = data.map_to_entity(
source_entity="household",
target_entity="person",
columns=["rent"],
how="project"
)
# Split household savings equally per person
person_savings = data.map_to_entity(
source_entity="household",
target_entity="person",
columns=["total_savings"],
how="divide"
)
Group → Group (via person entity):
- how='sum' (default): Sum through person entity
- how='first': Take first source group's value
- how='project': Broadcast first source group's value
- how='divide': Split proportionally based on person counts
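One plausible reading of "sum through person entity" is sketched below: the person table links each source group to a target group, and each source value is then added once to its target. This is an assumption about the behaviour, not a description of the package's internals, and it presumes every source group sits inside a single target group (as UK benefit units do within households). All values are made up.

```python
from collections import defaultdict

# Hypothetical person-level join keys: two benefit units in household 0, one in household 1.
person_benunit_id = [0, 0, 1, 2]
person_household_id = [0, 0, 0, 1]
benunit_uc = {0: 4_000.0, 1: 1_000.0, 2: 2_500.0}

# Step 1: use the person table to find which household each benefit unit sits in.
benunit_household = {}
for benunit, household in zip(person_benunit_id, person_household_id):
    benunit_household.setdefault(benunit, household)

# Step 2: add each benefit unit's value once to its household.
totals = defaultdict(float)
for benunit, value in benunit_uc.items():
    totals[benunit_household[benunit]] += value
household_uc = dict(totals)
```

Counting each source group once matters: broadcasting benefit unit 0's value to both of its members and then summing persons would report 8,000 for that unit instead of 4,000.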
# UK: Sum benunit benefits to household level
household_benefits = data.map_to_entity(
source_entity="benunit",
target_entity="household",
columns=["universal_credit"],
how="sum"
)
# US: Map tax unit income to household, splitting by members
household_from_tax = data.map_to_entity(
source_entity="tax_unit",
target_entity="household",
columns=["taxable_income"],
how="divide"
)
The package includes utilities for creating PolicyEngine-branded visualisations:
from policyengine.utils.plotting import format_fig, COLORS
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(x=[1, 2, 3], y=[4, 5, 6]))
format_fig(
fig,
title="My chart",
xaxis_title="X axis",
yaxis_title="Y axis",
height=600,
width=800,
)
fig.show()
COLORS = {
"primary": "#319795", # Teal
"success": "#22C55E", # Green
"warning": "#FEC601", # Yellow
"error": "#EF4444", # Red
"info": "#1890FF", # Blue
"blue_secondary": "#026AA2", # Dark blue
"gray": "#667085", # Gray
}
See UK employment income variation for a complete example of:
- Creating custom datasets with varied parameters
- Running single simulations
- Extracting results with filters
- Visualising benefit phase-outs
See UK policy reform analysis for:
- Applying parametric reforms
- Comparing baseline and reform
- Analysing winners/losers by decile
- Calculating revenue impacts
See US income distribution for:
- Loading representative microdata
- Calculating statistics by income decile
- Mapping variables across entity levels
- Creating interactive visualisations
- Always set would_claim variables: Benefits won't be claimed unless explicitly enabled
    "would_claim_uc": [True] * n_households
- Set disability variables explicitly: Prevents random UC spikes from LCWRA element
    "is_disabled_for_benefits": [False] * n_people
    "uc_limited_capability_for_WRA": [False] * n_people
- Include required join keys: Person data needs entity membership
    "person_household_id": household_ids
    "person_benunit_id": benunit_ids # UK only
- Set required household fields: Vary by country
    # UK
    "region": ["LONDON"] * n_households
    "tenure_type": ["RENT_PRIVATELY"] * n_households
    # US
    "state_code": ["CA"] * n_households
- Single simulation for variations: Create all scenarios in one dataset, run once
- Custom variable selection: Only calculate needed variables
- Filter efficiently: Use quantile filters for decile analysis
- Parallel analysis: Multiple Aggregate calls can run independently
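For intuition about what a quantile filter such as quantile=10, quantile_eq=10 selects, here is a simple weighted decile assignment in plain Python. It is a sketch under stated assumptions, not the package's algorithm: `weighted_deciles` is a hypothetical helper, the data are invented, and each record is placed wholly in the decile where its cumulative weight ends.

```python
import math


def weighted_deciles(values, weights):
    """Assign each record a decile from 1 (lowest) to 10 (highest) by weighted rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    total = sum(weights)
    deciles = [0] * len(values)
    cumulative = 0.0
    for i in order:
        cumulative += weights[i]
        # Share of the weighted population at or below this record, scaled to 1..10.
        deciles[i] = min(10, max(1, math.ceil(10 * cumulative / total)))
    return deciles


# Hypothetical household incomes and survey weights.
incomes = [12_000.0, 25_000.0, 40_000.0, 60_000.0, 150_000.0]
weights = [3_000.0, 3_000.0, 2_000.0, 1_000.0, 1_000.0]

deciles = weighted_deciles(incomes, weights)

# Weighted mean income among records in the top decile.
top = [(v, w) for v, w, d in zip(incomes, weights, deciles) if d == 10]
top_decile_mean = sum(v * w for v, w in top) / sum(w for _, w in top)
```

With only five records the deciles are coarse; on real microdata with many thousands of rows the bins come much closer to equal population shares.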
- Check weights: Ensure weights sum to expected population
- Validate join keys: All persons should link to valid households
- Review output ranges: Check calculated values are reasonable
- Test edge cases: Zero income, high income, disabled, elderly
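The first two checks in the list above can be automated before a simulation is run. The sketch below works on plain lists of dictionaries using the column names from the custom dataset example; `validate`, its tolerance and the sample rows are hypothetical.

```python
def validate(persons, households, expected_population, tolerance=0.01):
    """Return a list of human-readable problems (an empty list means all checks passed)."""
    problems = []

    # Check weights: the total should be close to the population being represented.
    total_weight = sum(p["person_weight"] for p in persons)
    if abs(total_weight - expected_population) > tolerance * expected_population:
        problems.append(
            f"weights sum to {total_weight:,.0f}, expected {expected_population:,.0f}"
        )

    # Validate join keys: every person must point at an existing household.
    household_ids = {h["household_id"] for h in households}
    orphans = [p["person_id"] for p in persons if p["person_household_id"] not in household_ids]
    if orphans:
        problems.append(f"persons with unknown household: {orphans}")

    return problems


households = [{"household_id": 0}, {"household_id": 1}]
persons = [
    {"person_id": 0, "person_household_id": 0, "person_weight": 1_000.0},
    {"person_id": 1, "person_household_id": 0, "person_weight": 1_000.0},
    {"person_id": 2, "person_household_id": 5, "person_weight": 1_000.0},  # bad join key
]

problems = validate(persons, households, expected_population=3_000)
```

Returning a list of problems rather than raising on the first one lets a single pass report every issue in a hand-built dataset.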
- Economic impact analysis: Full baseline-vs-reform comparison workflow
- Advanced outputs: DecileImpact, Poverty, Inequality, IntraDecileImpact
- Regions and scoping: Sub-national analysis (states, constituencies, districts)
- Country-specific documentation:
- Visualisation: Publication-ready charts
- Examples: Complete working scripts