Description
Bug
`Microsimulation()` with no `dataset` argument fails to download the latest data because the hardcoded HuggingFace URL in `simulation.py:147` points to a repo that doesn't exist.
Current behavior
In `policyengine_uk/simulation.py` line 147, the default dataset URL is:
```python
"hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"
```

This gets parsed by `policyengine_core.tools.hugging_face.download_huggingface_dataset` as:

- repo: `policyengine/policyengine-uk-data`
- filename: `enhanced_frs_2023_24.h5`
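To make the split concrete, here is a minimal sketch of how an `hf://` URL of this shape maps onto a repo ID and a filename, assuming the first two path segments are the owner and repo name and everything after them is the file path. `parse_hf_url` is a hypothetical helper for illustration, not the actual `policyengine_core` implementation.

```python
from typing import NamedTuple


class HfLocation(NamedTuple):
    repo_id: str
    filename: str


def parse_hf_url(url: str) -> HfLocation:
    """Split "hf://owner/repo/path/to/file" into a repo ID and a filename.

    Hypothetical helper: assumes the first two path segments name the repo.
    """
    prefix = "hf://"
    if not url.startswith(prefix):
        raise ValueError(f"not an hf:// URL: {url!r}")
    parts = url[len(prefix):].split("/")
    if len(parts) < 3 or not all(parts):
        raise ValueError(f"expected hf://owner/repo/filename, got {url!r}")
    owner, repo, *path = parts
    return HfLocation(repo_id=f"{owner}/{repo}", filename="/".join(path))


current = parse_hf_url(
    "hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"
)
print(current.repo_id)   # policyengine/policyengine-uk-data
print(current.filename)  # enhanced_frs_2023_24.h5
```

Under this reading, only the repo segment differs between the current URL and the proposed one; the filename is unchanged.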
However, the repo `policyengine/policyengine-uk-data` does not exist on HuggingFace (it returns 404). The function catches the `RepositoryNotFoundError`, assumes the repo is private, prompts for an HF token, and then attempts the download anyway, which either fails or serves a stale cached file.
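One way to check which repo actually holds the file, rather than relying on the "not found means private" fallback, is to ask the Hub directly. The sketch below assumes a `huggingface_hub` release recent enough to provide `HfApi.file_exists`. A `False` result can mean either "missing" or "no access to a private repo", so it should be run with a token that can read `policyengine/policyengine-uk-data-private`. `find_dataset_repo` and the candidate list are illustrative and not part of either package.

```python
from typing import Optional, Sequence

# Hypothetical candidate list: the repo in simulation.py and the one CI uploads to.
CANDIDATE_REPOS = (
    "policyengine/policyengine-uk-data",
    "policyengine/policyengine-uk-data-private",
)


def find_dataset_repo(
    filename: str,
    candidates: Sequence[str] = CANDIDATE_REPOS,
    api=None,
    token: Optional[str] = None,
) -> Optional[str]:
    """Return the first candidate model repo that contains `filename`, or None.

    `api` defaults to huggingface_hub.HfApi() and can be replaced by a stub in
    tests. file_exists returns False both for missing repos/files and for
    repos the token cannot access.
    """
    if api is None:
        from huggingface_hub import HfApi  # imported lazily so stubs need no install

        api = HfApi()
    for repo_id in candidates:
        if api.file_exists(repo_id, filename, repo_type="model", token=token):
            return repo_id
    return None
```

For example, `find_dataset_repo("enhanced_frs_2023_24.h5", token=os.environ.get("HF_TOKEN"))` would be expected to return `"policyengine/policyengine-uk-data-private"` if the diagnosis in this issue holds.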
Where the data actually lives
The `enhanced_frs_2023_24.h5` file is uploaded by the `policyengine-uk-data` CI pipeline to:

`policyengine/policyengine-uk-data-private` (`repo_type: model`)
This can be confirmed in `policyengine_uk_data/storage/upload_completed_datasets.py`:

```python
upload_data_files(
    files=dataset_files,
    hf_repo_name="policyengine/policyengine-uk-data-private",
    hf_repo_type="model",
    ...
)
```

How we found this
While reviewing PR PolicyEngine/uk-land-value-tax#1, we needed to verify simulation results with the latest data. We found:
- `policyengine-uk` (2.75.2) does not list `policyengine-uk-data` as a pip dependency, so upgrading PE UK never pulls the latest data package.
- The HF URL is the only mechanism for fetching the dataset at runtime, but it points to the wrong repo.
- The correct repo (`policyengine/policyengine-uk-data-private`) has `enhanced_frs_2023_24.h5` uploaded by the 1.45.0 CI run.
Proposed fix
In `policyengine_uk/simulation.py` line 147, change:

```python
"hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"
```

to:

```python
"hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5"
```

Additional note
It may also be worth adding `policyengine-uk-data` as an optional dependency so that the installed package's storage folder can serve as a fallback, rather than relying solely on runtime HF downloads.
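A rough sketch of that fallback, assuming `policyengine-uk-data` installs as the `policyengine_uk_data` package and that built datasets sit in its `storage` folder under the same filename. Both are assumptions: the issue only establishes that `policyengine_uk_data/storage/` exists. `resolve_dataset` is hypothetical and not part of `policyengine_uk`.

```python
import importlib.util
from pathlib import Path
from typing import Optional

# Assumed remote location, per the proposed fix above.
REMOTE_REPO = "policyengine/policyengine-uk-data-private"


def _installed_storage_dir() -> Optional[Path]:
    # Locate the installed policyengine_uk_data package without importing it.
    spec = importlib.util.find_spec("policyengine_uk_data")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "storage"


def resolve_dataset(
    filename: str = "enhanced_frs_2023_24.h5",
    storage_dir: Optional[Path] = None,
    remote_repo: str = REMOTE_REPO,
) -> str:
    """Prefer a local copy in the data package's storage folder, else an hf:// URL."""
    if storage_dir is None:
        storage_dir = _installed_storage_dir()
    if storage_dir is not None:
        candidate = Path(storage_dir) / filename
        if candidate.is_file():
            return str(candidate)
    return f"hf://{remote_repo}/{filename}"
```

Checking for the local file first keeps the runtime download as a last resort, so a pinned `policyengine-uk-data` version would decide which data a simulation uses whenever it is installed.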