Default dataset URL points to non-existent HuggingFace repo #1533

@vahid-ahmadi

Bug

Microsimulation() with no dataset argument fails to download the latest data because the hardcoded HuggingFace URL in simulation.py:147 points to a repo that doesn't exist.

Current behavior

In policyengine_uk/simulation.py line 147, the default dataset URL is:

"hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"

This gets parsed by policyengine_core.tools.hugging_face.download_huggingface_dataset as:

  • repo: policyengine/policyengine-uk-data
  • filename: enhanced_frs_2023_24.h5
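
The split above can be reproduced with a minimal sketch. This is not the actual code of download_huggingface_dataset; split_hf_url is a hypothetical helper, and it assumes the URL has the simple form hf://owner/repo/filename with no version suffix.

```python
def split_hf_url(url: str) -> tuple[str, str]:
    """Split 'hf://owner/repo/filename' into (repo_id, filename).

    Hypothetical helper for illustration only; the real parser in
    policyengine_core may handle more cases (e.g. version suffixes).
    """
    prefix = "hf://"
    if not url.startswith(prefix):
        raise ValueError(f"not an hf:// URL: {url!r}")
    # Split into at most three pieces: owner, repo, and the rest as filename.
    parts = url[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://owner/repo/filename, got {url!r}")
    owner, repo, filename = parts
    return f"{owner}/{repo}", filename


current = "hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"
print(split_hf_url(current))
# → ('policyengine/policyengine-uk-data', 'enhanced_frs_2023_24.h5')
```

Under this reading, the repo id is everything between hf:// and the last path component, so the only way to reach a different repo is to change the URL itself.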

However, the repo policyengine/policyengine-uk-data does not exist on HuggingFace (returns 404). The function catches the RepositoryNotFoundError, assumes the repo is private, prompts for an HF token, and then tries to download again. That attempt either fails or serves a stale cached file.

Where the data actually lives

The enhanced_frs_2023_24.h5 file is uploaded by the policyengine-uk-data CI pipeline to:

policyengine/policyengine-uk-data-private (repo_type: model)

This can be confirmed in policyengine_uk_data/storage/upload_completed_datasets.py:

upload_data_files(
    files=dataset_files,
    hf_repo_name="policyengine/policyengine-uk-data-private",
    hf_repo_type="model",
    ...
)

How we found this

While reviewing PR PolicyEngine/uk-land-value-tax#1, we needed to verify simulation results with the latest data. We found:

  1. policyengine-uk (2.75.2) does not list policyengine-uk-data as a pip dependency, so upgrading policyengine-uk never pulls the latest data package
  2. The HF URL is the only mechanism for fetching the dataset at runtime, but it points to the wrong repo
  3. The correct repo (policyengine/policyengine-uk-data-private) has enhanced_frs_2023_24.h5 uploaded by the 1.45.0 CI run

Proposed fix

In policyengine_uk/simulation.py line 147, change:

"hf://policyengine/policyengine-uk-data/enhanced_frs_2023_24.h5"

to:

"hf://policyengine/policyengine-uk-data-private/enhanced_frs_2023_24.h5"

Additional note

It may also be worth considering adding policyengine-uk-data as an optional dependency so that the installed package's storage folder can be used as a fallback, rather than relying solely on runtime HF downloads.
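
A rough sketch of what such a fallback lookup could look like follows. Everything here is an assumption for illustration: find_local_dataset is a hypothetical helper, and it assumes the installed policyengine_uk_data package ships the .h5 file in its storage folder, which this issue does not confirm.

```python
from importlib.util import find_spec
from pathlib import Path
from typing import Optional


def find_local_dataset(
    filename: str,
    package: str = "policyengine_uk_data",  # assumed import name
    subdir: str = "storage",  # assumed location of bundled data files
) -> Optional[Path]:
    """Return the dataset path inside an installed package, or None.

    Hypothetical helper: returns None when the package is not installed
    or the file is not present, so the caller can fall back to the HF URL.
    """
    try:
        spec = find_spec(package)  # locates the package without importing it
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / subdir / filename
        if candidate.is_file():
            return candidate
    return None


local = find_local_dataset("enhanced_frs_2023_24.h5")
print(local if local is not None else "not installed locally; use the hf:// URL")
```

Returning None instead of raising keeps the data package optional: the simulation could try the local path first and only attempt the runtime download when nothing is found.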
