1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@
.vscode/*
*.ipynb_checkpoints
*.pytest_cache*
01-ActivityModel/data/*
*all-domestic-certificates.zip
*epc_england.zip
168 changes: 168 additions & 0 deletions 01-ActivityModel/activity-model/README.md
@@ -0,0 +1,168 @@
# Activity Model

This is the code repository for the Activity Model package.

The Activity Model returns a synthetic population that represents the
household population of England. The model does not create the population from
scratch; it takes a well-known synthetic population, SPENSER, and adds the
following information for each household:

- Accommodation floor area (band)
- Accommodation age (band)
- Gas (flag: Y/N)

The accommodation information is originally obtained from Domestic Energy
Performance Certificates (EPC) and is then encoded by this package.

To enrich the SPENSER population with the EPC data, the Propensity Score
Matching (PSM) method is applied.
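
As a rough illustration of the matching step in PSM, the sketch below pairs each record in one dataset with the record in the other dataset whose propensity score is closest. It is a minimal, generic sketch and not necessarily how this package implements PSM: the function name `nearest_score_match` and the example scores are hypothetical, the scores are assumed to have been estimated beforehand (for example with a logistic regression on variables shared by both datasets), and matching is done with replacement.

```python
from bisect import bisect_left


def nearest_score_match(target_scores, donor_scores):
    """For each target score, return the index of the donor with the closest score.

    Hypothetical helper: matching is done with replacement, so one donor
    record may be matched to several target records.
    """
    # Sort donor indices by score so each lookup is a binary search.
    order = sorted(range(len(donor_scores)), key=donor_scores.__getitem__)
    sorted_scores = [donor_scores[i] for i in order]
    matches = []
    for score in target_scores:
        pos = bisect_left(sorted_scores, score)
        # The closest donor is one of the neighbours of the insertion point.
        candidates = [p for p in (pos - 1, pos) if 0 <= p < len(sorted_scores)]
        best = min(candidates, key=lambda p: abs(sorted_scores[p] - score))
        matches.append(order[best])
    return matches


# Made-up propensity scores, for illustration only.
spenser_scores = [0.20, 0.55, 0.90]
epc_scores = [0.85, 0.25, 0.50]
print(nearest_score_match(spenser_scores, epc_scores))  # [1, 2, 0]
```

Each SPENSER household would then inherit the EPC attributes of its matched record.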

The main output is an enriched synthetic population that we use as input for
energy estimation models developed by the Energy Flexibility Project.

## Environment setup

This package currently supports running on Linux. <!-- and macOS. -->

To start working with this repository you need to clone it onto your local
machine:

```bash
$ git clone https://github.com/anetobradley/energy_flex.git
$ cd energy_flex/01-ActivityModel/activity-model/
```

This package requires a specific
[conda](https://docs.anaconda.com/anaconda/install/) environment.
You can create an environment for this project using the provided
environment file:

```bash
$ conda env create -f environment.yml
$ conda activate energyflex
```

## Configuring the model

### Required

#### EPC Credentials

To retrieve the data needed to run the model, you need EPC API credentials.
You can register [here](https://epc.opendatacommunities.org/#register).
Next, add your credentials to the
[epc_api](./config/epc_api.yaml) file (you can use your favourite text
editor for this):

```bash
$ nano config/user.yaml
# EPC credentials
epc_user: "user@email"
epc_key: "user_key"
```

#### Local Authority codes

You need to provide the codes of all Local Authorities for which you want a
synthetic population. Please insert the values [here](./config/lad_codes.yaml).
If you do not provide any additional values, the default is to return the
population for Haringey only.

You can find
[here](https://epc.opendatacommunities.org/docs/api/domestic#domestic-local-authority)
all LAD codes available in the EPC data.
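
Before a long run it can help to check that the configured values look like English local authority district codes. The sketch below is a minimal check and not part of this package: the helper `invalid_lad_codes` is hypothetical, and it only tests the code format (`E06` to `E09` followed by six digits), not whether the code exists in the EPC data.

```python
import re

# English local authority district codes start with E06, E07, E08 or E09,
# followed by six digits (for example, Haringey is E09000014).
LAD_PATTERN = re.compile(r"^E0[6-9]\d{6}$")


def invalid_lad_codes(codes):
    """Return the entries that do not look like English LAD codes."""
    return [code for code in codes if not LAD_PATTERN.match(str(code))]


print(invalid_lad_codes(["E09000014", "E08000035", "haringey"]))  # ['haringey']
```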

### Optional

#### Year

You can define [here](./config/epc_api.yaml) a different range of the EPC
lodgement date (the default is 2008-2022).

#### EPC variables

If you want to enrich the synthetic population with more EPC variables, you
need to add them to two lists:

- [epc_api config file](./config/epc_api.yaml) under `epc_headers`.
- [psm config file](./config/psm.yaml) under `matches_columns`.

You can find a complete EPC Glossary
[here](https://epc.opendatacommunities.org/docs/guidance#glossary),
but be aware that the terms are spelled differently in the glossary and in
the API. In our experience the differences are:

- uppercase letters must be written in lowercase.
- underscores must be replaced by hyphens.
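
The two rules above can be applied mechanically. The helper below is a small sketch; the name `glossary_to_api_name` is hypothetical and not part of this package, and the result should still be checked against the API documentation before it is added to the config files.

```python
def glossary_to_api_name(glossary_name):
    """Convert a glossary-style name to the spelling described above."""
    # Rule 1: lowercase everything. Rule 2: underscores become hyphens.
    return glossary_name.strip().lower().replace("_", "-")


print(glossary_to_api_name("TOTAL_FLOOR_AREA"))  # total-floor-area
```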

We also warn that most of the information is unencoded, which can make it
difficult to use (as well as making the output file unnecessarily large).
The default variables (accommodation floor area, accommodation age, gas)
are properly encoded and organized by this package.

#### Data url

Three datasets are obtained through URLs:

- EPC data
- SPENSER data
- Area lookup data

If you want to use different URLs, you can change them in:

- EPC url [here](./config/epc_api.yaml) under `epc_url`
- SPENSER url [here](./config/spenser.yaml) under `spenser_url`
- Area lookup url [here](./config/lookups.yaml) under `area_url`

Note: You can obtain the data from other places (new versions are, after all,
expected), but the data structure must be similar or the code will not work.

#### Area granularity

The default granularity is Output Areas, but you can use others, like:

- Lower Layer Super Output Areas (`lsoa11cd`)
- Middle Layer Super Output Areas (`msoa11cd`)
- Local authority districts (`ladcd`)

To change this, please use the `area_in_out` variable
[here](./config/lookups.yaml).

Note that if you change the Area lookup URL, the granularity codes may also
change!

## Installation & Usage

Next we install the Activity Model package into the environment using `setup.py`:

```bash
# for using the code base use
$ python setup.py install
```

## Running the model

If you installed the package with the `setup.py` file, to run the model:

```bash
$ python activity_model
```

If you did not install the package with the `setup.py` file, you can run the
code through

```bash
# run the code directly from the source tree
$ python activity_model/__main__.py
```

## Outputs

The outputs are stored at `data/output/`. Three outputs are expected:

1. Propensity score distribution images for each local authority.
2. Internal validation images for each local authority.
3. Enriched synthetic population for each local authority (CSV file).
All CSV files are compressed into a zip file.
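
To inspect the results, one CSV can be read straight from the zip file without unpacking it. The sketch below uses only the standard library; the function name `read_enriched_population` is hypothetical, the path of the zip file is left to the caller, and it assumes each CSV sits at the top level of the archive under the name `<lad_code>_hh_msm_epc.csv`.

```python
import csv
import io
import zipfile


def read_enriched_population(zip_file, lad_code):
    """Return the rows of one local authority's CSV as a list of dicts.

    `zip_file` may be a path or a binary file-like object.
    """
    # Assumed member name: "<lad_code>_hh_msm_epc.csv" at the archive root.
    member = f"{lad_code}_hh_msm_epc.csv"
    with zipfile.ZipFile(zip_file) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))
```

All values are returned as strings; convert the columns you need before analysing them.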
1 change: 1 addition & 0 deletions 01-ActivityModel/activity-model/activity_model/__init__.py
@@ -0,0 +1 @@
__version__ = '0.1.0'
28 changes: 28 additions & 0 deletions 01-ActivityModel/activity-model/activity_model/__main__.py
@@ -0,0 +1,28 @@
import yaml

from data_preparation import Epc, Spenser
from enriching_population import EnrichingPopulation

if __name__ == "__main__":
    print("hi")
Collaborator


I have added some suggestions in #9.

Collaborator

@mfbenitezp mfbenitezp Apr 19, 2022


@patricia-ternes @nickmalleson I have successfully tested the latest code in this branch. I was able to run it for Leeds, and I have just a few comments, which I have included in #9. Here they are again:

  1. I think the README could also include some instructions on how to run the model, for example guiding the user from one README to the next. Alternatively, the model could be wrapped in a different folder structure.
  2. In the README of /config, is there any way to run the model for all local authorities?
  3. Once the required parameters are set, the rest of the text could go in a separate section, for example one on extending the model to the user's needs.
  4. I replaced the `hi` print with another message.

    spenser = Spenser()
    epc = Epc()
    psm = EnrichingPopulation()

    list_df = []
    list_df_names = []
    # Read the local authority codes to process; the file is closed once parsed.
    with open("config/lad_codes.yaml") as lad_codes_yaml:
        parsed_lad_codes = yaml.load(lad_codes_yaml, Loader=yaml.FullLoader)
    lad_codes = parsed_lad_codes.get("lad_codes")

    for lad_code in lad_codes:
        # Run the SPENSER and EPC steps for this local authority, then enrich.
        spenser_df = spenser.step(lad_code)
        epc_df = epc.step(lad_code)
        rich_df = psm.step(
            spenser_df, epc_df, lad_code, psm_fig=True, validation_fig=True
        )
        list_df_names.append("_".join([lad_code, "hh_msm_epc.csv"]))
        list_df.append(rich_df)

    # Save every enriched population once all local authorities are processed.
    psm.save_enriched_pop(list_df_names, list_df)
