When using the EPC datasets we need to be careful about duplicate EPCs for the same property. This is not an enormous issue, as an EPC is valid for up to 10 years unless the property is renovated or retrofitted, but there may be multiple records, especially for rental properties that have been improved to meet recent regulations.
We should be able to spot these by looking for records that share the same UPRN (Unique Property Reference Number). I would suggest keeping the most recent record for each UPRN and discarding the others. I will add this feature to the R code for the energy intensity sampler.
I'm not sure this will have a big impact when taking a recent sample of 5000 certificates from the API, but it could be a problem when using the full CSV (my colleague has pointed out that some properties in that dataset can have four or five duplicates!).
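
As a starting point, the "keep the most recent record per UPRN" step could be sketched in base R roughly as below. This is only an illustration and not the sampler's actual code: the function name dedupe_epc is hypothetical, and the column names UPRN and LODGEMENT_DATE (with dates formatted as YYYY-MM-DD) are assumptions that should be checked against the extract you are using. Records with no UPRN cannot be matched to a property, so this sketch simply keeps them.

```r
# Hypothetical helper: keep only the most recent certificate for each UPRN.
# ASSUMPTIONS: the id and date column names, and the YYYY-MM-DD date format.
dedupe_epc <- function(epc, id_col = "UPRN", date_col = "LODGEMENT_DATE") {
  ids <- as.character(epc[[id_col]])
  has_id <- !is.na(ids) & nzchar(ids)

  # Rows without a UPRN cannot be matched to a property, so keep them as they are
  no_id <- epc[!has_id, , drop = FALSE]
  with_id <- epc[has_id, , drop = FALSE]

  # Sort newest first; unparseable dates become NA and sort last
  dates <- as.Date(with_id[[date_col]], format = "%Y-%m-%d")
  with_id <- with_id[order(dates, decreasing = TRUE), , drop = FALSE]

  # duplicated() flags the second and later rows per UPRN, i.e. the older ones
  with_id <- with_id[!duplicated(with_id[[id_col]]), , drop = FALSE]

  rbind(with_id, no_id)
}

# Made-up example data: UPRN "100" has three certificates
epc <- data.frame(
  UPRN = c("100", "100", "200", NA, "100"),
  LODGEMENT_DATE = c("2015-06-01", "2021-03-15", "2019-01-10",
                     "2020-05-05", "2018-09-30"),
  CURRENT_ENERGY_RATING = c("E", "C", "D", "D", "D"),
  stringsAsFactors = FALSE
)

latest <- dedupe_epc(epc)
```

Sorting before calling duplicated() is what makes the surviving row the newest one; without the sort, duplicated() would keep whichever record happens to come first in the file.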