Overview
A dataset resource log is created for each file we process. When it is loaded into the dataset package, entity counts are also written to the table. These counts appear to be incorrect for some of our resources, and we need to fix this.
Pull Request(PR):
Tech Approach
Investigate:
- The code here is where the counts are currently generated.
- @sianteesdale has provided a set of examples where the counts don't match, along with a notebook containing code to compare the count in dataset_resource against the count derived from the actual file.
- Run some of the collections locally (smaller ones are a good starting point) to investigate why this happens. It could be caused by a separate issue around facts not being present, which does not appear to occur locally.
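The local comparison from the notebook can be sketched along these lines. This is an illustrative sketch only: the table and column names (`dataset_resource`, `entity_count`, `resource`) and the CSV `entity` column are assumptions about the schema, not confirmed against the actual dataset package.

```python
import csv
import sqlite3

def compare_counts(sqlite_path, transformed_csv, resource_hash):
    """Compare the entity count stored for a resource against the
    number of distinct entities in its transformed CSV.

    Table/column names here (dataset_resource, entity_count, resource,
    entity) are assumptions, not the confirmed schema.
    """
    # Count distinct entities in the transformed file itself.
    with open(transformed_csv, newline="") as f:
        actual = len({row["entity"] for row in csv.DictReader(f)})

    # Read the count recorded in the dataset package.
    con = sqlite3.connect(sqlite_path)
    row = con.execute(
        "SELECT entity_count FROM dataset_resource WHERE resource = ?",
        (resource_hash,),
    ).fetchone()
    con.close()

    stored = row[0] if row else None
    return stored, actual, stored == actual
```

Running this over @sianteesdale's example resources should reproduce the mismatches and show which side is wrong.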
Solve:
- Going forward we will likely reduce the emphasis on the dataset package, so it may be best to migrate the counting of entities into the pipeline itself. Note this also changes when the computation happens.
- The calculation can happen here, inside the transform method, since that is where the log is created.
- Check the performance implications of the above: this code runs in the async processor, so it could slow processing down. If there is a measurable cost, we may want a boolean argument to turn the counting on and off for now.
- Remember to remove the code calculating the counts in the sqlite files.
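The solve steps above could look roughly like this. All names here (`Transformer`, `transform`, `include_entity_counts`, the log dict shape) are hypothetical placeholders, not the real digital-land-python API; the point is the counting alongside the transform and the opt-in flag.

```python
class Transformer:
    """Sketch of counting entities during transform, guarded by a flag.

    Names are hypothetical, not the digital-land-python API.
    """

    def __init__(self, include_entity_counts=False):
        # Off by default until the performance impact in the
        # async processor has been measured.
        self.include_entity_counts = include_entity_counts

    def transform(self, rows, resource):
        entities = set()
        out = []
        for row in rows:
            out.append(row)
            # Collect distinct entity values as rows stream through,
            # so no second pass over the data is needed.
            if self.include_entity_counts and row.get("entity"):
                entities.add(row["entity"])
        log = {"resource": resource}
        if self.include_entity_counts:
            log["entity_count"] = len(entities)
        return out, log
```

Counting inline like this avoids a second pass over the transformed file, which is the main reason to compute it at transform time rather than at dataset-package load time.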
Acceptance Criteria/Tests
- Entity counts in dataset_resource must match what is in the transformed_resource.
Resourcing & Dependencies
- Will require changes to digital-land-python, and possibly to collection-task and makerules.