Open
Description
Currently, sources are passed around as a collection of json.dumps strings attached to the annotation dataclasses. This is bloated, and it also creates duplicates when we convert the internal annotation objects back to dictionaries for dumping to Label Studio input. Each annotation dataclass corresponds to a class of annotation type we want to score individually (DocTimeRel, Adverse Event, etc.), but because we keep track of sources by ID and offset, the same source can be shared across several annotations.
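To make the duplicate problem concrete, here is a minimal sketch of how it could arise. `InternalAnnotation`, `make_source`, and the source fields (`id`, `start`, `end`) are hypothetical stand-ins, not the actual classes or keys in the codebase.

```python
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class InternalAnnotation:
    # Hypothetical stand-in for the internal annotation dataclasses.
    annotation_type: str        # e.g. "DocTimeRel", "Adverse Event"
    value: str
    sources: Tuple[str, ...]    # json.dumps strings, as described above


def make_source(source_id: str, start: int, end: int) -> str:
    # Hypothetical helper: serialize one source by ID and offsets.
    return json.dumps({"id": source_id, "start": start, "end": end}, sort_keys=True)


# Two annotation types that point at the same source entity.
shared = make_source("ent-1", 10, 18)
annotations = [
    InternalAnnotation("DocTimeRel", "OVERLAP", (shared,)),
    InternalAnnotation("Adverse Event", "True", (shared,)),
]

# Naive dump: one dict per (annotation, source) pair, so the shared source repeats.
naive = [json.loads(s) for a in annotations for s in a.sources]

# Keying by (id, start, end) shows there is really only one source.
unique = {(d["id"], d["start"], d["end"]): d for d in naive}

print(len(naive), len(unique))
```

The same serialized source is carried once per annotation type, so any conversion that walks the annotations emits it more than once unless it deduplicates explicitly.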
To me there seem to be three better possibilities:
- Keep a mapping from ID and offsets to the collection of internal annotations for that source (problem: this becomes a giant, omniscient data structure)
- Have each source be exactly one entity (still some bloat)
- (Favorite, obviously) Develop better models of the Label Studio annotation types and add a conversion method from the internal dataclasses to those models (this could also serve as a pivot to bring in Pydantic and fold into Ian's code, if that's ever helpful)
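A rough sketch of what the third option might look like, using plain dataclasses so it stays self-contained (the same shapes could later become Pydantic models). All class and function names here (`Span`, `InternalAnnotation`, `LabelStudioChoice`, `to_label_studio`) are hypothetical, and the result-item fields (`id`, `from_name`, `to_name`, `type`, `value`) follow my understanding of Label Studio's result JSON; they should be checked against the actual labeling config.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Span:
    # One source entity, identified by ID and character offsets.
    region_id: str
    start: int
    end: int
    text: str


@dataclass(frozen=True)
class InternalAnnotation:
    # Hypothetical internal dataclass: one scored annotation type on one span.
    annotation_type: str    # e.g. "DocTimeRel"
    value: str
    span: Span


@dataclass(frozen=True)
class LabelStudioChoice:
    # Hypothetical model of a per-region "choices" result item.
    region_id: str
    from_name: str
    to_name: str
    span: Span
    choice: str

    def to_dict(self) -> Dict[str, Any]:
        return {
            "id": self.region_id,
            "from_name": self.from_name,
            "to_name": self.to_name,
            "type": "choices",
            "value": {
                "start": self.span.start,
                "end": self.span.end,
                "text": self.span.text,
                "choices": [self.choice],
            },
        }


def to_label_studio(
    annotations: List[InternalAnnotation], to_name: str = "text"
) -> List[Dict[str, Any]]:
    # Emit one result item per (region, annotation type), skipping repeats.
    seen = set()
    results = []
    for ann in annotations:
        key = (ann.span.region_id, ann.annotation_type)
        if key in seen:
            continue
        seen.add(key)
        item = LabelStudioChoice(
            ann.span.region_id, ann.annotation_type, to_name, ann.span, ann.value
        )
        results.append(item.to_dict())
    return results


span = Span("ent-1", 10, 18, "headache")
annotations = [
    InternalAnnotation("DocTimeRel", "OVERLAP", span),
    InternalAnnotation("Adverse Event", "True", span),
    InternalAnnotation("DocTimeRel", "OVERLAP", span),  # accidental repeat
]
results = to_label_studio(annotations)
```

With this shape the span lives in one typed object instead of a JSON string, and the conversion step is the only place that needs to know about Label Studio's dictionary layout.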