PubAnnotation

PubAnnotation is a repository for publicly hosting annotations over biomedical documents. The platform accepts submissions in a JSON format that contains both text and annotations.

bconv provides two loaders and two formatters. pubanno_json reads or produces an individual JSON file for a single document. pubanno_json.tgz reads or creates a compressed archive containing multiple JSON files for a document or collection.

Example

{
  "sourceid": "354896",
  "sourcedb": "PubMed",
  "text": "Lidocaine-induced cardiac asystole.\n",
  "denotations": [
    {
      "id": "T2",
      "span": {
        "begin": 18,
        "end": 34
      },
      "obj": "Disease"
    }
  ]
}

→ Full example
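
To see how the span offsets relate to the text, the record above can be read with Python's standard json module and each denotation sliced out of the text. This is a minimal sketch that does not use bconv, and the variable names are illustrative only. Python strings are indexed by code point, so the begin/end values can be used as slice bounds directly.

```python
import json

# The example record from above. A raw string keeps the JSON escape "\n" intact.
raw = r'''
{
  "sourceid": "354896",
  "sourcedb": "PubMed",
  "text": "Lidocaine-induced cardiac asystole.\n",
  "denotations": [
    {"id": "T2", "span": {"begin": 18, "end": 34}, "obj": "Disease"}
  ]
}
'''

doc = json.loads(raw)

# Recover the annotated mention for every denotation by slicing the text.
mentions = [
    (d["id"], d["obj"], doc["text"][d["span"]["begin"]:d["span"]["end"]])
    for d in doc["denotations"]
]
print(mentions)  # [('T2', 'Disease', 'cardiac asystole')]
```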

Sources

The PubAnnotation website has a description of the basic annotation format and the instructions for representing more complex documents.

Notes

Document structure: The pubanno_json format is designed primarily for abstracts. Full-text documents can be represented, but any document-internal structure is lost. bconv also allows exporting collections with pubanno_json, but the resulting JSON (an array of document objects) is not accepted by PubAnnotation, and cannot even be loaded directly by bconv. The pubanno_json.tgz format, however, supports multi-document collections and also preserves section boundaries.
Metadata: The format records the document ID with the key sourceid. The sourcedb option allows specifying the resource that defines the document ID (e.g. PubMed). In the pubanno_json.tgz format, sections are preserved and enumerated, but their type is not stored.
Entity annotations: Annotations have an ID (counter), start/end offsets, and an obj attribute, which is typically the entity type. Additional attributes (e.g. the concept ID) are stored in attribute annotations.
Whitespace: Whitespace is preserved.
Offsets: Entity offsets are counted in Unicode code points.
Discontinuous spans: Discontinuous spans are represented with PubAnnotation's bagging model.
Relations/events: PubAnnotation only supports binary relations. When serialising, bconv interprets the first and second relation member as the subj and obj attribute, respectively, and the type entry of the relation metadata as the predicate (pred). Other metadata as well as the role value of relation members are ignored. In PubAnnotation, complex relations with more than two members can be represented through nesting; however, bconv does not attempt an automatic conversion and simply raises an exception if the arity is different from 2.
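
The relation mapping described in the last note can be sketched as follows. This is not bconv's code: the function name relation_to_pubanno, the representation of members as (entity ID, role) pairs, and the use of ValueError are assumptions made for illustration. The source only states that an exception is raised when the arity is not 2.

```python
from typing import Dict, List, Tuple


def relation_to_pubanno(rel_id: str, rel_type: str,
                        members: List[Tuple[str, str]]) -> Dict[str, str]:
    """Map a relation with (entity ID, role) members to a PubAnnotation relation."""
    if len(members) != 2:
        # Assumption: the concrete exception type used by bconv is not documented here.
        raise ValueError(
            f"cannot export relation {rel_id}: arity {len(members)} is not 2")
    # First member becomes subj, second becomes obj; the role values are dropped.
    (subj, _), (obj, _) = members
    return {"id": rel_id, "pred": rel_type, "subj": subj, "obj": obj}


binary = relation_to_pubanno("R1", "causes", [("T1", "cause"), ("T2", "effect")])
print(binary)
```

A relation with three members would have to be converted to nested binary relations by hand before export.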

Loaders

`PubAnnoJSONLoader`

Properties

fmt	`pubanno_json`
native type	Document
lazy loading	no
supports text	yes
supports annotations	yes
stream type	text

Options

name	type	default	purpose
obj	str	`'type'`	key in `Entity.metadata` for the `obj` field

`PubAnnoTGZLoader`

Properties

fmt	`pubanno_json.tgz`
native type	Collection
lazy loading	no
supports text	yes
supports annotations	yes
stream type	binary

Options

name	type	default	purpose
obj	str	`'type'`	key in `Entity.metadata` for the `obj` field
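
The pubanno_json.tgz stream is binary because it is a gzip-compressed tar archive of JSON files. The sketch below uses only the standard library to write and re-read such an archive in memory. The member file names and the two sample documents are made up, and the layout that bconv itself produces (for example, how sections are split across files) may differ.

```python
import io
import json
import tarfile

# Two made-up documents in the basic PubAnnotation JSON shape.
docs = [
    {"sourcedb": "PubMed", "sourceid": "1", "text": "First abstract.\n", "denotations": []},
    {"sourcedb": "PubMed", "sourceid": "2", "text": "Second abstract.\n", "denotations": []},
]

# Write each document as its own JSON member of a gzip-compressed tar archive.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
    for doc in docs:
        payload = json.dumps(doc).encode("utf-8")
        info = tarfile.TarInfo(name=f"{doc['sourceid']}.json")  # hypothetical naming
        info.size = len(payload)
        archive.addfile(info, io.BytesIO(payload))

# Read the archive back from the binary stream, one JSON document per member.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
    loaded = [
        json.load(archive.extractfile(member))
        for member in archive.getmembers()
        if member.isfile()
    ]

print(len(loaded), "documents read")
```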

Exporters

`PubAnnoJSONFormatter`

Properties

fmt	`pubanno_json`
supports text	yes
supports annotations	yes
stream type	text

Options

name	type	default	purpose
obj	str	`'type'`	key in `Entity.metadata` for the `obj` field
sourcedb	str	`None`	source of the article text
avoid_gaps	str	`None`	suppress discontinuous spans
avoid_overlaps	str	`None`	suppress annotation collisions
**meta	Dict[str, Any]	`{}`	additional key-value pairs directly copied into the output JSON
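
The effect of the obj, sourcedb and **meta options can be illustrated with a small stand-alone function. This is a sketch under stated assumptions, not bconv's formatter: to_pubanno and the dict-based entity representation are hypothetical, the concept value is a placeholder, and leaving out sourcedb when it is None is an assumption.

```python
from typing import Any, Dict, List, Optional


def to_pubanno(docid: str, text: str, entities: List[Dict[str, Any]],
               obj: str = "type", sourcedb: Optional[str] = None,
               **meta: Any) -> Dict[str, Any]:
    """Build a PubAnnotation-style record from plain Python data."""
    record: Dict[str, Any] = {"sourceid": docid, "text": text}
    if sourcedb is not None:  # assumption: the key is omitted when unset
        record["sourcedb"] = sourcedb
    record["denotations"] = [
        {
            "id": f"T{i}",
            "span": {"begin": ent["start"], "end": ent["end"]},
            # The obj option names the metadata key whose value fills the obj field.
            "obj": ent["metadata"][obj],
        }
        for i, ent in enumerate(entities, start=1)
    ]
    record.update(meta)  # extra key-value pairs are copied as-is
    return record


entities = [{"start": 18, "end": 34,
             "metadata": {"type": "Disease", "concept": "PLACEHOLDER-ID"}}]
text = "Lidocaine-induced cardiac asystole.\n"

record = to_pubanno("354896", text, entities, sourcedb="PubMed", project="demo")
by_concept = to_pubanno("354896", text, entities, obj="concept")
print(record["denotations"][0]["obj"], by_concept["denotations"][0]["obj"])
```

With the default obj='type', the entity type ends up in the obj field; passing obj='concept' selects a different metadata entry instead.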

`PubAnnoTGZFormatter`

Properties

fmt	`pubanno_json.tgz`
supports text	yes
supports annotations	yes
stream type	binary

Options

name	type	default	purpose
obj	str	`'type'`	key in `Entity.metadata` for the `obj` field
sourcedb	str	`None`	source of the article text
avoid_gaps	str	`None`	suppress discontinuous spans
avoid_overlaps	str	`None`	suppress annotation collisions
**meta	Dict[str, Any]	`{}`	additional key-value pairs directly copied into the output JSON
