Alignment Data
We store alignment data in a JSON format, where each alignment has the attributes type, tokens, nodes, edges, and string (an optional human-readable string). Token indices are based on the ISI tokenization of AMR release 3.0, which is available in the alignments folder of the LDC data release. In the examples below, token indices start at 0 (for instance, "students" is token 3). Node ids are ISI-style ids, where 1.n is the nth child of node 1, with child indices starting at 1.
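The node id convention can be illustrated with a small sketch. The helper names `parent_id` and `child_index` are hypothetical and not part of this repository; they only assume that ids are dot-separated paths from the root, as described above.

```python
from typing import Optional


def parent_id(node_id: str) -> Optional[str]:
    """Return the id of the parent node, or None for the root id "1"."""
    head, sep, _ = node_id.rpartition(".")
    return head if sep else None


def child_index(node_id: str) -> int:
    """Return the 1-based position encoded in the last component of the id."""
    return int(node_id.rsplit(".", 1)[-1])


print(parent_id("1.2.2.1"))  # 1.2.2
print(child_index("1.2.3"))  # 3
print(parent_id("1"))        # None
```

Under this reading, 1.2.3 is the third child slot of 1.2, which in turn is the second child slot of the root.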
Examples of alignments below are based on the following AMR.
# ::tok Most of the students want to visit New York when they graduate .
# ::node 1 want-01
# ::node 1.1 person
# ::node 1.1.1 study-01
# ::node 1.1.2 include-91
# ::node 1.1.2.1 person
# ::node 1.1.2.1.1 study-01
# ::node 1.1.2.2 most
# ::node 1.2 visit-01
# ::node 1.2.2 city
# ::node 1.2.2.1 name
# ::node 1.2.3 graduate-01
# ::node 1.2.2.1.1 "New"
# ::node 1.2.2.1.2 "York"
# ::root 1 want-01
# ::edge name op1 "New" 1.2.2.1 1.2.2.1.1
# ::edge name op2 "York" 1.2.2.1 1.2.2.1.2
# ::edge want-01 ARG0 person 1 1.1
# ::edge person ARG0-of study-01 1.1 1.1.1
# ::edge person ARG1-of include-91 1.1 1.1.2
# ::edge include-91 ARG2 person 1.1.2 1.1.2.1
# ::edge person ARG0-of study-01 1.1.2.1 1.1.2.1.1
# ::edge include-91 ARG3 most 1.1.2 1.1.2.2
# ::edge want-01 ARG1 visit-01 1 1.2
# ::edge visit-01 ARG0 person 1.2 1.1
# ::edge visit-01 ARG1 city 1.2 1.2.2
# ::edge city name name 1.2.2 1.2.2.1
# ::edge visit-01 time graduate-01 1.2 1.2.3
# ::edge graduate-01 ARG0 person 1.2.3 1.1
(w/want-01
    :ARG0 (p/person
        :ARG0-of (s/study-01)
        :ARG1-of (i/include-91
            :ARG2 (p2/person
                :ARG0-of (s2/study-01))
            :ARG3 (m/most)))
    :ARG1 (v/visit-01
        :ARG0 p
        :ARG1 (c/city
            :name (n/name
                :op1 "New"
                :op2 "York"))
        :time (g/graduate-01
            :ARG0 p)))
Subgraph alignments map a connected (DAG-shaped) subgraph to a single span. Using spans instead of tokens allows for better handling of named entities and multiword expressions.
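As a rough illustration of what a subgraph alignment encodes, the sketch below recovers the aligned span text and checks that the listed nodes are connected by the listed edges. The functions `span_text` and `is_connected` are hypothetical helpers, not part of this repository. The two alignments are written out from the example sentence and the `# ::edge` lines of the example AMR, and token indices are assumed to be 0-based.

```python
tokens = "Most of the students want to visit New York when they graduate .".split()

students = {
    "type": "subgraph",
    "tokens": [3],
    "nodes": ["1.1", "1.1.1"],
    "edges": [["1.1", ":ARG0-of", "1.1.1"]],
}
new_york = {
    "type": "subgraph",
    "tokens": [7, 8],
    "nodes": ["1.2.2", "1.2.2.1", "1.2.2.1.1", "1.2.2.1.2"],
    "edges": [["1.2.2", ":name", "1.2.2.1"],
              ["1.2.2.1", ":op1", "1.2.2.1.1"],
              ["1.2.2.1", ":op2", "1.2.2.1.2"]],
}


def span_text(tokens, alignment):
    """Join the tokens that the alignment's token indices point at."""
    return " ".join(tokens[i] for i in alignment["tokens"])


def is_connected(alignment):
    """Check that every aligned node is reachable through the aligned edges."""
    nodes = set(alignment["nodes"])
    if not nodes:
        return False
    # Treat edges as undirected for the purpose of connectivity.
    neighbors = {node: set() for node in nodes}
    for source, _label, target in alignment["edges"]:
        if source in nodes and target in nodes:
            neighbors[source].add(target)
            neighbors[target].add(source)
    start = alignment["nodes"][0]
    seen, stack = {start}, [start]
    while stack:
        for nxt in neighbors[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


print(span_text(tokens, students), is_connected(students))  # students True
print(span_text(tokens, new_york), is_connected(new_york))  # New York True
```

A check like this can catch alignments whose edge endpoints do not match their node list.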
Alignment of "students" to (person :ARG0-of study-01):
{"type": "subgraph",
"tokens": [3],
"nodes": ["1.1", "1.1.1"],
"edges": [["1.1", ":ARG0-of", "1.1.1"]],
"string": "students => person :ARG0-of study-01"}
Alignment of "New York":
{"type": "subgraph",
"tokens": [7, 8],
"nodes": ["1.2.2", "1.2.2.1", "1.2.2.1.1", "1.2.2.1.2"],
"edges": [["1.2.2", ":name", "1.2.2.1"], ["1.2.2.1", ":op1", "1.2.2.1.1"], ["1.2.2.1", ":op2", "1.2.2.1.2"]],
"string": "New York => city :name name, name :op1 New, name :op2 York"}
Some AMRs involve a "duplicate" of some part of the graph to represent ellipsis and other phenomena where part of the meaning is unpronounced. For example, the phrase "most of the students" evokes two groups of students, both of which are represented in the AMR.
Duplicate subgraph for "students" in "most of the students":
{"type": "dupl-subgraph",
"tokens": [3],
"nodes": ["1.1.2.1", "1.1.2.1.1"],
"edges": [["1.1.2.1",":ARG0-of","1.1.2.1.1"]],
"string": "students => person :ARG0-of study-01"}
Relation alignments map a span to a collection of edges, for example "want" to its argument structure (:ARG0, :ARG1) or the preposition "when" to the :time relation.
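The optional string attribute of a relation alignment appears to follow the pattern "span => source label target, ...". The sketch below rebuilds it from the tokens, the edges, and the node labels of the example AMR. The function `readable` is a hypothetical helper, and the string pattern is inferred from the examples in this section rather than from a specification.

```python
tokens = "Most of the students want to visit New York when they graduate .".split()

# Labels copied from the "# ::node" lines of the example AMR.
node_labels = {
    "1": "want-01",
    "1.1": "person",
    "1.2": "visit-01",
    "1.2.3": "graduate-01",
}

want = {"type": "relation", "tokens": [4], "nodes": [],
        "edges": [["1", ":ARG0", "1.1"], ["1", ":ARG1", "1.2"]]}
when = {"type": "relation", "tokens": [9], "nodes": [],
        "edges": [["1.2", ":time", "1.2.3"]]}


def readable(alignment, tokens, node_labels):
    """Render an alignment as 'span => source label target, ...'."""
    span = " ".join(tokens[i] for i in alignment["tokens"])
    parts = [f"{node_labels[s]} {label} {node_labels[t]}"
             for s, label, t in alignment["edges"]]
    return f"{span} => {', '.join(parts)}"


print(readable(want, tokens, node_labels))
# want => want-01 :ARG0 person, want-01 :ARG1 visit-01
print(readable(when, tokens, node_labels))
# when => visit-01 :time graduate-01
```

Note that relation alignments carry an empty nodes list, so all of the information is in the edges.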
Alignment of "want" to its argument structure:
{"type": "relation",
"tokens": [4],
"nodes": [],
"edges": [["1", ":ARG0", "1.1"], ["1", ":ARG1", "1.2"]],
"string": "want => want-01 :ARG0 person, want-01 :ARG1 visit-01"}
Alignment of "when" to :time:
{"type": "relation",
"tokens": [9],
"nodes": [],
"edges": [["1.2", ":time", "1.2.3"]],
"string": "when => visit-01 :time graduate-01"}
Reentrancy alignments map reentrant edges to the spans that trigger them and are classified with labels based on the type of reentrancy. Examples include coref, coordination, and control.
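To see where reentrant edges come from, the sketch below collects the incoming edges of every node in the example AMR and reports the nodes with more than one. The edge list is copied from the `# ::edge` lines above. The helper `split_type` is hypothetical and assumes that the subtype follows a colon in the type attribute, as in "reentrancy:coref".

```python
from collections import defaultdict

# (source, label, target) triples from the "# ::edge" lines of the example AMR.
edges = [
    ("1.2.2.1", ":op1", "1.2.2.1.1"),
    ("1.2.2.1", ":op2", "1.2.2.1.2"),
    ("1", ":ARG0", "1.1"),
    ("1.1", ":ARG0-of", "1.1.1"),
    ("1.1", ":ARG1-of", "1.1.2"),
    ("1.1.2", ":ARG2", "1.1.2.1"),
    ("1.1.2.1", ":ARG0-of", "1.1.2.1.1"),
    ("1.1.2", ":ARG3", "1.1.2.2"),
    ("1", ":ARG1", "1.2"),
    ("1.2", ":ARG0", "1.1"),
    ("1.2", ":ARG1", "1.2.2"),
    ("1.2.2", ":name", "1.2.2.1"),
    ("1.2", ":time", "1.2.3"),
    ("1.2.3", ":ARG0", "1.1"),
]

incoming = defaultdict(list)
for source, label, target in edges:
    incoming[target].append((source, label))

# A node with more than one incoming edge is the target of a reentrancy.
reentrant = {node: srcs for node, srcs in incoming.items() if len(srcs) > 1}
print(reentrant)
# {'1.1': [('1', ':ARG0'), ('1.2', ':ARG0'), ('1.2.3', ':ARG0')]}


def split_type(alignment_type):
    """Split 'reentrancy:coref' into ('reentrancy', 'coref')."""
    kind, _, subtype = alignment_type.partition(":")
    return kind, subtype or None


print(split_type("reentrancy:control"))  # ('reentrancy', 'control')
print(split_type("subgraph"))            # ('subgraph', None)
```

In this AMR only the person node 1.1 is reentrant. Of its three incoming :ARG0 edges, the one from want-01 is covered by the relation alignment for "want", while the edges from visit-01 and graduate-01 are the ones annotated as reentrancy alignments.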
Alignments for coreference ("they") and control ("want"):
{"type": "reentrancy:coref",
"tokens": [10],
"nodes": [],
"edges": [["1.2.3", ":ARG0", "1.1"]],
"string": "they => graduate-01 :ARG0 person"}
{"type": "reentrancy:control",
"tokens": [4],
"nodes": [],
"edges": [["1.2", ":ARG0", "1.1"]],
"string": "want => visit-01 :ARG0 person"}