Alignment JSON Format
Austin Blodgett edited this page Feb 10, 2021 · 3 revisions
I find it useful to represent AMR alignments as JSON. The package includes tools for converting AMR alignments to and from JSON like the following.
{"amr1":
  [{"type":"isi", "tokens":[0], "nodes":["1.1"], "edges":[]},
   {"type":"isi", "tokens":[1], "nodes":["1"], "edges":[["1",":ARG0","1.1"],["1",":ARG1","1.2"]]},
   {"type":"isi", "tokens":[2], "nodes":["1.2"], "edges":[]},
   ...
  ],
 "amr2":
  [{"type":"isi", "tokens":[0], "nodes":["1"], "edges":[]},
   {"type":"isi", "tokens":[1], "nodes":[], "edges":[["1",":ARG0","1.1"]]},
   {"type":"isi", "tokens":[2], "nodes":["1.1"], "edges":[]},
   ...
  ],
 ...
}
The JSON is a dictionary mapping each AMR id to a list of alignments. Each alignment is a dictionary with the attributes 'type', 'tokens', 'nodes', and 'edges'. The 'type' attribute lets us distinguish different categories of alignments or other information. Since the unique ids stored in an alignment must match the ids in the AMR to be interpreted, we default to LDC/ISI-style node ids.
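As a rough illustration of the structure described above, the following sketch builds the same kind of dictionary in plain Python, round-trips it through the standard `json` module, and checks that every alignment carries the four attributes. The helper `check_alignments` and the sample data are hypothetical and not part of the package. The check that each edge has three elements is an assumption based on the `[source, relation, target]` shape seen in the example.

```python
import json

# Hypothetical sample data mirroring the example above; "amr1" is an AMR id.
alignments = {
    "amr1": [
        {"type": "isi", "tokens": [0], "nodes": ["1.1"], "edges": []},
        {"type": "isi", "tokens": [1], "nodes": ["1"],
         "edges": [["1", ":ARG0", "1.1"], ["1", ":ARG1", "1.2"]]},
        {"type": "isi", "tokens": [2], "nodes": ["1.2"], "edges": []},
    ],
}

REQUIRED_KEYS = {"type", "tokens", "nodes", "edges"}


def check_alignments(data):
    """Return (amr_id, index) pairs for alignments that look malformed.

    Hypothetical helper: an alignment is flagged if it lacks one of the
    four attributes, or if an edge is not a 3-element list (assumed shape).
    """
    problems = []
    for amr_id, align_list in data.items():
        for i, align in enumerate(align_list):
            if not REQUIRED_KEYS.issubset(align):
                problems.append((amr_id, i))
            elif any(len(edge) != 3 for edge in align["edges"]):
                problems.append((amr_id, i))
    return problems


# Round-trip through JSON text; the structure uses only JSON-native types.
text = json.dumps(alignments)
restored = json.loads(text)
problems = check_alignments(restored)
```

Because the structure contains only strings, integers, lists, and dictionaries, the restored object compares equal to the original.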
The advantages of using JSON are:
- Easy to load and save (No need to write a special script for reading some esoteric format)
- Can store additional information in a 'type' to distinguish different types of alignments
- Can easily store multiple sets of alignments separately, without needing to modify an AMR file. That makes it easy to compare different sets of alignments or to align different information in different layers of alignment.
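To make the last two points concrete, here is a minimal sketch of how two alignment lists for the same AMR might be separated by 'type' and compared, working directly on the dictionaries. The function names and the "manual" type label are hypothetical and only serve the illustration; the package itself may offer other ways to do this.

```python
from collections import defaultdict


def split_by_type(align_list):
    """Group alignments into layers keyed by their 'type' attribute."""
    layers = defaultdict(list)
    for align in align_list:
        layers[align["type"]].append(align)
    return dict(layers)


def token_to_nodes(align_list):
    """Map each token index to the set of node ids aligned to it."""
    mapping = defaultdict(set)
    for align in align_list:
        for tok in align["tokens"]:
            mapping[tok].update(align["nodes"])
    return dict(mapping)


def disagreements(list_a, list_b):
    """Return the sorted token indices whose aligned node sets differ."""
    a, b = token_to_nodes(list_a), token_to_nodes(list_b)
    return sorted(
        tok for tok in set(a) | set(b)
        if a.get(tok, set()) != b.get(tok, set())
    )


# Hypothetical data: one list mixing two layers of alignments for one AMR.
mixed = [
    {"type": "isi", "tokens": [0], "nodes": ["1.1"], "edges": []},
    {"type": "isi", "tokens": [1], "nodes": ["1"], "edges": []},
    {"type": "manual", "tokens": [0], "nodes": ["1.1"], "edges": []},
    {"type": "manual", "tokens": [1], "nodes": ["1.2"], "edges": []},
]

layers = split_by_type(mixed)
diff = disagreements(layers["isi"], layers["manual"])
```

Here `diff` lists the tokens on which the two layers disagree. The same comparison works for alignment lists loaded from two separate JSON files.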
To read alignments from a JSON file do:
from amr_utils.amr_readers import AMR_Reader  # assumed import path

reader = AMR_Reader()
alignments = reader.load_alignments_from_json(alignments_file)
To save alignments to a JSON file do:
from amr_utils.amr_readers import AMR_Reader  # assumed import path

reader = AMR_Reader()
reader.save_alignments_to_json(alignments_file, alignments)