-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathtree_parser.py
More file actions
40 lines (31 loc) · 1.33 KB
/
tree_parser.py
File metadata and controls
40 lines (31 loc) · 1.33 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
"""
Parser for Penn WSJ trees.
It is basically a dumbed-down version of the S-expression parser provided as
an example of the pyparsing module. This example is available at:
http://pyparsing.wikispaces.com/file/view/sexpParser.py
"""
import pprint
import pyparsing
# Define punctuation literals
LPAR, RPAR, LBRK, RBRK, LBRC, RBRC = map(pyparsing.Suppress, "()[]{}")
# The symbols a token can contain
token = pyparsing.Word(pyparsing.alphanums + "-./_:;*+=!<>@&`',?%#$\\")
display = LBRK + token + RBRK
string_ = pyparsing.Optional(display) + token
sexp = pyparsing.Forward()
sexpList = pyparsing.Group(LPAR + pyparsing.ZeroOrMore(sexp) + RPAR)
sexp << (string_ | sexpList)
def get_tree(text_tree):
"""Returns a nested list of strings representing a textual parse tree.
The tokens are not interpreted. For example, numerical-valued leaf nodes
are still represented as strings in the tree.
Args:
text_tree: a string representation of a single parse tree. These are
defined by the Penn WSJ database. A sample input is:
"(TOP (INTJ (UH damn) (. !)) )"
Returns:
The parse tree of the input, represented in a list datastructure. For
example:
['TOP', ['INTJ', ['UH, 'damn'], ['.', '!']]]
"""
return sexp.parseString(text_tree, parseAll=True).asList()[0]