Files:

`cmv_triples_*.json`

OP/PC/explanation triples, stored as a list of JSON dictionaries. Each dictionary has the following keys:

op_selftext: the text of the OP.
deltaed_comment: what we call the "persuasive comment (PC)" in the paper.
explanation: the explanation.

The values of the dictionary are all strings.
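
A minimal sketch of loading one of these files in Python and checking that every triple carries the three keys listed above. The helper name `load_triples` and the file name `cmv_triples_demo.json` are hypothetical; the sketch writes a tiny stand-in file so that it runs without the real data.

```python
import json
import tempfile
from pathlib import Path

# The three keys described above.
REQUIRED_KEYS = ("op_selftext", "deltaed_comment", "explanation")


def load_triples(path):
    """Load a list of OP/PC/explanation dictionaries from one JSON file."""
    with open(path, encoding="utf-8") as f:
        triples = json.load(f)
    for triple in triples:
        missing = [key for key in REQUIRED_KEYS if key not in triple]
        if missing:
            raise ValueError(f"triple is missing keys: {missing}")
    return triples


# Stand-in data (not taken from the real files), written to a temporary path.
sample = [
    {
        "op_selftext": "Even if love",
        "deltaed_comment": "From a microbiological perspective",
        "explanation": "I'm not certain",
    }
]
demo_path = Path(tempfile.mkdtemp()) / "cmv_triples_demo.json"
demo_path.write_text(json.dumps(sample), encoding="utf-8")

triples = load_triples(demo_path)
print(len(triples), sorted(triples[0]))
```

To use it on the released data, point `load_triples` at one of the `cmv_triples_*.json` files instead of the stand-in path.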

The train split runs from January 1, 2013 through April 1, 2018; valid from April 1, 2018 through September 1, 2018; and test from September 1, 2018 through January 31, 2019.

`cmv_triples_*_token.jsonlist.gz`

Tokenized versions of the corresponding json files. When extracted, each of these will contain a newline-separated list of dictionaries, with the same keys as above. However, rather than strings, the values will be lists of tokens, where each token is itself represented as a 6-element list.

For example:

{"op_selftext": [["Even", "even", "ADV", "", "RB", "advmod"], ["if", "if", "ADP", "", "IN", "mark"], ["love", "love", "NOUN", "", "NN", "nsubj"]],
"deltaed_comment": [["From", "from", "ADP", "", "IN", "prep"], ["a", "a", "DET", "", "DT", "det"], ["microbiological", "microbiolog", "ADJ", "", "JJ", "amod"], ["perspective", "perspect", "NOUN", "", "NN", "pobj"]],
"explanation": [["I", "i", "PRON", "", "PRP", "nsubj"], ["'m", "'m", "VERB", "", "VBP", "ROOT"], ["not", "not", "ADV", "", "RB", "neg"], ["certain", "certain", "ADJ", "", "JJ", "acomp"]]}

Each token consists of 6 strings, representing, respectively:

the word
the stemmed word
the spaCy coarse-grained part-of-speech tag, corresponding to the `pos_` attribute.
the named entity type, if present.
the spaCy fine-grained part-of-speech tag, corresponding to the `tag_` attribute.
the spaCy dependency label.
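
Indexing tokens by position is easy to get wrong, so one option is to wrap each 6-element list in a named tuple. The sketch below is only an illustration: the class name `Token`, its field names, and the helper `to_tokens` are hypothetical and not part of the data files. The sample tokens are copied from the example above.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """One token, with fields in the same order as the 6-element list."""
    word: str
    stem: str
    pos: str       # coarse-grained part-of-speech tag
    ent_type: str  # named entity type, "" when absent
    tag: str       # fine-grained part-of-speech tag
    dep: str       # dependency label


def to_tokens(raw_tokens):
    """Convert a list of 6-element lists into a list of Token tuples."""
    return [Token(*fields) for fields in raw_tokens]


# The "deltaed_comment" value from the example record above.
raw = [
    ["From", "from", "ADP", "", "IN", "prep"],
    ["a", "a", "DET", "", "DT", "det"],
    ["microbiological", "microbiolog", "ADJ", "", "JJ", "amod"],
    ["perspective", "perspect", "NOUN", "", "NN", "pobj"],
]

tokens = to_tokens(raw)
nouns = [token.word for token in tokens if token.pos == "NOUN"]
print(nouns)
```

Because `Token(*fields)` requires exactly six values, a malformed token raises a `TypeError` rather than silently shifting the fields.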

See https://spacy.io/api/annotation for descriptions of the POS, dependency, and named entity values.
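
The tokenized files can also be read directly from the compressed archive, one dictionary per line, without extracting them first. The following is a sketch under stated assumptions: the function name `iter_tokenized_triples` and the file name `cmv_triples_demo_token.jsonlist.gz` are hypothetical, and the stand-in record is a shortened version of the example above.

```python
import gzip
import json
import tempfile
from pathlib import Path


def iter_tokenized_triples(path):
    """Yield one dictionary per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


# Stand-in record: the first token of each field in the example above.
record = {
    "op_selftext": [["Even", "even", "ADV", "", "RB", "advmod"]],
    "deltaed_comment": [["From", "from", "ADP", "", "IN", "prep"]],
    "explanation": [["I", "i", "PRON", "", "PRP", "nsubj"]],
}
demo_path = Path(tempfile.mkdtemp()) / "cmv_triples_demo_token.jsonlist.gz"
with gzip.open(demo_path, mode="wt", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")

for triple in iter_tokenized_triples(demo_path):
    # Position 0 of each token is the word itself.
    words = [token[0] for token in triple["op_selftext"]]
    print(words)
```

Reading lazily with a generator keeps memory use low, since only one triple is held at a time.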

First release (with data files)
