TreeSQuAD2.0 Dataset

Welcome to the TreeSQuAD2.0 dataset, a public resource created by the Dalhousie Natural Language Processing Lab (DNLP). This dataset is a product of my master's thesis research on incorporating structural embedding of constituency trees in an attention-based model for machine comprehension. The thesis can be accessed here.

Contents in the 'Processed' Folder

The 'Processed' folder contains the following:

Parsed Trees:
- Parsed trees generated using the Stanford CoreNLP parser. (Citation: Manning et al., 2014)
Simplified Trees:
- Trees simplified into nodes, leaves, spans, and POS tags using the fairseq library. (Citation: Ott et al., 2019)
Vocabulary:
- Vocabulary of Tokens.
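
To illustrate what "simplifying" a parsed tree into nodes, leaves, spans, and POS tags can look like, here is a minimal sketch in plain Python. It is not the fairseq-based pipeline used to build this dataset, and it does not reflect the actual file layout of the 'Processed' folder. It assumes the parse is a Penn Treebank-style bracketed string, which is the usual text form of a Stanford CoreNLP constituency parse. The function names `parse_bracketed` and `simplify`, the output keys, and the example sentence are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A tree is (label, children); a child is either a subtree or a word string.
Tree = Tuple[str, List[Union["Tree", str]]]


def parse_bracketed(text: str) -> Tree:
    """Parse a Penn Treebank-style bracketed string into nested tuples.

    Assumption: the input is well formed and literal parentheses in the
    sentence are already escaped (for example as -LRB- and -RRB-).
    """
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Tree:
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children: List[Union[Tree, str]] = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a word under a POS tag
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def simplify(tree: Tree) -> Dict[str, list]:
    """Flatten a tree into internal nodes, leaves, spans, and POS tags.

    Spans are half-open (start, end) indices into the list of leaves.
    Nodes are listed in pre-order, one span per node.
    """
    leaves: List[str] = []
    pos_tags: List[str] = []
    nodes: List[str] = []
    spans: List[Tuple[int, int]] = []

    def walk(node: Tree) -> None:
        label, children = node
        # A preterminal has exactly one child, and that child is a word.
        if len(children) == 1 and isinstance(children[0], str):
            leaves.append(children[0])
            pos_tags.append(label)
            return
        index = len(nodes)
        start = len(leaves)
        nodes.append(label)
        spans.append((start, start))  # placeholder, fixed after recursion
        for child in children:
            walk(child)
        spans[index] = (start, len(leaves))

    walk(tree)
    return {"nodes": nodes, "spans": spans, "leaves": leaves, "pos_tags": pos_tags}


# Hypothetical example sentence, not taken from the dataset.
example = "(ROOT (S (NP (DT The) (NN cat)) (VP (VBD sat))))"
simplified = simplify(parse_bracketed(example))
print(simplified)
```

Each internal node is paired with the range of leaves it covers, so a constituent such as the noun phrase above maps back to a contiguous token span in the passage or question.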

Usage

Feel free to explore and utilize the dataset for your NLP and machine comprehension projects. If you find this resource helpful, consider citing this work or providing feedback.

Acknowledgments

I am sincerely grateful to Dr. Vlado Keselj for his invaluable guidance and support throughout this research.

Happy coding!

Repository: mayankanand111/TreeSQuAD
