Learning Concept Abstractness

Reproduction of the experiment from the paper 'Learning Concept Abstractness Using Weak Supervision' by Rabinovich et al. (EMNLP 2018). Using two suffixes ('-ism', '-ness') that tend to mark abstract rather than concrete words, a list of words is extracted from English Wikipedia titles. For each word in the list, 500 sentences are then extracted from the Wikipedia articles.

How to run the scripts from a Wikipedia dump

Run the scripts in this order:

1. extract_words.py
2. extract_sentences.py

extract_words

    python extract_words.py path_to_wiki_dump

Reads the titles from wiki_dump and removes stop words. Words ending in -ness or -ism are added to data/abstracts, and all other words to data/concrets. Only the 1040 most common ones are stored in the files.
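The suffix-based split described for extract_words can be illustrated with a minimal sketch. This is not the code from extract_words.py: the function name split_titles, the tiny STOP_WORDS set, the regex tokenizer, and the choice to count frequency over title words are all assumptions made for illustration.

```python
import re
from collections import Counter

# Suffixes treated as markers of abstract words (taken from the README).
ABSTRACT_SUFFIXES = ("ness", "ism")

# Hypothetical, deliberately tiny stop-word list; the real script may use another.
STOP_WORDS = frozenset({"the", "of", "and", "in", "a", "an", "to", "for", "on", "by"})


def split_titles(titles, top_n=1040):
    """Split title words into abstract and concrete lists, keeping the top_n of each.

    Assumption: "most common" means most frequent among the title words.
    """
    abstract, concrete = Counter(), Counter()
    for title in titles:
        # Lowercase and keep alphabetic tokens only (an illustrative tokenizer).
        for word in re.findall(r"[a-z]+", title.lower()):
            if word in STOP_WORDS:
                continue
            if word.endswith(ABSTRACT_SUFFIXES):
                abstract[word] += 1
            else:
                concrete[word] += 1
    return (
        [word for word, _ in abstract.most_common(top_n)],
        [word for word, _ in concrete.most_common(top_n)],
    )
```

Calling split_titles on an iterable of title strings returns two lists, which could then be written to data/abstracts and data/concrets one word per line.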

extract_sentences

    python extract_sentences.py path_to_wiki_dump

Reads the articles from wiki_dump. For each word in data/abstracts and data/concrets, extracts 500 sentences containing it. The output is written as JSON to data/concrete_sent.json and data/abstract_sent.json.
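The sentence collection step can be sketched as follows. This is an illustrative approximation, not the code from extract_sentences.py: the names collect_sentences and write_json, the naive punctuation-based sentence splitter, and the output layout (one JSON object mapping each word to a list of sentences) are assumptions.

```python
import json
import re
from collections import defaultdict


def collect_sentences(articles, words, max_per_word=500):
    """Gather up to max_per_word sentences for each target word.

    articles: iterable of plain-text article bodies.
    words: iterable of lowercase target words.
    """
    targets = set(words)
    sentences = defaultdict(list)
    for text in articles:
        # Naive split on whitespace that follows sentence-ending punctuation.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            sentence = sentence.strip()
            if not sentence:
                continue
            tokens = set(re.findall(r"[a-z]+", sentence.lower()))
            # Record the sentence once for every target word it contains.
            for word in tokens & targets:
                if len(sentences[word]) < max_per_word:
                    sentences[word].append(sentence)
    return dict(sentences)


def write_json(sentences, path):
    """Write the word-to-sentences mapping as a JSON file (assumed layout)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(sentences, handle, ensure_ascii=False, indent=2)
```

With this layout, write_json(collect_sentences(articles, abstract_words), "data/abstract_sent.json") would produce one file per word list. A real Wikipedia dump would additionally need markup stripping and a more robust sentence splitter.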
