MiSS Information Extraction

The code provided here extracts names and husband-wife relationships from notarial acts written in Dutch.

How to use this code

To use the code, download the complete package and run the main file 'nerd_main.py' with Python:

python nerd_main.py

More details

nerd_main.py contains the main class Nerd, which takes a piece of text and can be used as

nerd = Nerd(a_piece_of_text)

Once an instance nerd is made, the references can be extracted by

nerd.get_references()

and the relations can be extracted by

nerd.get_relations()

A highlighted HTML version of the text can also be exported with

nerd.get_highlighted_text()
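
The three calls above can be combined in a small helper. The following is a minimal sketch and not part of the package: `collect_results` is a hypothetical name, and since the return types of the three methods are not documented here, the helper passes them through unchanged. It accepts any object that exposes the three methods.

```python
def collect_results(nerd):
    """Gather the three outputs of a Nerd-like object into one dict.

    `nerd` is assumed to expose get_references(), get_relations() and
    get_highlighted_text(), as described above. The values are returned
    as-is because their exact types are not specified in this README.
    """
    return {
        "references": nerd.get_references(),
        "relations": nerd.get_relations(),
        "highlighted_html": nerd.get_highlighted_text(),
    }
```

With the package on the import path, this could be called as `collect_results(Nerd(a_piece_of_text))`.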

module_preprocess.py contains the code for preprocessing the text and removing or correcting bad text patterns
module_names.py contains the code for tagging words
module_refs.py contains the code for using the tagged words to extract references
module_rels.py contains the code for detecting the husband-wife relationships
The db folder contains dictionaries required to extract the names from the text:
  first_name.txt: list of frequent first names in Dutch
  last_name_multiple.txt: list of common last names that consist of more than one word
  starting_words.py: list of words that start a sentence and can be problematic in detecting the correct pattern of names
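
To illustrate how a name dictionary can drive word tagging, here is a simplified sketch. It is not the code in module_names.py: the function names, the tag labels, and the assumption that a dictionary file holds one entry per line are all hypothetical, and the sample names are made up for the example.

```python
def load_word_list(lines):
    """Build a lookup set from an iterable of lines (assumed: one entry per line)."""
    return {line.strip().lower() for line in lines if line.strip()}


def tag_tokens(text, first_names):
    """Tag each whitespace-separated token as FIRST_NAME or OTHER.

    Hypothetical, simplified tagger: the real module_names.py may use
    different tags and rules.
    """
    tagged = []
    for token in text.split():
        # Ignore surrounding punctuation and case when looking up the token.
        word = token.strip(".,;:()").lower()
        tag = "FIRST_NAME" if word in first_names else "OTHER"
        tagged.append((token, tag))
    return tagged


# In-memory stand-in for a file such as first_name.txt (contents are made up).
first_names = load_word_list(["Jan\n", "Maria\n", "\n"])
print(tag_tokens("Jan en Maria, echtelieden", first_names))
```

A real file could be passed in the same way, for example `load_word_list(open(path))`, provided it follows the one-entry-per-line layout assumed here.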

Evaluations

In a first evaluation on 48 notarial acts containing 309 individual names, 278 names were extracted correctly and 31 were not detected (recall: 90%, precision: 91%).
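
The recall figure can be checked from the counts given above. The sketch below uses the standard definitions of recall and precision; the function names are illustrative. The number of false positives is not reported here, so the precision function is shown only for completeness and is not evaluated on the reported data.

```python
def recall(true_positives, false_negatives):
    """Fraction of the names present in the text that were found."""
    return true_positives / (true_positives + false_negatives)


def precision(true_positives, false_positives):
    """Fraction of the extracted names that are correct."""
    return true_positives / (true_positives + false_positives)


# Figures reported above: 278 names extracted correctly, 31 not detected.
print(round(recall(278, 31) * 100))  # → 90
```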

Terms of Use

This code was developed within the MiSS project (http://swarmlab.unimaas.nl/catch/), funded by NWO. The code is free to use; however, the developer would appreciate being notified if you use it (email: bij.ranjbar@gmail.com).
