juditha

A super-fast lookup service for canonical names based on tantivy.

juditha aims to solve the noise/garbage problem that arises when working with Named Entity Recognition (NER). Given the availability of huge lists of known names, such as company registries or lists of persons of interest, NER results can be canonicalized against this service to check whether they are known.

The implementation uses a pre-populated tantivy index. Input data is either FollowTheMoney entities or a plain list of names.

quickstart

pip install juditha

populate

printf 'Jane Doe\nAlice\n' | juditha load-names

lookup

juditha lookup "jane doe"
"Jane Doe"

For fuzzier matching, lower the threshold (default 0.97):

juditha lookup "doe, jane" --threshold 0.5
"Jane Doe"

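The effect of the threshold can be illustrated with a minimal sketch. This is not juditha's scoring: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and `best_match` is a hypothetical helper, so the scores (and the threshold needed for a given match) are not comparable to the tantivy-based lookup.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional


def best_match(query: str, names: Iterable[str], threshold: float = 0.97) -> Optional[str]:
    """Return the known name most similar to `query`, or None if below `threshold`."""
    best_name, best_score = None, 0.0
    for name in names:
        # Case-insensitive character similarity in [0, 1]; a stand-in score only.
        score = SequenceMatcher(None, query.lower(), name.lower()).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


known = ["Jane Doe", "Alice"]
print(best_match("jane doe", known))                  # -> Jane Doe
print(best_match("doe, jane", known))                 # -> None
print(best_match("doe, jane", known, threshold=0.4))  # -> Jane Doe
```

A high default keeps near-exact matches only; lowering the threshold trades precision for recall.
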
data import

from ftm entities

cat entities.ftm.json | juditha load-entities
juditha build
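
For orientation, FollowTheMoney entity files are commonly line-delimited JSON with one entity per line and multi-valued properties. The sketch below pulls the `name` values out of such a stream. It assumes the standard `properties.name` layout; which properties juditha actually indexes is not stated here, and `iter_names` and the sample entities are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_names(lines: Iterable[str]) -> Iterator[str]:
    """Yield every `name` value from line-delimited FollowTheMoney entity JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        entity = json.loads(line)
        # FtM properties are multi-valued: each key maps to a list of strings.
        yield from entity.get("properties", {}).get("name", [])


sample = [
    '{"id": "p1", "schema": "Person", "properties": {"name": ["Jane Doe"]}}',
    '{"id": "c1", "schema": "Company", "properties": {"name": ["Acme Ltd", "ACME"]}}',
]
print(list(iter_names(sample)))  # -> ['Jane Doe', 'Acme Ltd', 'ACME']
```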

from anywhere

juditha load-names -i s3://my_bucket/names.txt
juditha load-entities -i https://data.ftm.store/eu_authorities/entities.ftm.json
juditha build

a complete dataset or catalog

Following the nomenklatura specification, a dataset JSON config needs names.txt or entities.ftm.json in its resources.

juditha load-dataset https://data.ftm.store/eu_authorities/index.json
juditha load-catalog https://data.ftm.store/investigraph/catalog.json
juditha build
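
The resource requirement can be checked before loading with a small sketch. It assumes each entry in `resources` is an object with `name` and `url` keys; `pick_resource`, the sample config, its URL, and the preference for entities over names are all hypothetical and not taken from juditha itself.

```python
from typing import Any, Dict, Optional

# Resource names mentioned above; the preference order is an assumption.
LOADABLE = ("entities.ftm.json", "names.txt")


def pick_resource(dataset: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the first resource whose name is a loadable file, or None."""
    resources = dataset.get("resources", [])
    for wanted in LOADABLE:
        for resource in resources:
            if resource.get("name") == wanted:
                return resource
    return None


# Hypothetical dataset config for illustration only.
dataset = {
    "name": "example_dataset",
    "resources": [
        {"name": "export.csv", "url": "https://example.org/export.csv"},
        {"name": "names.txt", "url": "https://example.org/names.txt"},
    ],
}
print(pick_resource(dataset)["url"])  # -> https://example.org/names.txt
```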

use in python applications

from juditha import lookup

assert lookup("jane doe") == "Jane Doe"
assert lookup("doe, jane") is None
assert lookup("doe, jane", threshold=0.5) == "Jane Doe"
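
Applied to NER output, the same call can filter out unknown names. The following is a minimal sketch: `canonicalize` and `fake_lookup` are hypothetical helpers, and the lookup function is passed in as a parameter so the example runs without a populated index. The only juditha behaviour relied on is what is shown above: `lookup(name, threshold=...)` returns the canonical name or `None`.

```python
from typing import Callable, Dict, Iterable, Optional


def canonicalize(
    candidates: Iterable[str],
    lookup_fn: Callable[..., Optional[str]],
    threshold: float = 0.97,
) -> Dict[str, str]:
    """Map each candidate that resolves to a known name onto its canonical form."""
    resolved: Dict[str, str] = {}
    for candidate in candidates:
        canonical = lookup_fn(candidate, threshold=threshold)
        if canonical is not None:  # unknown candidates are treated as noise
            resolved[candidate] = canonical
    return resolved


# Stand-in for juditha.lookup so the sketch runs without an index.
def fake_lookup(name: str, threshold: float = 0.97) -> Optional[str]:
    return {"jane doe": "Jane Doe"}.get(name.lower())


print(canonicalize(["jane doe", "Lorem Ipsum"], fake_lookup))
# -> {'jane doe': 'Jane Doe'}
```

With a populated index, passing `lookup` imported from `juditha` in place of `fake_lookup` should work the same way.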

the name

Juditha Dommer was the daughter of a coppersmith and raised seven children, while her husband Johann Pachelbel wrote a canon.

Versioning

To signal compatibility with followthemoney, juditha follows the same major version, which is currently 4.x.x.

License and Copyright

juditha is licensed under the AGPLv3 or later.

See NOTICE and LICENSE.
