turian/topia.termextract
Updates to Zope's keyphrase extractor (forked from 1.1.0)
This package determines the important terms in a given piece of content. It
uses linguistic tools such as part-of-speech (POS) tagging, plus some simple
statistical analysis, to determine the terms and their strength.
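To make the idea concrete, here is a minimal, self-contained Python sketch of noun-phrase term extraction. It is not this package's code or API. It assumes the text has already been POS-tagged (a hand-tagged sentence stands in for the package's tagger), treats runs of Penn Treebank noun tags as candidate terms, and takes a term's strength to be its word count. The function name `extract_terms` and both filter thresholds are hypothetical.

```python
from collections import Counter

# Penn Treebank noun tags; consecutive tokens with these tags form a candidate term.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def extract_terms(tagged, min_single_occurrences=2, min_multiword_strength=2):
    """Return sorted (term, occurrences, strength) triples from (word, tag) pairs.

    Hypothetical sketch: strength is the number of words in the term.
    Single-word terms are kept only if they occur at least
    min_single_occurrences times; terms with at least
    min_multiword_strength words are always kept.
    """
    counts = Counter()
    run = []
    # A sentinel pair guarantees the final noun run is counted.
    for word, tag in list(tagged) + [("", "END")]:
        if tag in NOUN_TAGS:
            run.append(word)
        elif run:
            counts[" ".join(run)] += 1
            run = []
    results = []
    for term, occurrences in counts.items():
        strength = len(term.split())
        if strength >= min_multiword_strength or occurrences >= min_single_occurrences:
            results.append((term, occurrences, strength))
    return sorted(results)


tagged = [
    ("The", "DT"), ("term", "NN"), ("extractor", "NN"), ("finds", "VBZ"),
    ("terms", "NNS"), ("in", "IN"), ("text", "NN"), (".", "."),
    ("Each", "DT"), ("term", "NN"), ("extractor", "NN"), ("scores", "VBZ"),
    ("text", "NN"),
]
print(extract_terms(tagged))
# [('term extractor', 2, 2), ('text', 2, 1)]
```

Here "terms" is dropped because it is a single word seen only once, while the two-word "term extractor" survives on strength alone.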
NOTE: This is a fork by Joseph Turian of topia.termextract 1.1.0
CONTRIBUTIONS:
* Unicode alphabetic characters are tokenized correctly.
I changed TERM_SPEC in topia.termextract.tag:
Old = [u'S', u'\xe3o', u'Paulo', u'was', u'home', u'to']
New = [u'S\xe3o', u'Paulo', u'was', u'home', u'to']
* extractor.extract() now has a parameter KEEP_ORIGINAL_SPACING=True,
which allows you to keep the original spacing of the term:
Old = [u'Mr . Smith']
New = [u'Mr. Smith']
* Fixed a bug where a term wouldn't be found if it was literally
the last token of the sentence.
* Fixed a bug (?) where unigram terms were included even if their
tokens were part of a multi-word term.
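The Unicode tokenization change in the first contribution above comes down to which characters a regex treats as letters. The sketch below contrasts an ASCII-only letter class with a Unicode-aware one. Both patterns and the `tokenize` helper are hypothetical illustrations, not the actual TERM_SPEC from topia.termextract.tag, so the "old" split differs in detail from the one quoted above.

```python
import re

# ASCII-only letter class: a non-ASCII letter such as "ã" falls outside
# [a-zA-Z], so "São" is broken into pieces (hypothetical pattern, not the
# package's actual TERM_SPEC).
ASCII_TOKEN = re.compile(r"[a-zA-Z]+|[^a-zA-Z\s]+")

# Unicode-aware letter class: in Python 3, [^\W\d_] matches any Unicode letter.
UNICODE_TOKEN = re.compile(r"[^\W\d_]+|\d+|[^\w\s]+")


def tokenize(text, pattern):
    """Return all non-overlapping matches of pattern in text."""
    return pattern.findall(text)


text = "S\u00e3o Paulo was home to"
print(tokenize(text, ASCII_TOKEN))    # ['S', 'ã', 'o', 'Paulo', 'was', 'home', 'to']
print(tokenize(text, UNICODE_TOKEN))  # ['São', 'Paulo', 'was', 'home', 'to']
```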
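One way an option like KEEP_ORIGINAL_SPACING could work is to remember each token's character offsets and slice the term back out of the original text, instead of joining its tokens with single spaces. This is a sketch under that assumption, not the fork's implementation. The regex and the helper names `tokenize_with_spans` and `term_text` are hypothetical.

```python
import re

# Words, or single punctuation characters (hypothetical tokenizer).
TOKEN = re.compile(r"\w+|[^\w\s]")


def tokenize_with_spans(text):
    """Return (token, start, end) triples for every token in text."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN.finditer(text)]


def term_text(text, spans, i, j, keep_original_spacing=True):
    """Render tokens i..j-1 as a term string."""
    if keep_original_spacing:
        # Slice from the first token's start to the last token's end.
        return text[spans[i][1]:spans[j - 1][2]]
    # Otherwise rebuild the term by joining tokens with single spaces.
    return " ".join(tok for tok, _, _ in spans[i:j])


text = "Mr. Smith arrived"
spans = tokenize_with_spans(text)
print(term_text(text, spans, 0, 3, keep_original_spacing=False))  # Mr . Smith
print(term_text(text, spans, 0, 3))                               # Mr. Smith
```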
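A common way to miss a term that is literally the last token of a sentence is a loop that only emits a noun run when it reaches a non-noun. The sketch below shows that failure and the usual fix, a final flush after the loop. It assumes (word, tag) input with Penn Treebank tags; the `noun_runs` function and its `flush_at_end` switch are hypothetical and only illustrate the kind of bug described, not the package's code.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def noun_runs(tagged, flush_at_end=True):
    """Group consecutive noun tokens into candidate terms."""
    runs, current = [], []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            current.append(word)
        elif current:
            # A non-noun ends the current run.
            runs.append(" ".join(current))
            current = []
    # Without this final flush, a run that ends the sentence is silently dropped.
    if flush_at_end and current:
        runs.append(" ".join(current))
    return runs


tagged = [("We", "PRP"), ("visited", "VBD"), ("S\u00e3o", "NNP"), ("Paulo", "NNP")]
print(noun_runs(tagged, flush_at_end=False))  # []
print(noun_runs(tagged))                      # ['São Paulo']
```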
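The unigram fix in the last contribution above can be read as: when nouns form a multi-word term, count only that term, not each of its words again as a separate unigram. The sketch below assumes that reading. The "old" branch, which also counts every word of a multi-word run on its own, is an assumption about the earlier behavior, and `count_terms` with its `suppress_nested_unigrams` flag is hypothetical.

```python
from collections import Counter

NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def count_terms(tagged, suppress_nested_unigrams=True):
    """Count noun-run terms in (word, tag) pairs."""
    counts = Counter()
    run = []

    def flush():
        if not run:
            return
        counts[" ".join(run)] += 1
        if len(run) > 1 and not suppress_nested_unigrams:
            # Assumed old behavior: each word of a multi-word term is also counted alone.
            for word in run:
                counts[word] += 1
        run.clear()

    for word, tag in tagged:
        if tag in NOUN_TAGS:
            run.append(word)
        else:
            flush()
    flush()  # Count a run that ends the input.
    return dict(counts)


tagged = [("S\u00e3o", "NNP"), ("Paulo", "NNP"), ("was", "VBD"), ("home", "NN"), ("to", "TO")]
print(count_terms(tagged))                                  # 'São Paulo' and 'home' only
print(count_terms(tagged, suppress_nested_unigrams=False))  # also 'São' and 'Paulo'
```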