
Releases: TimSchopf/KeyphraseVectorizers

v0.0.13

02 May 15:58


Fix small document split and stop words bugs

v0.0.12

29 Apr 13:00


This release fixes stop word removal bugs and memory issues for long documents, and adds a build_tokenizer attribute and online update functions.

Solves issues #34, #31, #29, #28, #26, and #6.

Add spacy.Language as valid argument for 'spacy_pipeline'

23 Dec 10:53
a20de03


This release allows an object returned by spacy.load to be reused across many different KeyphraseVectorizer objects. It includes PR #19.

Custom POS-tagger feature added

19 Jun 14:01


Added the options to use a custom POS-tagger, define custom stop words, and exclude certain spaCy pipeline components. This release solves issues #2 and #7.
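
As a rough illustration of the custom POS-tagger option, the sketch below shows one shape such a tagger might take: a callable that receives a list of raw document strings and returns a flat list of (token, POS-tag) tuples. The function name `toy_pos_tagger`, the lookup table `TOY_TAGS`, the whitespace tokenization, and the expected input/output shape are all assumptions made for illustration; the interface the library actually expects should be checked against its documentation.

```python
from typing import List, Tuple

# Hypothetical lookup table, used only for illustration.
TOY_TAGS = {
    "supervised": "JJ",
    "learning": "NN",
    "is": "VBZ",
    "a": "DT",
    "machine": "NN",
    "task": "NN",
}


def toy_pos_tagger(raw_documents: List[str]) -> List[Tuple[str, str]]:
    """Return (token, tag) pairs for every whitespace-separated token.

    Assumption: tokens missing from TOY_TAGS fall back to the noun tag "NN".
    """
    tagged = []
    for document in raw_documents:
        for token in document.split():
            tagged.append((token, TOY_TAGS.get(token.lower(), "NN")))
    return tagged


if __name__ == "__main__":
    print(toy_pos_tagger(["Supervised learning is a machine learning task"]))
```

A callable along these lines would presumably be handed to the vectorizer through the custom POS-tagger option described above; these notes do not state the exact parameter name.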

Higher compatibility with available SpaCy pipelines

18 Jun 19:23


Fixed issues #10 and #11 by removing the default exclusion of certain spaCy pipeline components. This slightly slows down keyphrase extraction. However, it provides higher compatibility with all available spaCy pipelines, including the ones that use transformers.

Added 'stop_words'=None option

16 May 15:14


Add stopwords download automation

14 Feb 16:11


v0.0.7

Signed-off-by: Tim Schopf <tim.schopf@t-online.de>

Change "multiprocessing" parameter to "workers" parameter

12 Feb 14:47


Change the "multiprocessing" parameter to the "workers" parameter.

Signed-off-by: Tim Schopf <tim.schopf@t-online.de>

Added min_df and max_df parameters, added support for documents that have more than 1000000 characters, and limit max keyphrase length to 8 words to prevent memory issues

06 Feb 10:03
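
To make the min_df, max_df, and maximum-length limits more concrete, here is a minimal sketch of document-frequency filtering in plain Python. It is not the library's implementation: the function name `filter_by_document_frequency`, its parameters, and the treatment of `min_df` and `max_df` as absolute document counts are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Optional, Set


def filter_by_document_frequency(
    doc_keyphrases: List[Set[str]],
    min_df: int = 1,
    max_df: Optional[int] = None,
    max_words: int = 8,
) -> Set[str]:
    """Keep keyphrases whose document frequency lies in [min_df, max_df].

    Assumptions: min_df and max_df are absolute document counts, and
    keyphrases longer than max_words words are dropped.
    """
    # Each document contributes a set, so a keyphrase counts once per document.
    document_frequency = Counter()
    for phrases in doc_keyphrases:
        document_frequency.update(phrases)

    upper = len(doc_keyphrases) if max_df is None else max_df
    return {
        phrase
        for phrase, count in document_frequency.items()
        if min_df <= count <= upper and len(phrase.split()) <= max_words
    }


if __name__ == "__main__":
    docs = [
        {"machine learning", "training data"},
        {"machine learning", "neural network"},
        {"machine learning"},
    ]
    print(filter_by_document_frequency(docs, min_df=2))
    print(filter_by_document_frequency(docs, max_df=2))
```

Raising the lower bound drops rare keyphrases, while lowering the upper bound drops keyphrases that occur in almost every document; the word-count cap keeps very long candidate phrases from inflating the vocabulary.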


Increased efficiency of spaCy pipeline for POS tagging

03 Feb 16:25


v0.0.4

v0.0.4, increased efficiency of spaCy pipeline for POS tagging + adde…