Releases: Lol4t0/python-goose

Better extraction & better http handling

21 Jan 09:07

  • Images are now fetched with Requests, and the same HTTP session is shared by all requests.
  • All candidate text root nodes are analyzed and the best one is selected, instead of stopping at the first candidate.
  • Improved text selection filters.
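
The change from "first candidate" to "best candidate" can be illustrated with a minimal sketch. It is not the library's actual code: the names `score` and `best_candidate`, the use of plain strings in place of DOM nodes, and scoring by stop-word count are all assumptions made for illustration.

```python
def score(text, stop_words):
    """Count how many whitespace-separated words of `text` are stop words."""
    return sum(1 for word in text.lower().split() if word in stop_words)


def best_candidate(candidates, stop_words):
    """Score every candidate and return the highest-scoring one.

    Returns None when no candidate contains any stop word. Ties keep the
    earliest candidate, because only a strictly higher score replaces it.
    """
    best, best_score = None, 0
    for candidate in candidates:
        current = score(candidate, stop_words)
        if current > best_score:
            best, best_score = candidate, current
    return best


# Hypothetical data: a short navigation blurb appears before the real text.
STOP_WORDS = {"the", "on", "and", "it", "was", "with", "to"}
CANDIDATES = [
    "Subscribe to the newsletter",
    "The cat sat on the mat and it was happy with the result",
    "Copyright 2015",
]
chosen = best_candidate(CANDIDATES, STOP_WORDS)
```

A strategy that stops at the first acceptable candidate would return the newsletter blurb here, while scanning all candidates picks the longer sentence.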

Move to requests as network library

13 Jan 09:45

1.0.28:

  * Move to requests as network library

Python 3 support

12 Jan 10:41

Enable Python 3 support.

Fix unicode processing + `&nbsp;` support

12 Jan 09:22

  • STOP_WORDS are stored as unicode, so word candidates must also be kept as unicode in order to compare them against the dictionary correctly.
  • In some languages (Russian, for example), short stop words are joined to the next word in the sentence with a non-breaking space. To detect those stop words, the tokenizer now supports nbsp. This fixes grangier#223.