Releases · rspeer/wordfreq

26 Sep 21:56

rspeer

v3.0.2

372f6db

v3.0.2: packaging fixes

Updated the range of allowable versions of regex. Versions before 2021.7.6 don't have the regex.Match class.
Added the extras dependencies as optional dependencies in pyproject.toml.
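
As a rough illustration of why the lower bound matters, the sketch below compares date-style regex version strings numerically rather than as text. The helper names (parse_version, has_match_class) are hypothetical and not part of wordfreq or regex; the only fact taken from the notes above is the 2021.7.6 cutoff.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '2021.7.6' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# Cutoff taken from the release note: earlier versions lack regex.Match.
MIN_REGEX_VERSION = parse_version("2021.7.6")


def has_match_class(installed: str) -> bool:
    """Hypothetical check: is this regex version new enough to have regex.Match?"""
    return parse_version(installed) >= MIN_REGEX_VERSION


print(has_match_class("2021.4.4"))   # older than the cutoff
print(has_match_class("2021.10.8"))  # newer, even though "10" sorts before "7" as text
```

Comparing tuples of integers avoids the string-ordering trap where "2021.10.8" would sort before "2021.7.6".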


26 Sep 21:56

rspeer

v3.0

0fc7756

v3.0: The "handle numbers better" release

Previously, wordfreq would group all digit sequences of the same 'shape',
with length 2 or more, into a single token and return the frequency of that
token, which would be a vast overestimate.

Now it distributes the frequency over all numbers of that shape, with an
estimated distribution that allows for Benford's law (lower numbers are more
frequent) and a special frequency distribution for 4-digit numbers that look
like years (2010 is more frequent than 1020).
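
The idea of spreading one shape's frequency over individual numbers can be sketched as follows. This is a simplified illustration, not wordfreq's actual estimator: it assumes only the leading digit follows Benford's law, treats the second digit as uniform, ignores numbers with leading zeros, and does not model the special case for years. The function names are hypothetical.

```python
import math


def benford_first_digit_prob(d: int) -> float:
    """Benford's law: probability that the leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)


def distribute_two_digit(shape_freq: float) -> dict:
    """Split the total frequency of the two-digit shape over '10'..'99'.

    Assumption: the first digit follows Benford's law and the second
    digit is uniform. The real library may weight things differently.
    """
    freqs = {}
    for first in range(1, 10):
        p_first = benford_first_digit_prob(first)
        for second in range(10):
            freqs[f"{first}{second}"] = shape_freq * p_first / 10
    return freqs


freqs = distribute_two_digit(0.01)
print(freqs["10"] > freqs["90"])  # lower leading digits get a larger share
```

Because the Benford probabilities for digits 1 through 9 sum to 1, the per-number estimates add back up to the frequency of the whole shape instead of each number inheriting all of it.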

More changes related to digits:

Functions such as iter_wordlist and top_n_list no longer return
multi-digit numbers (they used to return them in their "smashed" form, such
as "0000").
lossy_tokenize no longer replaces digit sequences with 0s. That happens
instead in a place that's internal to the word_frequency function, so we can
look at the values of the digits before they're replaced.
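
A minimal sketch of what "smashing" means, assuming it amounts to replacing every digit in a run of two or more digits with 0. The function smash_digits and its regular expression are hypothetical stand-ins for illustration, not the library's internal code.

```python
import re

# Runs of two or more digits; single digits are left alone.
MULTI_DIGIT = re.compile(r"\d{2,}")


def smash_digits(token: str) -> str:
    """Replace each multi-digit run with the same number of zeros."""
    return MULTI_DIGIT.sub(lambda m: "0" * len(m.group()), token)


print(smash_digits("2010"))     # 0000
print(smash_digits("1020"))     # 0000
print(smash_digits("7"))        # 7
print(smash_digits("route66"))  # route00
```

Since "2010" and "1020" smash to the same form, the digit values have to be read before smashing if the two are to receive different frequency estimates, which is why the replacement moved out of lossy_tokenize.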

Other changes:

wordfreq is now developed using poetry as its package manager, and with
pyproject.toml as the source of configuration instead of setup.py.
The minimum version of Python supported is 3.7.
Type information is exported using py.typed.


02 Sep 21:55

rspeer

v2.5.1

11a3138

v2.5.1

Version 2.5.1 (2021-09-02)

Import ftfy and use its uncurl_quotes function to turn curly quotes into
straight ones, so that different forms of apostrophe are handled consistently.
Set minimum version requirements on regex, jieba, and langcodes
so that tokenization will give consistent results.
Work around an inconsistency in the msgpack API around
strict_map_key=False.
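
The quote normalization mentioned in the first item can be illustrated with a small standalone function. This is a sketch that covers only the four most common curly quote characters; it is not ftfy's implementation, and the name uncurl is hypothetical.

```python
# Map curly single and double quotes to their straight ASCII forms.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (common apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def uncurl(text: str) -> str:
    """Replace curly quotes with straight ones (simplified sketch)."""
    return text.translate(CURLY_TO_STRAIGHT)


# Both spellings of the apostrophe end up as the same string.
print(uncurl("don\u2019t") == uncurl("don't"))
```

Normalizing before lookup means a word typed with either apostrophe maps to the same entry.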

Version 2.5 (2021-04-15)

Incorporate data from the OSCAR corpus.


03 Oct 21:11

rspeer

v2.2

bc12599

v2.2

Merge pull request #60 from LuminosoInsight/gender-neutral-at

Recognize "@" in gender-neutral word endings as part of the token
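
To make the behavior concrete, here is a rough sketch of a tokenizer pattern that keeps a trailing "@" or "@s" attached to a word, as in Spanish "amig@s". The pattern is an assumption written for this illustration and is much simpler than wordfreq's real tokenizer.

```python
import re

# Letters, optionally followed by "@" or "@s", but only when no further
# word character follows (so "user@example.com" is not kept together).
TOKEN_RE = re.compile(r"[^\W\d_]+(?:@s?)?(?!\w)")


def tokenize(text: str) -> list:
    """Hypothetical tokenizer keeping gender-neutral '@' endings in the token."""
    return TOKEN_RE.findall(text.lower())


print(tokenize("Hola a tod@s l@s amig@s"))
# ['hola', 'a', 'tod@s', 'l@s', 'amig@s']
```

The negative lookahead is the design choice that separates a word ending from an "@" that sits in the middle of an address-like string.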


27 Sep 17:30

rspeer

v1.7

721a1e9

v1.7

This release of wordfreq gives word frequencies in 32 languages from a variety of data sources, which it checks against each other to mitigate outliers.

See CHANGELOG.md for more details on the version history.


08 Sep 17:56

rspeer

v1.5.1

0ba563c

v1.5.1

This release of wordfreq gives word frequencies in 27 languages from a variety of data sources, which it checks against each other to mitigate outliers.

See CHANGELOG.md for more details on the version history.

