Commit 62ae651

Merge pull request #87 from pinecone-io/amnon/upgrade-nltk
Upgrade NLTK version to ^3.9.1 and drop support for Python 3.8
2 parents 095e62c + 56c5cd0 commit 62ae651

4 files changed

Lines changed: 7 additions & 7 deletions

.github/workflows/CI.yaml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ jobs:
     strategy:
       matrix:
         os: [macos-latest, windows-latest, ubuntu-latest]
-        python-version: [3.8, 3.9, '3.10', 3.11, 3.12]
+        python-version: [3.9, '3.10', 3.11, 3.12]
     defaults:
       run:
         shell: bash

pinecone_text/sparse/bm25_tokenizer.py

Lines changed: 2 additions & 2 deletions
@@ -34,9 +34,9 @@ def __init__(
     @staticmethod
     def nltk_setup() -> None:
         try:
-            nltk.data.find("tokenizers/punkt")
+            nltk.data.find("tokenizers/punkt_tab")
         except LookupError:
-            nltk.download("punkt")
+            nltk.download("punkt_tab")
 
         try:
             nltk.data.find("corpora/stopwords")
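
The hunk above follows a check-then-download pattern: look the resource up locally and fetch it only when the lookup raises `LookupError`. A minimal sketch of that pattern is shown below. The names `ensure_resource` and `setup_with_nltk` are hypothetical and do not appear in the repository, and the `"stopwords"` download name is an assumption, since the hunk ends before that line. The lookup and download callables are passed in so the helper can be exercised without network access.

```python
from typing import Callable


def ensure_resource(
    find: Callable[[str], object],
    download: Callable[[str], object],
    resource_path: str,
    package: str,
) -> bool:
    """Download `package` only if `resource_path` cannot be found.

    Returns True when a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
        return False
    except LookupError:
        download(package)
        return True


def setup_with_nltk() -> None:
    """Apply the helper to the two resources named in the diff (hypothetical wrapper)."""
    import nltk  # imported lazily so the helper above stays dependency-free

    ensure_resource(nltk.data.find, nltk.download, "tokenizers/punkt_tab", "punkt_tab")
    # Assumption: the stopwords corpus is downloaded under the name "stopwords".
    ensure_resource(nltk.data.find, nltk.download, "corpora/stopwords", "stopwords")
```

Injecting `find` and `download` is a design choice for testability only; the code in the commit calls `nltk.data.find` and `nltk.download` directly.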

pyproject.toml

Lines changed: 3 additions & 3 deletions
@@ -1,19 +1,19 @@
 [tool.poetry]
 name = "pinecone-text"
-version = "0.9.0"
+version = "0.10.0"
 description = "Text utilities library by Pinecone.io"
 authors = ["Pinecone.io"]
 readme = "README.md"
 packages = [{include = "pinecone_text"}]
 
 [tool.poetry.dependencies]
-python = ">=3.8,<4.0"
+python = ">=3.9,<4.0"
 torch = { version = ">=1.13.1", optional = true }
 transformers = { version = ">=4.26.1", optional = true }
 sentence-transformers = { version = ">=2.0.0", optional = true }
 wget = "^3.2"
 mmh3 = "^4.1.0"
-nltk = "^3.6.5"
+nltk = "^3.9.1"
 openai = { version = "^1.2.3", optional = true }
 cohere = { version = "^4.37", optional = true }
 numpy = [
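
For readers unfamiliar with Poetry's caret syntax, the change from `^3.6.5` to `^3.9.1` raises the minimum NLTK version while keeping the upper bound below the next major release. The sketch below illustrates that range for full `MAJOR.MINOR.PATCH` specs only. `caret_bounds` and `satisfies` are hypothetical helpers written for this illustration, not part of Poetry or of this repository, and they do not cover shortened specs such as `^3.2`.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def caret_bounds(spec: str) -> Tuple[Version, Version]:
    """Return (inclusive lower, exclusive upper) for a spec like '^3.9.1'."""
    parts = spec.lstrip("^").split(".")
    if len(parts) != 3:
        raise ValueError("this sketch only handles MAJOR.MINOR.PATCH specs")
    major, minor, patch = (int(p) for p in parts)
    lower = (major, minor, patch)
    # A caret allows changes that leave the left-most non-zero component intact.
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return lower, upper


def satisfies(version: Version, spec: str) -> bool:
    """Check whether `version` falls inside the caret range of `spec`."""
    lower, upper = caret_bounds(spec)
    return lower <= version < upper


print(caret_bounds("^3.9.1"))          # ((3, 9, 1), (4, 0, 0))
print(satisfies((3, 6, 5), "^3.9.1"))  # False: older than the new minimum
```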

tests/unit/test_bm25_tokenizer.py

Lines changed: 1 addition & 1 deletion
@@ -152,7 +152,7 @@ def test_nltk_download(self):
             language="english",
         )
 
-        nltk.find("tokenizers/punkt")
+        nltk.find("tokenizers/punkt_tab")
         nltk.find("corpora/stopwords")
 
         assert tokenizer("The quick brown fox jumps over the lazy dog") == [
