CH3-01-Creating Embedded Chunks Notebook requires the NLTK package to run. It currently gives an error #91


The CH3-01-Creating Embedded Chunks notebook has an `extract_doc_text` function that currently fails with a `LookupError` raised from NLTK's `data.py`, because a required NLTK resource cannot be found:

**Error Message**
```
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-6c3e5f5d-bd8b-4fa7-a80a-a7736e55c8df/lib/python3.10/site-packages/nltk/data.py:579, in find(resource_name, paths)
    577 sep = "*" * 70
    578 resource_not_found = f"\n{sep}\n{msg}\n{sep}\n"
--> 579 raise LookupError(resource_not_found)
```

**The following is the cell that is giving the error.**
```
with open(f"{documents_folder}2303.10130.pdf", mode="rb") as pdf:
  doc = extract_doc_text(pdf.read())  
  print(doc)
```

**Below is the fix required:**
```
%pip install nltk

# Or install nltk as a library on the cluster itself.

import nltk

# Download the required NLTK resources
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger_eng')

# Re-run the cell that previously failed
with open(f"{documents_folder}2303.10130.pdf", mode="rb") as pdf:
  doc = extract_doc_text(pdf.read())
  print(doc)
```
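
**Optional: only download what is missing (sketch)**

If the notebook is re-run often, the downloads could be skipped when the resources are already present. The following is a minimal sketch, not the notebook's actual code: `ensure_nltk_resources` is a hypothetical helper name, and the resource paths `tokenizers/punkt` and `taggers/averaged_perceptron_tagger_eng` are my assumption about where NLTK stores these two packages. It relies on `nltk.data.find` raising `LookupError` for a missing resource, which is the same error shown in the traceback above.

```python
def ensure_nltk_resources(resources, find=None, download=None):
    """Download only the NLTK packages whose resource path cannot be found.

    `resources` maps a resource path (as understood by nltk.data.find)
    to the package id passed to nltk.download.
    `find` and `download` default to the NLTK functions; they are
    parameters only so the helper can be exercised without NLTK installed.
    Returns the list of package ids that were downloaded.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed
        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for resource_path, package_id in resources.items():
        try:
            find(resource_path)  # raises LookupError if not installed
        except LookupError:
            download(package_id)
            fetched.append(package_id)
    return fetched


# Assumed resource paths for the two packages used in the fix above.
REQUIRED_RESOURCES = {
    "tokenizers/punkt": "punkt",
    "taggers/averaged_perceptron_tagger_eng": "averaged_perceptron_tagger_eng",
}
```

In the notebook this would be called as `ensure_nltk_resources(REQUIRED_RESOURCES)` before the cell that calls `extract_doc_text`.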

