I tried clara on a couple of projects and ran into errors in the document loading process.
Loading node_modules/ipaddr.js …
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.10/site-packages/langchain/document_loaders/text.py", line 40, in load
with open(self.file_path, encoding=self.encoding) as f:
IsADirectoryError: [Errno 21] Is a directory: 'node_modules/ipaddr.js'
File "/opt/homebrew/lib/python3.10/site-packages/clara/cli.py", line 30, in setup
index.ingest()
File "/opt/homebrew/lib/python3.10/site-packages/clara/index.py", line 73, in ingest
texts = self._get_texts()
File "/opt/homebrew/lib/python3.10/site-packages/clara/index.py", line 53, in _get_texts
documents.extend(loader.load_and_split())
File "/opt/homebrew/lib/python3.10/site-packages/langchain/document_loaders/base.py", line 43, in load_and_split
docs = self.load()
File "/opt/homebrew/lib/python3.10/site-packages/langchain/document_loaders/text.py", line 56, in load
raise RuntimeError(f"Error loading {self.file_path}") from e
RuntimeError: Error loading /Users/tmm1/fancybits/chrome-capture-for-channels/node_modules/ipaddr.js
Loading ext/libhdhomerun/README.md … index.py:51
Traceback (most recent call last):
File "/opt/homebrew/lib/python3.10/site-packages/langchain/document_loaders/text.py", line 41, in load
text = f.read()
File "/opt/homebrew/Cellar/python@3.10/3.10.12/Frameworks/Python.framework/Versions/3.10/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 10: invalid start byte
If there were a way to specify directories to ignore, I could tell it to stop traversing into directories like node_modules, vendor/gems, and ext in these projects.
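For illustration, here is a rough sketch of what such an ignore option could look like. It is not clara's actual code: `iter_text_files`, `read_text_or_none`, and the `ignore_dirs` parameter are hypothetical names made up for this example. It assumes the loader walks the project tree itself, prunes ignored directories before descending into them, and skips files that are not valid UTF-8 instead of raising.

```python
import os
from pathlib import Path
from typing import Iterable, Iterator, Optional


def iter_text_files(root: str, ignore_dirs: Iterable[str] = ()) -> Iterator[Path]:
    """Yield files under root, never descending into ignored directories.

    Hypothetical helper: an entry in ignore_dirs matches either a bare
    directory name anywhere in the tree ("node_modules") or a path
    relative to root ("vendor/gems").
    """
    root_path = Path(root)
    ignored = {Path(d).as_posix().strip("/") for d in ignore_dirs}
    for dirpath, dirnames, filenames in os.walk(root_path):
        rel_dir = Path(dirpath).relative_to(root_path)
        # Assigning to dirnames[:] prunes the walk in place, so os.walk
        # never enters the removed directories.
        dirnames[:] = sorted(
            d
            for d in dirnames
            if d not in ignored and (rel_dir / d).as_posix() not in ignored
        )
        # Only names from `filenames` are yielded, so a directory that is
        # named like a file (node_modules/ipaddr.js) is never opened.
        for name in sorted(filenames):
            yield Path(dirpath) / name


def read_text_or_none(path: Path) -> Optional[str]:
    """Return the file's text, or None if it is not valid UTF-8."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return None
```

With something like this, `iter_text_files(".", ["node_modules", "vendor/gems", "ext"])` would skip all three trees, and a file such as ext/libhdhomerun/README.md would be skipped by `read_text_or_none` rather than aborting the whole ingest, even if its directory were not ignored.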