Open
Description
Command that causes the issue
When scanning a (randomly found) open directory whose listing contains French accented characters:
$ dirhunt "http://freeit.free.fr/"
Welcome to Dirhunt v1.0.0 using Python 3.12.3
[ERROR] Error on CommonCrawl source: 503 Server Error: Service Temporarily Unavailable for url: https://index.commoncrawl.org/collinfo.json
◐ Started now
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/dirhunt/exceptions.py", line 47, in wrapped
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/dirhunt/crawler_url.py", line 84, in start
processor.process(text, soup)
File "/usr/local/lib/python3.12/dist-packages/dirhunt/processors.py", line 351, in process
self.search_keywords(text)
File "/usr/local/lib/python3.12/dist-packages/dirhunt/processors.py", line 102, in search_keywords
text = text.decode('utf-8')
^^^^^^^^^^^^^^^^^^^^
◐ Started 2 seconds ago
File "/usr/local/lib/python3.12/dist-packages/dirhunt/exceptions.py", line 47, in wrapped
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/dirhunt/crawler_url.py", line 84, in start
processor.process(text, soup)
File "/usr/local/lib/python3.12/dist-packages/dirhunt/processors.py", line 351, in process
self.search_keywords(text)
File "/usr/local/lib/python3.12/dist-packages/dirhunt/processors.py", line 102, in search_keywords
text = text.decode('utf-8')
^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 522: invalid continuation byte
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/dirhunt/exceptions.py", line 47, in wrapped
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/dirhunt/crawler_url.py", line 84, in start
[200] http://freeit.free.fr/Elasticity/ (Index Of) (Nothing interesting)
File "/usr/local/lib/python3.12/dist-packages/dirhunt/processors.py", line 351, in process
self.search_keywords(text)
File "/usr/local/lib/python3.12/dist-packages/dirhunt/processors.py", line 102, in search_keywords
text = text.decode('utf-8')
^^^^^^^^^^^^^^^^^^^^
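The byte 0xe9 named in the error is how Latin-1 (ISO-8859-1) and Windows-1252 encode "é", which would fit a French page served in a legacy encoding. That is an assumption; the page's real charset was not checked. A minimal sketch that reproduces the same exception outside dirhunt, using a made-up byte string rather than the actual response:

```python
# Hypothetical sample: a directory title with "é" encoded as Latin-1.
# This is NOT the real response body from the scanned site.
raw = "Index of /Elasticité/".encode("latin-1")

error = None
try:
    # Same call as in the traceback: strict UTF-8 decoding.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)

# Decoding with the encoding the bytes were written in succeeds.
decoded = raw.decode("latin-1")
print(decoded)
```

In Latin-1 the "é" is a single byte, while UTF-8 treats 0xe9 as the start of a three-byte sequence, so the decoder rejects the ordinary ASCII byte that follows it.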
Expected behavior
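Pages whose content is not valid UTF-8 (here, a listing with accented characters) should be scanned without decoding errors.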
Actual behavior
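search_keywords calls text.decode('utf-8') on the page content and raises UnicodeDecodeError (byte 0xe9 at position 522); tracebacks are printed repeatedly during the scan.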
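One possible way to avoid the exception is to decode tolerantly instead of assuming strict UTF-8. The sketch below is not dirhunt's code or a tested patch; the helper name decode_body and its declared_encoding parameter are hypothetical, and it assumes the caller can pass along whatever charset the server declared, if any:

```python
def decode_body(data, declared_encoding=None):
    """Return text for a response body without raising on bad bytes.

    Hypothetical helper for illustration only, not part of dirhunt.
    """
    # Already text: nothing to decode.
    if isinstance(data, str):
        return data

    # Try the declared charset first (if any), then strict UTF-8.
    for encoding in (declared_encoding, "utf-8"):
        if not encoding:
            continue
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: fall through to the next candidate.
            continue

    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")


print(decode_body(b"caf\xe9 au lait", "latin-1"))
print(decode_body(b"caf\xe9 au lait"))
```

With errors="replace" the accented character is lost when no usable charset is known, but a keyword search over mostly ASCII terms can still run and the scan does not print a traceback.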
Traceback
No response
Dirhunt version
v1.0.0
Operating system (including distribution name and version)
Linux Ubuntu
Other details
No response
Checklist
- The error is in the project's code, and not in my own.
- I have searched for this issue before posting it and there isn't an open duplicate.
- I ran pip install -U dirhunt and triggered the bug in the latest version.