[BUG] Time out when querying snapshots for a specific time range #52

@Tiptop4792

Description

Please select the appropriate OS label

Ubuntu

Describe

I run into a read timeout when querying snapshots for a specific date range. This used to work for a long time. Downloading entire collections still works. Many thanks for looking into this.

Command to reproduce
Command: -u sueddeutsche.de -a -o /home/user/Documents/waybackup_snapshots/sueddeutsche-test --workers 6 --start 20260302 --end 20260303 --no-redirect

Terminal output

-------------------------
Version: 4.1.5
-------------------------
Command: -u sueddeutsche.de -a -o /home/user/Documents/waybackup_snapshots/sueddeutsche-test --workers 6 --start 20260302 --end 20260303 --no-redirect
-------------------------

2026-03-12 17:12:29
-------------------------
!-- Exception: 
Unknown error while querying cdx server
!-- File: waybackdownloaderEnvironment/lib/python3.10/site-packages/requests/adapters.py
!-- Function: send
!-- Line: 713
!-- Segment: raise ReadTimeout(e, request=request)
!-- Description: HTTPSConnectionPool(host='web.archive.org', port=443): Read timed out. (read timeout=60)
-------------------------
!-- Local Variables:
    -- self = <requests.adapters.HTTPAdapter object at 0x7e4c8e9b9d50>
    -- request = <PreparedRequest [GET]>
    -- stream = True
    -- timeout = Timeout(connect=60, read=60, total=None)
    -- verify = True
    -- cert = None
    -- proxies = OrderedDict()
    -- conn = HTTPSConnectionPool(host='web.archive.org', port=443)
    -- url = /cdx/search/cdx?output=json&url=sueddeutsche.de/*&from=20260302&to=20260303&fl=timestamp,digest,mimetype,statuscode,original
    -- chunked = False
-------------------------
Traceback (most recent call last):
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/connectionpool.py", line 534, in _make_request
    response = conn.getresponse()
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/connection.py", line 571, in getresponse
    httplib_response = super().getresponse()
  File "../../../../usr/lib/python3.10/http/client.py", line 1395, in getresponse
    response.begin()
  File "../../../../usr/lib/python3.10/http/client.py", line 323, in begin
    version, status, reason = self._read_status()
  File "../../../../usr/lib/python3.10/http/client.py", line 284, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "../../../../usr/lib/python3.10/socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "../../../../usr/lib/python3.10/ssl.py", line 1303, in recv_into
    return self.read(nbytes, buffer)
  File "../../../../usr/lib/python3.10/ssl.py", line 1159, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/connectionpool.py", line 841, in urlopen
    retries = retries.increment(
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/util/retry.py", line 490, in increment
    raise reraise(type(error), error, _stacktrace)
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/util/util.py", line 39, in reraise
    raise value
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    response = self._make_request(
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/connectionpool.py", line 536, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/urllib3/connectionpool.py", line 367, in _raise_timeout
    raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='web.archive.org', port=443): Read timed out. (read timeout=60)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/pywaybackup/files.py", line 130, in request_snapshots
    with requests.get(query.query_url, stream=True, timeout=60) as r:
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/requests/api.py", line 73, in get
    return request("get", url, params=params, **kwargs)
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/requests/api.py", line 59, in request
    return session.request(method=method, url=url, **kwargs)
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "waybackdownloaderEnvironment/lib/python3.10/site-packages/requests/adapters.py", line 713, in send
    raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='web.archive.org', port=443): Read timed out. (read timeout=60)
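
To help narrow this down, the CDX query from the log above can be reproduced outside the tool with a longer read timeout and a few retries. The following is only a sketch, not pywaybackup's code: `build_cdx_url`, `fetch_with_retries` and `urllib_fetch` are hypothetical helper names, and the 180-second timeout, three attempts and backoff values are arbitrary assumptions. Only the endpoint and query parameters are taken from the `url` value in the log.

```python
import time
import urllib.request
from urllib.parse import urlencode

# Endpoint and field list are copied from the `url` value in the log above.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain, start, end):
    """Rebuild the CDX query shown in the log for a domain and date range."""
    params = {
        "output": "json",
        "url": f"{domain}/*",
        "from": start,
        "to": end,
        "fl": "timestamp,digest,mimetype,statuscode,original",
    }
    # Keep '/', '*' and ',' unescaped so the URL matches the one in the log.
    return f"{CDX_ENDPOINT}?{urlencode(params, safe='/*,')}"


def fetch_with_retries(url, fetch, attempts=3, timeout=180,
                       backoff=5.0, sleep=time.sleep):
    """Call fetch(url, timeout), retrying on network-level errors.

    OSError covers TimeoutError and urllib.error.URLError. The default
    timeout and attempt count are assumptions, not values from the tool.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url, timeout)
        except OSError as err:
            last_error = err
            if attempt < attempts:
                # Linear backoff: wait longer after each failed attempt.
                sleep(backoff * attempt)
    raise last_error


def urllib_fetch(url, timeout):
    """Fetch the raw response body using only the standard library."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()
```

Calling `fetch_with_retries(build_cdx_url("sueddeutsche.de", "20260302", "20260303"), urllib_fetch)` would show whether the CDX server answers at all when given more than 60 seconds. If it does, the 60-second read timeout seen in the traceback is likely too short for this query. If it does not, the problem is more likely on the server side.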


