Skip to content

[BUG?] Ungraceful handling of blocking IO #181

@ghost

Description

Summary

So I've been playing with skew for a week or so now and whilst it principally works for what I want it to do at a functional level, I've found that something is causing it to handle blocking IO very poorly (not at all in fact)

The tests I've been running are to scan my own AWS account for S3 buckets. Skew works perfectly for finding the 3 buckets I expected it to, but then never returns.

At first I thought I was being impatient, so I decided to leave it run overnight and through the morning. The function still never returned. I understand that scanning all of AWS is a potentially non-trivial task, but more than 16 hours? Something must be up.

Investigation

The first thing I noticed trying to work out what was going on here was (when it worked) the dump from the keyboard interrupt when running in the console:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ares/.local/lib/python3.8/site-packages/skew/arn/__init__.py", line 323, in __iter__
    for scheme in self.scheme.enumerate(context, **self.kwargs):
  File "/home/ares/.local/lib/python3.8/site-packages/skew/arn/__init__.py", line 247, in enumerate
    for provider in self._arn.provider.enumerate(
  File "/home/ares/.local/lib/python3.8/site-packages/skew/arn/__init__.py", line 232, in enumerate
    for service in self._arn.service.enumerate(
  File "/home/ares/.local/lib/python3.8/site-packages/skew/arn/__init__.py", line 217, in enumerate
    for region in self._arn.region.enumerate(
  File "/home/ares/.local/lib/python3.8/site-packages/skew/arn/__init__.py", line 198, in enumerate
    for account in self._arn.account.enumerate(
  File "/home/ares/.local/lib/python3.8/site-packages/skew/arn/__init__.py", line 145, in enumerate
    for resource in self._arn.resource.enumerate(
  File "/home/ares/.local/lib/python3.8/site-packages/skew/arn/__init__.py", line 127, in enumerate
    resources.extend(resource_cls.enumerate(
  File "/home/ares/.local/lib/python3.8/site-packages/skew/resources/aws/s3.py", line 27, in enumerate
    resources = super(Bucket, cls).enumerate(arn, region, account,
  File "/home/ares/.local/lib/python3.8/site-packages/skew/resources/resource.py", line 54, in enumerate
    data = client.call(enum_op, query=path, **kwargs)
  File "/home/ares/.local/lib/python3.8/site-packages/skew/awsclient.py", line 127, in call
    data = op(**kwargs)
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/client.py", line 386, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/client.py", line 691, in _make_api_call
    http, parsed_response = self._make_request(
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/client.py", line 711, in _make_request
    return self._endpoint.make_request(operation_model, request_dict)
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/endpoint.py", line 102, in make_request
    return self._send_request(request_dict, operation_model)
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/endpoint.py", line 134, in _send_request    success_response, exception = self._get_response(
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/endpoint.py", line 166, in _get_response    success_response, exception = self._do_get_response(
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/endpoint.py", line 200, in _do_get_response
    http_response = self._send(request)
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/endpoint.py", line 269, in _send
    return self.http_session.send(request)
  File "/home/ares/.local/lib/python3.8/site-packages/botocore/httpsession.py", line 344, in send
    urllib_response = conn.urlopen(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 665, in urlopen
    httplib_response = self._make_request(
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 421, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 416, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.8/http/client.py", line 1344, in getresponse
    response.begin()
  File "/usr/lib/python3.8/http/client.py", line 307, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.8/http/client.py", line 268, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib/python3.8/socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib/python3.8/ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib/python3.8/ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
KeyboardInterrupt
>>>

Diving into the codebase skew wraps around boto3 which in turn uses (to no massive surprise) urllib3 to manage its requests to AWS itself.

So cool, it's just urllib3 under the hood and it's timing out. Well urllib3 doesn't have a default timeout for requests, but you can certainly set one. So using socket.setdefaulttimeout(60) set the timeout to 60 seconds (pretty fair imo) and suddenly skew (or more specifically boto3) doesn't work at all. Even the original buckets that I expect to see listed there are not returned. What makes this slightly more confusing is that it doesn't actually return a timeout exception; I'm not even convinced the requests are made.

With messing with the timeout not working I thought, what if I can just force the thread to exit by force? So using various forms of inspiration from this thread, I tried forcing skew to exit early.

Context Manager

This context manager didn't work at all, skew just didn't stop

@contextmanager
def timeout(duration):
    def timeout_handler(signum, frame):
        raise BlockingIOError(f'Function timed out after {duration} seconds')

    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(duration)
    yield
    signal.alarm(0)

Thread Decorator

This thread decorator saw better results in that the quit_function call is actually made (the call to log.info is made) but again, skew doesn't exit gracefully and just hangs.

def quit_function(fn_name):
    log.info(f'{fn_name} took too long')
    thread.interrupt_main()  # raises KeyboardInterrupt


def exit_after(s):
    """
    use as decorator to exit process if
    function takes longer than s seconds
    """

    def outer(fn):
        def inner(*args, **kwargs):
            timer = threading.Timer(s, quit_function, args=[fn.__name__])
            timer.start()
            try:
                result = fn(*args, **kwargs)
            finally:
                timer.cancel()
            return result

        return inner

    return outer

I'm honestly running out of ideas as to where to go looking for this one. My use case is actually to be able to run skew on a serverless function endpoint that I can just call, but these have well defined timeouts that I just can't seem to get skew to obey.

Any ideas?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions