rasterio requires CURL_CA_BUNDLE environmental variable to open s3 paths on some systems #458

@jmilloy

Description

Rasterio fails to open s3 paths on some systems because libcurl cannot find the SSL certificates. The resulting exception is misleading: it reports an unsupported file format rather than a certificate problem.

According to rasterio/rasterio@b621d92, libcurl on Linux expects the SSL certificates at the CentOS default location, /etc/pki/tls/certs/ca-bundle.crt, but other distributions store them elsewhere (on Ubuntu, /etc/ssl/certs/ca-certificates.crt).

Steps to Reproduce

On an Ubuntu system:

import podpac
node = podpac.data.Rasterio(source='s3://noaa-gfs-pds/SOIM/0-10 m DPTH/20210315/1200/003')
node.dataset

You can produce the error without podpac as well:

import rasterio
session = rasterio.session.AWSSession(region_name='us-east-1', aws_unsigned=True)
with rasterio.env.Env(session=session) as env:
    dataset = rasterio.open('s3://noaa-gfs-pds/SOIM/0-10 m DPTH/20210315/1200/003')

Observed Behavior

WARNING:rasterio._env:CPLE_AppDefined in HTTP response code on https://noaa-gfs-pds.s3.amazonaws.com/SOIM/0-10%20m%20DPTH/20210315/1200/003.xml: 0
Traceback (most recent call last):
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 67, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 213, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_OpenFailedError: '/vsis3/noaa-gfs-pds/SOIM/0-10 m DPTH/20210315/1200/003' not recognized as a supported file format.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jmilloy/Creare/Pipeline/podpac/podpac/core/utils.py", line 398, in wrapper
    value = fn(self)
  File "/home/jmilloy/Creare/Pipeline/podpac/podpac/core/data/rasterio_source.py", line 78, in dataset
    dataset = rasterio.open(self.source)  # This should pull AWS credentials automatically
  File "/home/jmilloy/Creare/Pipeline/_podpac-38_/lib/python3.8/site-packages/rasterio/env.py", line 433, in wrapper
    return f(*args, **kwds)
  File "/home/jmilloy/Creare/Pipeline/_podpac-38_/lib/python3.8/site-packages/rasterio/__init__.py", line 221, in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 218, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: '/vsis3/noaa-gfs-pds/SOIM/0-10 m DPTH/20210315/1200/003' not recognized as a supported file format.

Additional Notes

Currently, podpac (and rasterio) offer a partial workaround: use HTTP instead of HTTPS, which avoids the certificate lookup but sends the request unencrypted:

node = podpac.data.Rasterio(source='s3://noaa-gfs-pds/SOIM/0-10 m DPTH/20210315/1200/003', aws_https=False)

which translates to

session = rasterio.session.AWSSession(region_name='us-east-1', aws_unsigned=True)
with rasterio.env.Env(session=session, AWS_HTTPS=False) as env:
    dataset = rasterio.open('s3://noaa-gfs-pds/SOIM/0-10 m DPTH/20210315/1200/003')
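
An alternative that keeps HTTPS is to point libcurl at the certificate bundle through the CURL_CA_BUNDLE environment variable named in the title. The following is a minimal sketch, not what podpac or rasterio currently do: the helper names `find_ca_bundle` and `ensure_curl_ca_bundle` are hypothetical, and the candidate list is an assumption covering a few common distributions (only the CentOS path comes from the commit referenced above).

```python
import os

# Assumed locations of the system CA bundle on common distributions.
# Only the CentOS path is taken from the issue; the rest are illustrative.
CANDIDATE_CA_BUNDLES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian / Ubuntu
    "/etc/pki/tls/certs/ca-bundle.crt",    # CentOS / RHEL / Fedora
    "/etc/ssl/ca-bundle.pem",              # openSUSE
    "/etc/ssl/cert.pem",                   # Alpine
)


def find_ca_bundle(candidates=CANDIDATE_CA_BUNDLES):
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def ensure_curl_ca_bundle(candidates=CANDIDATE_CA_BUNDLES, environ=os.environ):
    """Set CURL_CA_BUNDLE if it is unset and a bundle can be found.

    Returns the path in effect afterwards, or None if no bundle was found.
    A value already set by the user is never overwritten.
    """
    existing = environ.get("CURL_CA_BUNDLE")
    if existing:
        return existing
    path = find_ca_bundle(candidates)
    if path is not None:
        environ["CURL_CA_BUNDLE"] = path
    return path


# Intended use: call once at startup, before any s3:// path is opened.
# ensure_curl_ca_bundle()
```

The variable has to be set before the first HTTPS request is made. Since CURL_CA_BUNDLE is also a GDAL configuration option, the returned path could presumably be passed as `rasterio.env.Env(session=session, CURL_CA_BUNDLE=path)` instead of `AWS_HTTPS=False`; that variant has not been verified against this issue.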


Labels: bug, enhancement
