This document contains installation and usage instructions for the
bridge SciDB plugin.
Install extra-scidb-libs following the instructions here.
- Install the required packages:
  - Ubuntu:

    ```
    apt-get install cmake libcurl4-openssl-dev
    ```

  - RHEL/CentOS:

    ```
    yum install libcurl-devel
    yum install https://downloads.paradigm4.com/devtoolset-3/centos/7/sclo/x86_64/rh/devtoolset-3/scidb-devtoolset-3.noarch.rpm
    yum install cmake3 devtoolset-3-runtime devtoolset-3-toolchain
    scl enable devtoolset-3 bash
    ```
- Download and unzip the AWS C++ SDK:

  ```
  wget --no-verbose --output-document - https://github.com/aws/aws-sdk-cpp/archive/1.8.3.tar.gz \
      | tar --extract --gzip --directory=.
  ```

- Configure the SDK:

  ```
  > cd aws-sdk-cpp-1.8.3
  aws-sdk-cpp-1.8.3> mkdir build
  aws-sdk-cpp-1.8.3> cd build
  ```

  - Ubuntu:

    ```
    aws-sdk-cpp-1.8.3/build> cmake ..          \
        -DBUILD_ONLY=s3                        \
        -DCMAKE_BUILD_TYPE=RelWithDebInfo      \
        -DBUILD_SHARED_LIBS=ON                 \
        -DCMAKE_INSTALL_PREFIX=/opt/aws
    ```

  - RHEL/CentOS:

    ```
    aws-sdk-cpp-1.8.3/build> cmake3 ..         \
        -DBUILD_ONLY=s3                        \
        -DCMAKE_BUILD_TYPE=RelWithDebInfo      \
        -DBUILD_SHARED_LIBS=ON                 \
        -DCMAKE_INSTALL_PREFIX=/opt/aws        \
        -DCMAKE_C_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/gcc \
        -DCMAKE_CXX_COMPILER=/opt/rh/devtoolset-3/root/usr/bin/g++
    ```

- Compile and install the SDK:

  ```
  aws-sdk-cpp-1.8.3/build> make
  aws-sdk-cpp-1.8.3/build> make install
  ```

  The SDK will be installed in `/opt/aws`.
- Apache Arrow library version `0.16.0` is required. The easiest way
  to install it is by running:

  ```
  wget -O- https://paradigm4.github.io/extra-scidb-libs/install.sh \
      | sudo sh -s -- --only-prereq
  ```

- Install the Apache Arrow development library:
  - Ubuntu:

    ```
    apt-get install libarrow-dev=0.16.0-1
    ```

  - RHEL/CentOS:

    ```
    yum install arrow-devel-0.16.0
    ```
- Compile cURL with OpenSSL (instead of NSS):

  ```
  > curl https://curl.haxx.se/download/curl-7.72.0.tar.gz | tar xz
  > cd curl-7.72.0
  curl-7.72.0> ./configure --prefix=/opt/curl
  curl-7.72.0> make
  curl-7.72.0> make install
  ```

  More details: aws/aws-sdk-cpp#1491
- Checkout and compile the plug-in:

  ```
  > git clone https://github.com/Paradigm4/bridge.git
  > cd bridge
  bridge> make
  ```

- Install in SciDB:

  ```
  bridge> cp libbridge.so /opt/scidb/19.11/lib/scidb/plugins
  ```

- Restart SciDB and load the plug-in:

  ```
  scidbctl.py stop mydb
  scidbctl.py start mydb
  iquery --afl --query "load_library('bridge')"
  ```
- Install the Python package:

  ```
  pip install scidb-bridge
  ```
- AWS uses two separate files to configure the S3 client. The
  `credentials` file is required and stores the AWS credentials for
  accessing S3, e.g.:

  ```
  > cat credentials
  [default]
  aws_access_key_id = ...
  aws_secret_access_key = ...
  ```

  The `config` file is optional and stores the region of the S3
  bucket. By default the `us-east-1` region is used, e.g.:

  ```
  > cat config
  [default]
  region = us-east-1
  ```

- In SciDB installations these two files are located in the
  `/home/scidb/.aws` directory, e.g.:

  ```
  > ls /home/scidb/.aws
  config  credentials
  ```
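Since both files use INI syntax, they can be inspected from Python with the standard-library `configparser` module. A minimal sketch, using a temporary directory as a stand-in for `/home/scidb/.aws` and placeholder values instead of real credentials:

```python
# Sketch: read AWS-style credentials/config files with configparser.
# The directory and values below are placeholders, not real credentials.
import configparser
import os
import tempfile

aws_dir = tempfile.mkdtemp()  # stand-in for /home/scidb/.aws

with open(os.path.join(aws_dir, "credentials"), "w") as f:
    f.write("[default]\n"
            "aws_access_key_id = AKIA...\n"
            "aws_secret_access_key = ...\n")

with open(os.path.join(aws_dir, "config"), "w") as f:
    f.write("[default]\n"
            "region = us-east-1\n")

creds = configparser.ConfigParser()
creds.read(os.path.join(aws_dir, "credentials"))
conf = configparser.ConfigParser()
conf.read(os.path.join(aws_dir, "config"))

print(creds["default"]["aws_access_key_id"])        # AKIA...
print(conf["default"].get("region", "us-east-1"))   # us-east-1
```

This can be handy for verifying that the SciDB instances see the profile you expect before debugging S3 errors.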
Note: The credentials used need read/write permissions for the S3 bucket used.
- Save a SciDB array in S3:

  ```
  > iquery --afl
  AFL% xsave(
         filter(
           apply(
             build(<v:int64>[i=0:9:0:5; j=10:19:0:5], j + i),
             w, double(v*v)),
           i >= 5 and w % 2 = 0),
         's3://p4tests/bridge/foo');
  {chunk_no,dest_instance_id,source_instance_id} val
  ```

  The SciDB array is saved in the `p4tests` bucket in the `foo`
  object.

- Load SciDB array from S3 in Python:
  ```
  > python
  >>> import scidbbridge
  >>> ar = scidbbridge.Array('s3://p4tests/bridge/foo')
  >>> ar.metadata
  {'attribute': 'ALL',
   'compression': None,
   'format': 'arrow',
   'index_split': '100000',
   'namespace': 'public',
   'schema': '<v:int64,w:double> [i=0:9:0:5; j=10:19:0:5]',
   'version': '1'}
  >>> print(ar.schema)
  <v:int64,w:double> [i=0:9:0:5; j=10:19:0:5]
  >>> ar.read_index()
     i   j
  0  5  10
  1  5  15
  >>> ch = ar.get_chunk(5, 15)
  >>> ch.to_pandas()
      v      w  i   j
  0  20  400.0  5  15
  1  22  484.0  5  17
  2  24  576.0  5  19
  ...
  ```
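The chunk contents returned above can be reproduced in plain Python, which makes it easier to see what the `xsave` example actually stored. The sketch below re-evaluates the AFL expressions for chunk `(5, 15)`: `build` assigns `v = i + j`, `apply` adds `w = v*v`, and `filter` keeps cells with `i >= 5` and even `w`. With a chunk length of 5 in each dimension, chunk `(5, 15)` covers `i` in `[5, 9]` and `j` in `[15, 19]`.

```python
# Recompute the cells of chunk (5, 15) from the AFL expressions in the
# xsave example: v = i + j (build), w = v*v (apply), and the filter
# i >= 5 and w % 2 = 0. Chunk (5, 15) spans i in [5, 9], j in [15, 19].
rows = []
for i in range(5, 10):
    for j in range(15, 20):
        v = i + j          # build expression
        w = float(v * v)   # apply expression
        if i >= 5 and w % 2 == 0:
            rows.append((v, w, i, j))

for v, w, i, j in rows[:3]:
    print(v, w, i, j)
# First rows match ch.to_pandas() above:
# 20 400.0 5 15
# 22 484.0 5 17
# 24 576.0 5 19
```

Note that the chunk coordinates passed to `get_chunk` are the smallest cell coordinates the chunk can hold, as given by the dimension starts and chunk lengths in the schema.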
Note: If using the file system for storage, make sure the storage is
shared across instances and that the path used by non-admin SciDB
users is listed in `io-paths-list` in the SciDB `config.ini`.
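As a sketch of what that looks like, here is a hypothetical `config.ini` fragment; the database name `mydb` and the path `/data/bridge` are placeholders, and only the `io-paths-list` line is the point of the example:

```
[mydb]
io-paths-list=/data/bridge
```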
It is common for S3 to return Access Denied in non-obvious cases,
for example, if the specified bucket does not exist. For this type of
error, xsave includes an extended error message with a link to a
troubleshooting guide, e.g.:
```
> iquery -aq "xsave(build(<v:int64>[i=0:9], i), bucket_name:'foo', object_path:'bar')"
UserException in file: PhysicalXSave.cpp function: uploadS3 line: 372 instance: s0-i1 (1)
Error id: scidb::SCIDB_SE_ARRAY_WRITER::SCIDB_LE_UNKNOWN_ERROR
Error description: Error while saving array. Unknown error: Upload to
s3://foo/bar failed. Access Denied. See
https://aws.amazon.com/premiumsupport/knowledge-center/s3-troubleshoot-403/.
```