ArrayMorph is a tool for efficiently managing array data stored on cloud object storage. It supports both the HDF5 C++ API and the h5py API; data returned through the h5py API are NumPy arrays, so users can read array data from the cloud and feed it directly into machine learning pipelines.
Tag: CI4AI
It is recommended to use Conda (and conda-forge) for managing dependencies.
- Install Miniconda
- Install conda-build for installing local conda packages
- Create and activate an environment with the dependencies:

  ```
  conda create -n arraymorph conda-forge::gxx=9
  conda activate arraymorph
  conda install -n arraymorph cmake conda-forge::hdf5=1.14.2 conda-forge::aws-sdk-cpp conda-forge::azure-storage-blobs-cpp conda-forge::h5py
  ```
Install the ArrayMorph conda package from the repository's local channel:

```
git clone https://github.com/ICICLE-ai/arraymorph.git
cd arraymorph/arraymorph_channel
conda index .
conda install -n arraymorph arraymorph -c file://$(pwd) -c conda-forge
```

Build the ArrayMorph VOL plugin from source:

```
git clone https://github.com/ICICLE-ai/arraymorph.git
cd arraymorph/arraymorph
cmake -B ./build -S . -DCMAKE_PREFIX_PATH=$CONDA_PREFIX
cd build
make
```

Point HDF5 at the built plugin and select the ArrayMorph VOL connector:

```
export HDF5_PLUGIN_PATH=/path/to/arraymorph/arraymorph/build/src
export HDF5_VOL_CONNECTOR=arraymorph
```

For AWS S3, set:

```
export STORAGE_PLATFORM=S3
export BUCKET_NAME=XXXXXX
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export AWS_REGION=us-east-2  # or your bucket's region
```

For Azure Blob Storage, set:

```
export STORAGE_PLATFORM=Azure
export BUCKET_NAME=XXXXXX
export AZURE_STORAGE_CONNECTION_STRING=XXXXXX
```

To run the example below you need:

- AWS or Azure cloud account with credentials
- S3 bucket or Azure container
- ArrayMorph dependencies installed
- Activate the conda environment:

  ```
  conda activate arraymorph
  ```

- Write sample HDF5 data to the cloud:

  ```
  cd examples/python
  python3 write.py
  ```

- Read the data back from the cloud HDF5 file (a sketch of the pattern both scripts follow appears after this list):

  ```
  cd examples/python
  python3 read.py
  ```
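For orientation, here is a minimal sketch of the write-then-read pattern the example scripts follow; the file name, dataset name, and array shape are illustrative placeholders, not the actual contents of write.py and read.py:

```python
import h5py
import numpy as np

# With HDF5_VOL_CONNECTOR=arraymorph and the storage variables set as
# above, h5py.File() is routed to the configured bucket/container rather
# than the local filesystem. File and dataset names are placeholders.

# Write: create a file and store a 2-D array as an HDF5 dataset.
with h5py.File("sample.h5", "w") as f:
    f.create_dataset("data", data=np.arange(100, dtype=np.float64).reshape(10, 10))

# Read: open the file and slice the dataset; the result is a NumPy array.
with h5py.File("sample.h5", "r") as f:
    block = f["data"][2:5, :]        # only the requested hyperslab is read
    print(block.shape, block.dtype)  # (3, 10) float64
```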
ArrayMorph plugs into the HDF5 stack as a Virtual Object Layer (VOL) connector that intercepts file operations and routes them to cloud object storage instead of local files. Existing HDF5 applications (C++, or Python via h5py) therefore operate on cloud-resident data without code changes, enabling transparent cloud access for scientific and ML pipelines.
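To illustrate that transparency, a minimal sketch: the storage target is selected entirely by environment variables, which the HDF5 library reads at initialization, so in Python they must be set before h5py is imported. All values and names below are placeholders:

```python
# Configure ArrayMorph before the HDF5 library is loaded; these values
# are placeholders and would normally be exported in the shell instead.
import os

os.environ["HDF5_PLUGIN_PATH"] = "/path/to/arraymorph/arraymorph/build/src"
os.environ["HDF5_VOL_CONNECTOR"] = "arraymorph"
os.environ["STORAGE_PLATFORM"] = "S3"
os.environ["BUCKET_NAME"] = "my-bucket"  # placeholder bucket name

import h5py  # imported only after the environment is configured

# From here on this is ordinary h5py code with nothing ArrayMorph-specific.
with h5py.File("experiment.h5", "r") as f:  # placeholder file name
    data = f["dataset"][:]                  # placeholder dataset name
```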
ArrayMorph supports:
- Cloud backends: AWS S3 and Azure Blob
- File formats: raw binary data streams at present (support for additional formats such as JPEG is planned)
- Languages: C++ and Python (via h5py compatibility)
The system is designed to be efficient in latency-sensitive scenarios and aims to integrate well with large-scale distributed training and inference.
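For example, a cloud-resident dataset can be streamed into a training loop batch by batch. In this hedged sketch the file name, dataset name, and batch size are placeholders, and the batching loop is illustrative rather than an ArrayMorph API:

```python
import h5py

BATCH = 32  # illustrative batch size

# Stream minibatches from a cloud-backed HDF5 dataset; each slice comes
# back as a NumPy array that any NumPy-consuming framework can accept.
with h5py.File("training_data.h5", "r") as f:
    features = f["features"]  # placeholder dataset name
    for start in range(0, features.shape[0], BATCH):
        batch = features[start:start + BATCH]  # one minibatch as a NumPy array
        # model.train_step(batch)              # hypothetical training hook
```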
This project is supported by:
National Science Foundation (NSF) funded AI Institute for Intelligent Cyberinfrastructure with Computational Learning in the Environment (ICICLE), award OAC-2112606.