Reading from FORGE AWS data lake? #563
Folks, I'm a newcomer to dascore (Hi!) but a mathematical geophysicist and an experienced Python developer. I'm trying to retrieve the Neubrex DAS data underlying this image (found in a PDF in the forge-das-processing GitHub repository).
I think those data reside in an AWS "data lake" for the FORGE project listed on OpenEI. Is there a supported URI for hooking dascore up to those data, so that I can use dascore's services to slice and dice through the 90+ TB (!) in the data lake? If not, could some kind soul please outline a workflow to accomplish that, so that I can store the relevant (Neubrex?) HDF5 files locally? Any help you can offer this dascore newbie would be greatly appreciated! (I hope to have an abstract for the Stanford Geothermal Workshop ready in a few days, based on those data. I think the DAS community will be pleasantly surprised by the results of this work!) TIA, and thanks for this very impressive software project! Frank Horowitz

Hi @fghorow,
We use dascore to manage fairly large datasets; I think the largest was around 60 TB, and it looks like the spool indexer could handle much more. Those, however, were on a local NAS. We don't (yet) have a great way to work with data in an S3 bucket directly. However, if you can mount the S3 bucket so it behaves like a file system in a cloud environment (e.g., with Amazon EFS or FSx), then you should be able to use DASCore's spool method as shown in the docs. You can then index the directory (spool.update) and select/iterate through the contents as you need; see the sketch below.
If you only need a few files, you can instead browse the S3 bucket and download what you need locally, which requires far less setup. Best of luck!
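A minimal sketch of that spool-based workflow, assuming the bucket has already been mounted somewhere that looks like a local directory. The mount point and time range below are hypothetical placeholders, not values from the FORGE dataset:

```python
import dascore as dc

# Hypothetical mount point where the S3 bucket is exposed as a file system
# (e.g., via Amazon EFS/FSx or another S3-mounting tool).
spool = dc.spool("/mnt/forge")

# Build (or refresh) the index of files under the mount point.
# For tens of TB this first pass can take a while, but it only runs once.
spool = spool.update()

# Select a subset by time (placeholder values) and iterate over the
# resulting patches.
sub = spool.select(time=("2022-04-21T00:00:00", "2022-04-21T01:00:00"))
for patch in sub:
    print(patch)
```

Once the index exists, the same select/iterate pattern works whether the files sit on a NAS or a mounted bucket.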
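For the "just download a few files" route, here is a hedged sketch using boto3. The bucket name, prefix, and file key are placeholders (the real ones are listed on the OpenEI FORGE data-lake page), and the unsigned configuration assumes the bucket allows anonymous reads:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Placeholder bucket/prefix -- substitute the values from the OpenEI
# FORGE data-lake listing. UNSIGNED assumes the bucket permits anonymous reads.
BUCKET = "example-forge-bucket"
PREFIX = "das/neubrex/"

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a page of objects under the prefix to see what's there.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=20)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download a single HDF5 file to the working directory (key is a placeholder).
s3.download_file(BUCKET, PREFIX + "example_file.h5", "example_file.h5")
```

A local dc.spool pointed at the download directory then gives the same slicing interface over just those files.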