Reading from FORGE AWS data lake? #563
Folks, I'm a newcomer to dascore (Hi!) but a mathematical geophysicist and an experienced Python developer. I'm trying to retrieve the Neubrex DAS data underlying this image (found in a PDF in the forge-das-processing GitHub repository).
I think those data reside in an AWS "data lake" for the FORGE project listed on OpenEI. Is there a supported URI for hooking dascore up to those data, so that I can use dascore's services to slice and dice through the 90+ TB (!) in the data lake? If not, could some kind soul please outline a workflow to accomplish that, so that I can store the relevant (Neubrex?) HDF5 files locally? Any help you can offer this dascore newbie would be greatly appreciated! (I hope to have an abstract for the Stanford Geothermal Workshop ready in a few days, based on those data. I think the DAS community will be pleasantly surprised by the results of this work!) TIA, and thanks for this very impressive software project! Frank Horowitz

Hi @fghorow,
We use dascore to manage fairly large datasets; I think the largest was around 60 TB, and it looks like the spool indexer could handle much more. Those, however, were on a local NAS. We don't (yet) have a great way to work with data in an S3 bucket directly. However, if you can mount the S3 bucket so it behaves like a file system in a cloud environment (e.g., with Amazon EFS or FSx), then you should be able to use DASCore's spool method as shown in the docs. You can then index the directory (spool.update) and select/iterate through the contents as you need; see the sketch below.
If you only need a few files, you can instead browse the S3 bucket and download what you need locally, which requires far less setup. Best of luck!
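A minimal sketch of that spool-based workflow, assuming the bucket has already been mounted somewhere that looks like a local directory. The mount point and time range below are hypothetical placeholders, not values from the FORGE dataset:

```python
import dascore as dc

# Hypothetical mount point where the S3 bucket is exposed as a file system
# (e.g., via Amazon EFS/FSx or another S3-mounting tool).
spool = dc.spool("/mnt/forge")

# Build (or refresh) the index of files under the mount point.
# For tens of TB this first pass can take a while, but it only runs once.
spool = spool.update()

# Select a subset by time (placeholder values) and iterate over the
# resulting patches.
sub = spool.select(time=("2022-04-21T00:00:00", "2022-04-21T01:00:00"))
for patch in sub:
    print(patch)
```

Once the index exists, the same select/iterate pattern works whether the files sit on a NAS or a mounted bucket.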
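For the "just download a few files" route, here is a hedged sketch using boto3. The bucket name, prefix, and file key are placeholders (the real ones are listed on the OpenEI FORGE data-lake page), and the unsigned configuration assumes the bucket allows anonymous reads:

```python
import boto3
from botocore import UNSIGNED
from botocore.config import Config

# Placeholder bucket/prefix -- substitute the values from the OpenEI
# FORGE data-lake listing. UNSIGNED assumes the bucket permits anonymous reads.
BUCKET = "example-forge-bucket"
PREFIX = "das/neubrex/"

s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List a page of objects under the prefix to see what's there.
resp = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX, MaxKeys=20)
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])

# Download a single HDF5 file to the working directory (key is a placeholder).
s3.download_file(BUCKET, PREFIX + "example_file.h5", "example_file.h5")
```

A local dc.spool pointed at the download directory then gives the same slicing interface over just those files.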