open_mfdataarray for a large number of Febus files
#37
First of all, what a great library - thank you for it! I'm trying to index a large number of Febus files (12,550 files, each 1 GB, containing 1 minute of DAS data over 1,081 channels sampled at 4 kHz) using open_mfdataarray like this: The metadata fetching is fast (~157 it/s), which I noticed is thanks to parallelization. However, linking the data array still takes quite some time, even though it's a VirtualStack. The time needed starts at about 1.5 s/iteration at 0% but increases steadily, reaching around 24 s/iteration at just 10% progress. Am I doing something wrong? Is there a way to speed up this second part? Thanks a lot for any tips or help!
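The call itself was not captured above. A minimal sketch of what it might look like, assuming xdas's open_mfdataarray with a Febus engine (the path pattern and engine name are assumptions, not taken from the original post):

```python
import glob

# Hypothetical location of the 12,550 one-minute Febus HDF5 files.
paths = sorted(glob.glob("/data/das/*.h5"))

# Hedged sketch of the xdas call; left commented out because the
# library and the data files may not be available where this runs:
# import xdas
# da = xdas.open_mfdataarray(paths, engine="febus")
```

Sorting the glob result keeps the files in chronological order when the filenames encode timestamps, which is usually what one wants before stacking them.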
Replies: 6 comments
Thank you for reporting this issue! I might have an idea where things are getting slow. Have you tried opening one or a few files and investigating the timing information? You can either use … Nevertheless, the aggregation should be faster; I need to change a few lines of code. Let me know if this is the case for you.
Might be related to #24.
Hi @Linvill. I changed a little something in the fix/faster-interpcoord-append branch. If you want, you can try to install that branch with

pip install git+https://github.com/xdas-dev/xdas.git@fix/faster-interpcoord-append

Tell me if it makes things faster.
Hi @atrabattoni, Your changes worked - I can now index all 12,550 files. Thank you very much for the quick fix! In my data, I'm seeing gaps of 0.250112 ms instead of the expected 0.250 ms (at 4 kHz sampling rate), occurring roughly every second. I've worked around this by setting the tolerance to 0.26 ms. However, I'm also seeing 1.000250112 s gaps every other minute. I'll now have to check how we're processing the data in the first place… But yes, your tool has been extremely helpful in giving us an overview, and thanks to the indexing, we can now start working systematically on the dataset. Right now, I wonder what happens if we fetch data during a gap; I guess I will find out soon! :) Thanks again!
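Irregular intervals like the 0.250112 ms steps described above can be located by inspecting the timestamp differences directly, independently of xdas. A self-contained sketch with synthetic values (the injected 112 ns offset mirrors the reported gap, but the timestamps here are made up):

```python
import numpy as np

fs = 4_000.0           # 4 kHz sampling rate
dt_ns = int(1e9 / fs)  # nominal spacing: 250_000 ns (0.250 ms)

# Synthetic int64 nanosecond timestamps with one oversized step,
# mimicking the reported 0.250112 ms interval (values are made up).
t = np.arange(10, dtype="int64") * dt_ns
t[5:] += 112           # one interval becomes 250_112 ns

steps = np.diff(t)
irregular = np.flatnonzero(steps != dt_ns)
# steps[irregular] shows the offending interval: 250_112 ns, i.e.
# 0.250112 ms, which a 0.26 ms tolerance would still accept.
```

Running a check like this over the real timestamps would show whether the anomalies really occur once per second, as observed.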
Happy to know that it worked! I will add this feature into version 0.2.3.
@Linvill: with @ClaudioStrumia we also realized that Febus only provides timing information that is µs-accurate. When casting those to ns timestamps, it creates those yyyy-mm-ddThh:mm:ss.000000017 kinds of timestamps. We fixed this and incorporated those changes in the … Let me know if it further eases working with Febus files.
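The µs-to-ns artifact can be reproduced with plain floats: a float64 second count only carries roughly µs precision at current epochs, so scaling it to nanoseconds leaves a few spurious nanoseconds unless the result is rounded back to the nearest microsecond. A self-contained illustration (the epoch value is arbitrary, not taken from the data):

```python
# A µs-accurate time expressed as float64 seconds (arbitrary value).
t = 1_700_000_000.000001  # intended: an exact 1 µs fraction

# A naive cast to integer nanoseconds picks up float rounding residue,
# yielding fractions like .000001024 instead of the intended .000001000.
ns_naive = int(t * 1e9)

# Rounding to the nearest microsecond removes the spurious nanoseconds.
ns_fixed = (ns_naive + 500) // 1000 * 1000
```

This is the same class of residue as the .000000017 timestamps mentioned above: the information below the microsecond is not real, so snapping to the µs grid before storing ns timestamps restores clean values.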