Summary of Changes (Gemini Code Assist): This pull request introduces a solution for compressing large velocity models, drastically reducing their storage footprint without compromising the resolution needed for exploratory analysis, by leveraging quantization.
Code Review
The pull request introduces a new utility to compress velocity models for archival storage, leveraging xarray and h5py. This is a valuable addition for managing large datasets efficiently. The implementation correctly handles quantization and uses appropriate compression settings. I've identified a few areas for improvement regarding error handling, attribute management, and potential performance optimizations in the get_extrema function.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
claudio525 left a comment:
Just some nitpicking, looks good!
Co-authored-by: Claudio <45545396+claudio525@users.noreply.github.com>
…g into vm_compression
This script applies the lessons from the successful xyts compression work to the velocity model outputs, allowing us to save all velocity models in a compressed format for archival storage at high Rho/Vp/Vs resolution, with compression ratios of more than 45x possible.
I picked two reasonably small velocity models (a 9 GB VM and a 16 GB VM) as well as Cesar's Darfield VM (98 GB) and compressed them with this code:
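The script itself isn't reproduced here, but the approach described (quantise each field to uint8 and write with gzip via h5py) can be sketched roughly as follows. The function name `write_compressed`, the dataset names, and the attribute keys are illustrative assumptions, not the PR's actual API:

```python
import h5py
import numpy as np


def write_compressed(path: str, fields: dict[str, np.ndarray]) -> None:
    """Quantise each float field to uint8 and store it gzip-compressed.

    Illustrative sketch only: the real script's names and layout may differ.
    """
    with h5py.File(path, "w") as f:
        for name, data in fields.items():
            # Scale chosen so the field's full range maps onto 0..255.
            offset = float(data.min())
            scale = (float(data.max()) - offset) / 255.0 or 1.0
            quantised = np.round((data - offset) / scale).astype(np.uint8)
            dset = f.create_dataset(
                name,
                data=quantised,
                compression="gzip",
                compression_opts=9,
                shuffle=True,
                chunks=True,
            )
            # CF-style attributes let readers reverse the quantisation.
            dset.attrs["scale_factor"] = scale
            dset.attrs["add_offset"] = offset
```

Reading back is then `data * scale_factor + add_offset`, with a worst-case error of half a quantisation step.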
Assuming this ratio scales to the larger models (which does appear to hold for the xyts compression code), we can expect to represent the full AlpineF2K VM from the Cybershake 100m in less than 3 GB at resolutions similar to the above. A resolution of 20-30 m/s is more than enough to produce spatial plots and debug VM issues when a result looks bad.
The Vp, Vs, and Rho resolutions are dynamically calculated using two variables:
The range of the quantity divided by 255 then sets the scale for the quantisation. By aggressively picking a small datatype for the final model, we further reduce redundant bits in the output. I deemed the extra $\sim 2^8$ factor of resolution not worth at least doubling the size of the compressed velocity model, given this is used for exploratory work, but I'm open to changing this.
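The resolution arithmetic can be checked with a quick worked example. The Vs range here is an assumption for illustration; actual ranges come from each model:

```python
import numpy as np

# Assumed Vs range in m/s, for illustration only.
vs_min, vs_max = 500.0, 7000.0

# One uint8 step covers range/255 of the field's value span.
scale = (vs_max - vs_min) / 255.0   # ~25.5 m/s per step
max_error = scale / 2.0             # worst-case rounding error, ~12.7 m/s

# Round trip: quantise to uint8 and reconstruct.
vs = np.linspace(vs_min, vs_max, 1000)
quantised = np.round((vs - vs_min) / scale).astype(np.uint8)
recovered = quantised * scale + vs_min
```

For this assumed range the step size lands in the 20-30 m/s band quoted above, with reconstruction error bounded by half a step.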
The compressed model is saved as an xarray dataset, which provides compatibility with the rest of the workflow and transparently handles decompression without researchers writing any additional code, i.e.,
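As a hedged illustration of that transparent decode: xarray's CF decoding applies `scale_factor`/`add_offset` automatically, so the stored uint8 bytes come back as floats with no extra user code. The variable name `vs` and the scale value below are illustrative, not taken from the PR:

```python
import numpy as np
import xarray as xr

# Quantised bytes plus CF attributes, as they might appear in the archive.
quantised = np.array([0, 128, 255], dtype=np.uint8)
encoded = xr.Dataset(
    {
        "vs": xr.Variable(
            ("z",),
            quantised,
            attrs={"scale_factor": 27.45, "add_offset": 0.0},
        )
    }
)

# decode_cf reverses the quantisation, returning floats transparently.
decoded = xr.decode_cf(encoded)
print(decoded["vs"].values)
```

Indexing operations such as `decoded["vs"].sel(...)` then operate on the decoded floats, which is what makes pulling out profiles so convenient.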
This works without having to decompress explicitly, making it easy for researchers to extract Vs profiles from the actual model they simulated with.