Conversation
Summary of ChangesHello @joelridden, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant enhancement to the seismic data processing pipeline by implementing a multi-event detection mechanism using STA/LTA algorithms. It also includes a substantial refactoring effort to optimize FDSN inventory fetching, reducing redundant API calls and improving overall performance. The new multi-event scores are integrated into the data merging and quality control processes, allowing for more robust filtering of seismic records. Additionally, the changes improve the system's resilience to FDSN API rate limits and expand configuration options. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a new multi-event detection feature, refactors inventory handling for performance, and updates configuration and data processing workflows. Key changes include adding a new multi_event.py module for STA/LTA trigger detection and event synchronization checks, and integrating these checks into the waveform extraction and quality filtering processes. The nzgmdb/config/config.yaml file now includes new parameters for STA/LTA windows and weights, and channel_codes are updated to include 'BH?'. Inventory fetching across several modules (distances.py, snr.py, process_observed.py, waveform_manipulation.py, geonet.py, sites.py, mseed_to_gmprocess.py, reading.py) has been optimized by passing pre-fetched Inventory objects or specifying level='station' where appropriate, reducing redundant FDSN calls. The waveform_extraction.py module now handles FDSNTooManyRequestsException with retries, passes event catalogs and extraction tables more efficiently, and collects multi-event scores. The merge_flatfiles.py module is updated to incorporate the new multi-event data into the final flatfile. In quality_db.py, the multi-event filtering logic has been updated to use the new stalta_score and sync_event fields, and the filter_duplicate_channels function now includes 'BH' channels. Review comments highlighted an incorrect syntax for catching multiple exceptions, an inverted logic for filtering multi-events based on stalta_score, a missing configuration key (multi_score_min), an unhandled case for selecting the 'Z' component in sync_event_from_stream, an incorrect type hint in a docstring, and a potential side effect from in-place DataFrame modification.
There was a problem hiding this comment.
Pull request overview
This pull request implements multi-event detection functionality to identify seismic waveforms that may contain multiple seismic events. The implementation uses STA/LTA (Short-Term Average/Long-Term Average) trigger detection combined with a synchronous event check to flag potentially problematic records.
Changes:
- Adds new
multi_event.pymodule with STA/LTA trigger detection and synchronous event checking - Renames
filter_multi_meantofilter_multi_eventwith updated filtering logic based on multi-event scores - Updates waveform extraction to compute and store multi-event scores for each record
- Modifies response removal to use
remove_response()instead ofremove_sensitivity()with improved inventory handling - Adds dynamic
ds_std_multipliercalculation based on hypocentral distance
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| nzgmdb/data_processing/multi_event.py | New module implementing STA/LTA trigger detection and synchronous event checking |
| nzgmdb/data_processing/quality_db.py | Renames filter function and updates logic to use multi-event scores |
| nzgmdb/data_retrieval/waveform_extraction.py | Integrates multi-event scoring during extraction, adds retry logic for catalog fetching |
| nzgmdb/data_processing/waveform_manipulation.py | Updates response removal to use remove_response() with better inventory handling |
| nzgmdb/phase_arrival/run_phasenet.py | Updates response removal consistent with waveform_manipulation changes |
| nzgmdb/data_retrieval/sites.py | Changes merge from left to outer join and improves duplicate handling |
| nzgmdb/data_processing/merge_flatfiles.py | Adds multi-event data to flatfile merging process |
| nzgmdb/management/file_structure.py | Adds MULTI_EVENT_TABLE to file structure enums |
| nzgmdb/config/config.yaml | Adds multi-event configuration parameters |
| tests/test_quality_filters.py | Updates test to use new filter_multi_event function |
| tests/quality_db_testing.csv | Adds stalta_score and sync_event columns to test data |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ds_std_multiplier = config.get_value("ds_std_multiplier") | ||
|
|
||
| # Compute the ds multiplier time | ||
| ds_std_multiplier = 0.8 / (1 + np.exp(-0.035 * (r_hyp - 140))) + 2.2 |
There was a problem hiding this comment.
Would be good to see a reference for this equation, ideally extracted as a function.
There was a problem hiding this comment.
I tried to get an answer to this from Aaron as he developed this himself by looking at data, have got nothing. I imagine it will come out in the paper he will write, but that will be a while away.
There was a problem hiding this comment.
Just add a placeholder like [Publication under way : Aaron Rampersad et al]?
There was a problem hiding this comment.
I'm happy with the comment that is there for now, provided we update it later.
# Conflicts: # nzgmdb/data_processing/waveform_manipulation.py # nzgmdb/data_retrieval/waveform_extraction.py # nzgmdb/phase_arrival/run_phasenet.py
This pull request introduces a new multi-event detection feature based on STA/LTA (Short-Term Average/Long-Term Average) triggers, integrates it into the data processing and quality control pipeline, and updates related configuration and data handling. The changes add a new module for multi-event scoring, update the merging and filtering logic to include these scores, and revise configuration and preprocessing details to support the new workflow.
Multi-Event Detection and Scoring:
multi_event.pymodule implementing multi-event detection using STA/LTA triggers, including functions for computing scores and synchronizing events from waveform data.config.yaml) to include parameters for STA/LTA-based multi-event detection, such as window lengths, thresholds, and component weights.Data Processing Pipeline Updates:
merge_flatfiles.pyto read, filter, and merge multi-event scores into the main ground motion flatfile, and to output a multi-event table as part of the merged results. [1] [2] [3] [4] [5]quality_db.pyto use the new multi-event score and sync event flag for filtering records, replacing the previous multi-mean approach. [1] [2] [3] [4] [5] [6]Waveform Preprocessing Improvements:
remove_responseinstead ofremove_sensitivity, ensuring correct handling of channel and response information. [1] [2] [3]Site Table and Metadata Handling:
Miscellaneous:
ds_std_multiplierin waveform extraction to use a dynamic formula based on hypocentral distance.ds_std_multiplierin config).These changes collectively enhance the detection and filtering of multi-event records, improving the reliability of the ground motion database.