@pymonger pymonger commented Oct 10, 2025

🚀 Pull Request: NISAR Project Adaptation and Enhanced Metrics Extraction

📋 Overview

This PR introduces a comprehensive NISAR-specific adaptation of the HySDS metrics extractor, providing enhanced hierarchical job breakdown capabilities and specialized execution time analysis. The changes organize the codebase for scalability while maintaining backward compatibility with the original functionality.

🎯 Key Features

1. NISAR-Specific Enhancements

  • Hierarchical Job Breakdown: Three-level breakdown of SCIFLO_RSLC jobs by:
    • Primary: NISAR beam modes (e.g., L_40_DH_05_DH, L_20_QP_05_QP)
    • Secondary: Coverage type (full, partial)
    • Tertiary: Acquisition mode (individual, mixed)
  • Execution Time Analysis: Specialized wall_time metrics analysis with PCM container runtime tracking
  • Regex Pattern Matching: Configurable job ID parsing using NISAR beam mode patterns

2. Project Organization

  • Dedicated NISAR Directory: Clean separation of NISAR-specific adaptations
  • Project Template: Reusable template for future project adaptations
  • Quick Start Guide: Easy-to-follow usage examples
  • Comprehensive Documentation: Technical details and usage instructions

3. Code Quality Improvements

  • Import Path Fixes: Proper module imports for scripts in subdirectories
  • Code Reusability: Moved imports to enable function reuse across scripts
  • Security: Removed all hardcoded IP addresses and credentials
  • Clean History: Rewritten git history without sensitive information
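
The import-path fix listed above can be sketched roughly as follows. This is a minimal illustration, assuming each script sits one directory below the directory that holds the shared extractor code; the helper name `add_parent_to_sys_path` is hypothetical and does not come from the PR.

```python
import os
import sys


def add_parent_to_sys_path(script_file):
    """Append the directory one level above the script's directory to sys.path.

    Hypothetical helper: the PR only states that sys.path.append is used so
    that scripts in nisar/ can import the core extractor module.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    parent_dir = os.path.dirname(script_dir)
    if parent_dir not in sys.path:
        sys.path.append(parent_dir)
    return parent_dir
```

A script would typically call `add_parent_to_sys_path(__file__)` before importing shared functions, so the import works regardless of the current working directory.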

📁 File Changes

| File | Change Type | Description |
| --- | --- | --- |
| .gitignore | NEW | Comprehensive ignore rules for Python, macOS, CSV files, and common artifacts |
| PROJECT_TEMPLATE.md | NEW | Template for creating new project adaptations |
| QUICK_START.md | NEW | Quick reference guide for all available scripts |
| nisar/ | NEW | Dedicated directory for NISAR-specific scripts and configurations |
| nisar/hysds_metrics_es_extractor_enhanced.py | NEW | Enhanced metrics extractor with hierarchical breakdown |
| nisar/job_execution_time_extractor.py | NEW | Specialized execution time analyzer |
| nisar/NISAR_MIXED_MODES_CONFIG_20200101T000000_01.json | NEW | NISAR beam mode patterns and configurations |
| nisar/README.md | NEW | NISAR project documentation |
| nisar/README_ENHANCED.md | NEW | Detailed technical documentation |
| metrics_extractor/hysds_metrics_es_extractor.py | MODIFIED | Moved imports to top for reusability |

🔧 Technical Details

Enhanced Metrics Extractor

  • Three-Level Hierarchical Breakdown: beam_name → coverage → acquisition_mode
  • Regex Pattern Matching: _(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_
  • Comprehensive Metrics: Average runtime, counts, instance types per breakdown level
  • CSV Export: Structured output with hierarchical columns
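
To make the regex-driven breakdown concrete, here is a minimal sketch that applies the pattern quoted above to job IDs and counts jobs per beam_name → coverage → acquisition_mode. The function names `parse_job_id` and `build_breakdown` are illustrative assumptions, not the PR's actual API, and the real extractor also aggregates runtimes and instance types.

```python
import re
from collections import defaultdict

# Pattern copied from the PR description (L-band only, trailing underscore).
JOB_ID_PATTERN = re.compile(
    r"_(?P<coverage>full|partial)"
    r"_(?P<acquisition_mode>individual|mixed)"
    r"_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_"
)


def parse_job_id(job_id):
    """Return the three named groups as a dict, or None if the ID does not match."""
    match = JOB_ID_PATTERN.search(job_id)
    return match.groupdict() if match else None


def build_breakdown(job_ids):
    """Count jobs in a nested dict keyed beam_name -> coverage -> acquisition_mode."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for job_id in job_ids:
        parsed = parse_job_id(job_id)
        if parsed is None:
            continue  # IDs that do not match the pattern are skipped
        counts[parsed["beam_name"]][parsed["coverage"]][parsed["acquisition_mode"]] += 1
    return counts
```

Job IDs that do not match are dropped silently in this sketch, so a pattern that is too strict shows up only as missing rows rather than as an error.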

Execution Time Extractor

  • Wall Time Analysis: Extracts wall_time from job.job_info.metrics.usage_stats
  • Dual Metrics:
    • execution_time_minutes: Lesser of two wall_time values
    • pcm_container_runtime_m: Larger of two wall_time values
  • Statistical Analysis: Min, max, average execution times per breakdown level
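
The dual-metric rule above can be sketched as follows. The PR does not show the schema of usage_stats, so this example assumes it is a list of dicts that each carry a `wall_time` in seconds, nested under the path named above; the function names are hypothetical.

```python
from statistics import mean


def wall_time_metrics(job_source):
    """Derive the two metrics from a job document.

    Assumption: usage_stats is a list of dicts with a "wall_time" in seconds.
    Returns None when fewer than two wall_time values are present.
    """
    usage_stats = job_source["job"]["job_info"]["metrics"]["usage_stats"]
    wall_times = [float(entry["wall_time"]) for entry in usage_stats if "wall_time" in entry]
    if len(wall_times) < 2:
        return None
    return {
        # Lesser value: time attributed to the job's own execution.
        "execution_time_minutes": min(wall_times) / 60.0,
        # Larger value: time attributed to the enclosing PCM container.
        "pcm_container_runtime_m": max(wall_times) / 60.0,
    }


def summarize(values):
    """Min, max, and average for one breakdown level."""
    return {"min": min(values), "max": max(values), "avg": mean(values)}
```

Returning None for documents with fewer than two values keeps incomplete jobs out of the statistics instead of reporting a misleading single number.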

Security & Best Practices

  • No Hardcoded Credentials: All authentication handled via runtime prompts
  • Placeholder URLs: Documentation uses generic placeholders instead of real IPs
  • Clean Git History: No sensitive information in commit history
  • Comprehensive .gitignore: Prevents accidental commits of generated files
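
Runtime credential prompting can look roughly like the sketch below, which relies on the standard-library `getpass` module that the PR lists among the imports. The function name `prompt_credentials` and its injectable prompt arguments are assumptions made for illustration and testability.

```python
import getpass


def prompt_credentials(ask_user=input, ask_password=getpass.getpass):
    """Ask for a username and password at runtime instead of storing them in code.

    The prompt callables are parameters so the function can be exercised
    without a terminal; by default the password is read without echo.
    """
    username = ask_user("Username: ").strip()
    password = ask_password("Password: ")
    return username, password
```

The returned pair can then be handed to whichever HTTP client issues the Elasticsearch query, so nothing sensitive is written to the repository.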

🚀 Usage Examples

NISAR Enhanced Metrics

```bash
cd nisar/
python hysds_metrics_es_extractor_enhanced.py \
  -u https://your-es-instance/mozart_es/logstash-*/_search \
  -b 56 \
  --breakdown_job "job-SCIFLO_RSLC:pcm_r4.0.7_pge_r4.1.0" \
  --nisar_config NISAR_MIXED_MODES_CONFIG_20200101T000000_01.json
```

NISAR Execution Time Analysis

```bash
cd nisar/
python job_execution_time_extractor.py \
  -u https://your-es-instance/mozart_es/logstash-*/_search \
  -b 56 \
  --breakdown_job "job-SCIFLO_RSLC:pcm_r4.0.7_pge_r4.1.0"
```

📊 Output Files

  • Hierarchical Breakdown CSV: job_three_level_breakdown_job_SCIFLO_RSLC_*.csv
  • Execution Time CSV: job_execution_times_job_SCIFLO_RSLC_*.csv
  • Standard Metrics: Original functionality preserved
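
As an illustration of a CSV with hierarchical columns, the sketch below writes one row per beam_name, coverage, and acquisition_mode combination. The column names `job_count` and `avg_runtime_minutes` and the function name are assumptions; the actual columns produced by the scripts are not listed in this PR.

```python
import csv

# Hypothetical column layout: three hierarchy columns followed by metrics.
FIELDNAMES = ["beam_name", "coverage", "acquisition_mode", "job_count", "avg_runtime_minutes"]


def write_breakdown_csv(rows, stream):
    """Write breakdown rows to an open text stream, sorted by hierarchy level."""
    writer = csv.DictWriter(stream, fieldnames=FIELDNAMES)
    writer.writeheader()
    ordered = sorted(rows, key=lambda r: (r["beam_name"], r["coverage"], r["acquisition_mode"]))
    for row in ordered:
        writer.writerow(row)
```

Sorting by the three hierarchy keys keeps rows for the same beam mode adjacent, which makes the flat file easier to read as a three-level breakdown.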

🔮 Future Adaptations

The project template (PROJECT_TEMPLATE.md) provides a clear pattern for adapting this structure to other projects:

  1. Create Project Directory: your_project/
  2. Copy Base Scripts: Enhanced extractor and execution time analyzer
  3. Update Configurations: Regex patterns, job types, and documentation
  4. Customize Analysis: Project-specific metrics and breakdowns

✅ Testing

  • Import Paths: Scripts work correctly from nisar/ directory
  • Authentication: Secure credential prompting implemented
  • CSV Generation: Files created in CWD as expected
  • Documentation: All examples use placeholder URLs
  • Git History: Clean history without sensitive information

📈 Impact

  • +3,520 lines added, -538 lines removed
  • 11 files changed
  • 7 commits with clean, logical progression
  • Zero breaking changes to existing functionality
  • Enhanced scalability for future project adaptations

🎯 Ready for Review

This PR provides a solid foundation for NISAR-specific analysis while establishing patterns for future project adaptations. All security concerns have been addressed, documentation is comprehensive, and the codebase maintains backward compatibility.


Branch: feature/nisar-adaptation
Target: main
Author: pymonger pymonger@gmail.com


Note

Introduces NISAR-specific metrics suite (3-level breakdown, execution-time extractors, PGE version comparison) with docs/template, while refactoring the core extractor for reuse and adding a .gitignore.

  • NISAR tooling (new nisar/):
    • Enhanced extractor: hysds_metrics_es_extractor_enhanced.py adds 3-level job breakdown by beam_name → coverage → acquisition_mode, aggregates metrics, and exports CSV.
    • Execution time analyzers: job_execution_time_extractor.py (hierarchical wall_time stats) and pge_execution_time_extractor.py (data-day filter, credential caching, CSV output).
    • Version comparison: compare_pge_versions.py generates Excel report comparing PGE versions.
    • Config/Docs: Adds NISAR mode config NISAR_MIXED_MODES_CONFIG_*.json and README files.
  • Docs & templates: Adds PROJECT_TEMPLATE.md and QUICK_START.md for adapting to new projects and quick usage.
  • Core extractor: Refactors metrics_extractor/hysds_metrics_es_extractor.py for reuse (imports/formatting), preserves existing behavior.
  • Misc: Adds comprehensive .gitignore.

Written by Cursor Bugbot for commit e08beba. This will update automatically on new commits.

- Move json, logging, requests, sys, getpass, datetime, and urllib imports to top
- Remove duplicate imports from main block
- Enables functions to be imported and used by other scripts
- Move NISAR-specific scripts to dedicated nisar/ directory
- Create comprehensive project organization with hierarchical breakdown
- Add job execution time extractor with wall_time analysis
- Create PROJECT_TEMPLATE.md for future project adaptations
- Add QUICK_START.md for easy reference
- Clean up obsolete debugging and testing scripts
- Update all documentation to reflect new structure
- Add sys.path.append to correctly import from parent metrics_extractor directory
- Fixes ModuleNotFoundError when running scripts from nisar/ directory
- Both hysds_metrics_es_extractor_enhanced.py and job_execution_time_extractor.py updated
- CSV files are created in CWD when scripts are run
- Empty generated_csv_files directory was not needed
- Simplifies directory structure
…ctory

- CSV files are created in CWD when scripts run
- Remove unnecessary directory references from README.md
- Clean up directory structure documentation
- Ignore macOS .DS_Store files
- Ignore Python __pycache__ directories and compiled files
- Ignore CSV files generated by scripts
- Ignore common IDE, backup, and temporary files
- Ignore Python virtual environments and build artifacts
- Comprehensive coverage of commonly ignored files/directories
@pymonger pymonger requested a review from hookhua October 10, 2025 15:25
cursor[bot]

This comment was marked as outdated.

- Remove trailing underscore from regex pattern to handle job IDs that continue after beam name
- Pattern now matches: _full_individual_L_20_QP_05_QP_0_state-config-...
- Previously only matched: _full_individual_L_20_QP_05_QP_
- Fixes issue where INSAR jobs were not being processed due to strict regex
- Tested with example INSAR job ID: SCIFLO_INSAR__pcm_r4.0.7_pge_r4.1.0-network_pair_006_040_012_full_individual_L_20_QP_05_QP_0_state-config-20251010T201535.245443Z
```bash
cp /Users/gmanipon/dev/metrics_extractor/nisar/hysds_metrics_es_extractor_enhanced.py /Users/gmanipon/dev/metrics_extractor/your_project/
cp /Users/gmanipon/dev/metrics_extractor/nisar/job_execution_time_extractor.py /Users/gmanipon/dev/metrics_extractor/your_project/
```

Bug: Template Contains Hardcoded Developer Paths

The PROJECT_TEMPLATE.md includes hardcoded personal development paths, like /Users/gmanipon/dev/metrics_extractor/. These paths make the template non-portable and expose specific developer environment details, which could hinder its general usability for other developers.

@cursor cursor bot left a comment

```python
patterns = {
    "beam_name": r"_(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_",  # Primary: beam_name
    "coverage": r"_(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_",  # Secondary: coverage
    "acquisition_mode": r"_(?P<coverage>full|partial)_(?P<acquisition_mode>individual|mixed)_(?P<beam_name>L_\d{2}_\w{2}_\d{2}_\w{2})_",  # Tertiary: acquisition_mode
```

Bug: Regex excludes S-band jobs, only matches L-band

The beam_name regex pattern uses L_\d{2}_\w{2}_\d{2}_\w{2} which only matches L-band modes (e.g., L_20_DH_05_DH), but the NISAR mission includes both L-band and S-band modes. The documentation in README_ENHANCED.md correctly states the regex should be [LS]_\d{2}_\w{2}_\d{2}_\w{2} to match both bands, and the load_nisar_config function also correctly uses this pattern. However, the parse_job_id_patterns function only matches L-band, causing all S-band job metrics (e.g., S_37_QP_00_NA) to be silently skipped. This affects both extractor scripts.
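
The difference between the two beam-name patterns named in this comment can be checked with a short sketch. The helper `skipped_beams` is hypothetical; the patterns and example beam names are the ones quoted in the comment.

```python
import re

# L-band-only pattern, as used in parse_job_id_patterns per the comment above.
L_ONLY = re.compile(r"L_\d{2}_\w{2}_\d{2}_\w{2}")
# Pattern documented in README_ENHANCED.md, covering both L-band and S-band.
L_AND_S = re.compile(r"[LS]_\d{2}_\w{2}_\d{2}_\w{2}")


def skipped_beams(pattern, beam_names):
    """Return the beam names that the given pattern fails to match in full."""
    return [name for name in beam_names if pattern.fullmatch(name) is None]
```

Running `skipped_beams` with each pattern over a list of known beam modes gives a quick way to see which modes a pattern would drop before any metrics are aggregated.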
