@guillaume-osmo
Collaborator

Adding a new method based on proximity hierarchy keys.

- Remove duplicate main() function (dead code)
- Fix JSON output format for backward compatibility with benchmark history
- Document valid benchmark state with verified metrics:
  - Random split: Key-LOO BAcc=0.8234, Dummy BAcc=0.8048
  - Scaffold split: Key-LOO BAcc=0.8380, Dummy BAcc=0.8252
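
For readers comparing the numbers above, here is a minimal sketch of how a balanced accuracy figure is computed. It assumes BAcc denotes balanced accuracy on a binary task (the mean of sensitivity and specificity). The helper name `balanced_accuracy` and the toy labels are illustrative only and are not taken from the benchmark code.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (0/1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on the positive class
    specificity = tn / (tn + fp)  # recall on the negative class
    return 0.5 * (sensitivity + specificity)


# Toy labels (hypothetical, not benchmark data).
toy_bacc = balanced_accuracy([1, 1, 1, 0], [1, 1, 0, 0])

# Gap between Key-LOO and Dummy, using the metrics reported above.
random_gain = round(0.8234 - 0.8048, 4)
scaffold_gain = round(0.8380 - 0.8252, 4)
```

With the reported figures, Key-LOO leads Dummy by 0.0186 BAcc on the random split and by 0.0128 on the scaffold split.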

This commit establishes a known-good state for benchmark metrics.
DO NOT modify benchmark code without explicit approval and verification.
CRITICAL FIX: Running both random and scaffold splits in the same execution
caused state leakage, resulting in incorrect metrics (e.g., 84% BAcc).

Now runs only ONE split at a time:
- --split random: runs random split only
- --split scaffold: runs scaffold split only

To compare the two splits, run each one separately and compare the outputs manually.

This prevents any state leakage between runs and ensures accurate metrics.
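
The one-split-per-run rule described above could be enforced at the command line roughly as follows. This is a sketch under assumptions, not the benchmark's actual entry point: only the `--split` flag and its two values come from this PR, while `build_parser`, `run_benchmark`, and the returned dictionary are hypothetical placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Run the benchmark on exactly one split per process."
    )
    # A required choice means there is no way to request both splits at once.
    parser.add_argument(
        "--split",
        choices=["random", "scaffold"],
        required=True,
        help="which split to evaluate in this run",
    )
    return parser


def run_benchmark(split: str) -> dict:
    # Hypothetical placeholder: the real benchmark would build the split,
    # train, and evaluate here, creating all state fresh inside this call.
    return {"split": split}


def main(argv=None) -> dict:
    # argv=None makes argparse read sys.argv when used as a script.
    args = build_parser().parse_args(argv)
    return run_benchmark(args.split)


# Example: one invocation, one split.
result = main(["--split", "scaffold"])
```

Because each split runs in its own process, nothing fitted or cached for one split can carry over into the other.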
- Update version to 1.8.0 in setup.py, pyproject.toml, __init__.py, and build script
- Remove DEBUG messages from Key-LOO rescaling (unconditional cout statements)
- Clean up unused debug_count variable

Changes:
- Version bump: 1.7.0 -> 1.8.0
- Removed DEBUG 1D key messages that were printing unconditionally
- Code cleanup: removed unused debug_count variable
- Add set_proximity_mode() for NCM mode selection
- Add set_proximity_params() for hierarchical proximity configuration
- Add set_proximity_amplitude() for target-aware amplitude scaling
- Add set_proximity_amp_components_policy() for 2D/3D component handling
- Add set_proximity_amp_distance_beta() for distance decay
- Add set_statistical_backoff() for rare key handling
- Add set_verbose() for debug output control
- Add get_n_tasks() method
- Add comprehensive NCM performance report with CV5 std results

NCM provides:
- 66.9% valid models (vs 23.3% for dummy_masking)
- Competitive AUC: 0.931-0.936 (CV5: 0.9319±0.0134)
- No data leakage (uses only training data)
- Production-ready performance with low variance