Set up pytest with smoke tests covering CLI, POTENCI predictions, and single-entry pipeline computation. Fix .gitignore to use explicit patterns (the old `test*` glob would have excluded the new `tests/` directory). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
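The `.gitignore` problem can be illustrated with shell-style globbing. This is only a sketch: `fnmatch` approximates how git matches a slash-free pattern against a file or directory name, and the helper name `ignored_by` and the sample names are illustrative, not taken from the repository.

```python
from fnmatch import fnmatch


def ignored_by(pattern: str, name: str) -> bool:
    """Approximate a slash-free gitignore pattern with shell-style globbing."""
    return fnmatch(name, pattern)


old_pattern = "test*"
for name in ("test_output.csv", "tests", "trizod"):
    # "tests" matches "test*", so the new test directory would be ignored too.
    print(f"{name!r} ignored by {old_pattern!r}: {ignored_by(old_pattern, name)}")
```

An explicit pattern that names only the files meant to be ignored avoids catching `tests/` by accident.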
Replace deprecated setup.py with modern pyproject.toml using hatchling as the build backend (PEP 621). Move pytest config into pyproject.toml and remove standalone pytest.ini. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Extract hardcoded data tables (centshifts, neicorrs, termcorrs, tempcoeffs, combdevs) to CSV files in trizod/potenci/data/
- Replace all eval() calls with float() and safe parsing
- Remove dead code: getpredshifts_arr(), getphcorrs_arr(), main(), writeOutput()
- Add 300-entry regression test suite with strategic sampling across filter levels, sequence lengths, and edge cases
- Remove old ad-hoc test scripts (test/); add TODO for CheZOD equality test (needs reference data)
- Fix .gitignore: /data/ for top-level only, add tests/bmrb_subset/

Verified: all predictions are bit-for-bit identical to original code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
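As a rough illustration of the `eval()` replacement, the sketch below parses a numeric table cell with `float()` and falls back to a default for anything that is not a number. The helper name `parse_float` and its fallback behaviour are assumptions for this example, not the project's actual parsing code.

```python
from typing import Optional


def parse_float(text: str, default: Optional[float] = None) -> Optional[float]:
    """Parse a numeric table cell without evaluating it as Python code."""
    text = text.strip()
    if not text:
        return default
    try:
        return float(text)
    except ValueError:
        # Anything that is not a plain number is rejected instead of executed.
        return default


print(parse_float(" 1.5 "))
print(parse_float("__import__('os').getcwd()"))
```

Unlike `eval()`, `float()` never executes the input, so a malformed or malicious cell can at worst produce the fallback value.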
Add ruff config (E, W, F, I, N, UP, B, C4, SIM rules with scientific code exceptions). Format all modules and fix 32 lint issues: bare excepts, type() comparisons, collapsible ifs, %-formatting → f-strings, unused re-exports (__all__), lambda closure binding, raise-from. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
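Two of the lint categories named above, lambda closure binding and raise-from, are easy to misread, so here is a generic sketch of each. The function and variable names are illustrative only and do not come from the trizod code.

```python
# Lambda closure binding: a lambda created in a loop looks up the loop
# variable when it is called, not when it is defined.
late_bound = [lambda: i for i in range(3)]
early_bound = [lambda i=i: i for i in range(3)]  # default argument freezes i

late_results = [f() for f in late_bound]    # every call sees the final i
early_results = [f() for f in early_bound]  # each call sees its own i


# Raise-from: chain the original exception so the traceback keeps the cause.
def parse_temperature(text: str) -> float:
    try:
        return float(text)
    except ValueError as err:
        raise ValueError(f"invalid temperature: {text!r}") from err


print(late_results, early_results)
```

Here `late_results` is `[2, 2, 2]` while `early_results` is `[0, 1, 2]`, which is why binding the loop variable explicitly matters.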
- Switch to uv as package manager (pyproject.toml deps, uv.lock)
- Remove legacy POTENCI code (potenci1_3.py)
- Rename all POTENCI functions/variables/constants to descriptive snake_case
- Extract phshifts data to CSV, cache module-level data, vectorize rolling RMS
- Add dedicated test_potenci.py, make all tests ruff-conformant
- Add CLAUDE.md, POTENCI README, and architecture section to project README
- Add docs/ to .gitignore

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move detailed filter tables and POTENCI docs out of README into docs/. Add pipeline walkthrough. Reorganize .gitignore and move internal planning notes to gitignored docs/_planning/. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Content-based cache keyed by hash(seq, T, pH, ion) avoids recomputing POTENCI predictions across pipeline runs. Includes precompute script for batch caching and regression test against baseline outputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
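A content-based cache of this kind might look roughly like the sketch below. The commit only says the key is a hash of (seq, T, pH, ion); the use of SHA-256 over a JSON payload, the in-memory dictionary, and the names `cache_key` and `cached_prediction` are all assumptions made for illustration.

```python
import hashlib
import json

_cache = {}  # hypothetical in-memory store; a real cache could live on disk


def cache_key(seq: str, temperature: float, ph: float, ionic_strength: float) -> str:
    """Derive a stable key from the prediction inputs only."""
    # Normalise numbers to float so that 298 and 298.0 map to the same key.
    payload = json.dumps([seq, float(temperature), float(ph), float(ionic_strength)])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_prediction(seq, temperature, ph, ionic_strength, compute):
    """Return a cached result, calling `compute` only on a cache miss."""
    key = cache_key(seq, temperature, ph, ionic_strength)
    if key not in _cache:
        _cache[key] = compute(seq, temperature, ph, ionic_strength)
    return _cache[key]
```

Because the key depends only on the inputs, two entries with the same sequence and conditions share one prediction regardless of which run or entry ID requested it.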
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove dead code (CSV I/O functions, pka_csv_path/identifier params)
- Fix 3-residue pentamer crash (IndexError on short sequences)
- Suppress harmless overflow/OptimizeWarning in pKa fitting
- Extract BB_ATOMS constant, _SKIP_ATOM_PAIRS, _build_pentamer helper
- Rename non-descriptive variables to descriptive names
- Reorder file: constants → data loading → helpers → pKa → pH → API

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add both authors (Markus Haak, Tobias Senoner) with @tum.de emails. Update GitHub URL to MarkusHaak/trizod. Update POTENCI docs to reflect current API (removed pka_csv_path, renamed private functions, caching). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Migrate all path handling from os.path to pathlib.Path across the codebase. Rename non-descriptive variables in scoring.py (dAIC → delta_AIC, ashwi_ → abs_weighted_diffs, ol_ → outlier_mask, etc.), trizod.py (m → match, kw → keyword, fp → cache_path, scores_ → score_array, method_whitelist_ → whitelist_lower, etc.), and bmrb.py (pplist(l) → pplist(items), comprehension vars a/s → descriptive names). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
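For readers unfamiliar with the migration pattern, this sketch contrasts the two styles on a made-up cache path. The function names and the `.json` suffix are hypothetical and not taken from the codebase.

```python
import os.path
from pathlib import Path


def cache_file_os(cache_dir: str, entry_id: str) -> str:
    """Old style: string concatenation plus os.path.join."""
    return os.path.join(cache_dir, entry_id + ".json")


def cache_file_pathlib(cache_dir: str, entry_id: str) -> Path:
    """New style: Path objects composed with the / operator."""
    return Path(cache_dir) / f"{entry_id}.json"


path = cache_file_pathlib("cache", "bmr52889")
print(path.name, path.stem, path.suffix)
```

Both functions point at the same location; the `Path` version additionally exposes parts such as `stem` and `suffix` without further string handling.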
Add @pytest.mark.slow to full-dataset regression tests (4 tests that each run the entire pipeline on 17k entries). Default pytest run now takes ~47s instead of ~6min. Run all tests with: pytest -m "" Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…egex Remove 9 chemicals from strict chemical-denaturants that are not denaturants: reducing agents (DTT, BME, mercaptoethanol), NMR reference standards (DSS), and common buffers (acetic acid, CD3COOH, deuterated sodium acetate). Fix exp-method-blacklist by removing "state" keyword which incorrectly matched "solution-state" experiment subtypes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
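The "state" false positive can be reproduced with a simple keyword check. This sketch assumes the blacklist is applied as a case-insensitive substring match, and the keyword "solid" and the helper name `matches_blacklist` are hypothetical; the actual matching logic in the pipeline may differ.

```python
def matches_blacklist(description: str, keywords: list) -> bool:
    """Return True if any blacklist keyword occurs in the description."""
    text = description.lower()
    return any(keyword.lower() in text for keyword in keywords)


old_keywords = ["solid", "state"]  # hypothetical list containing the bad keyword
new_keywords = ["solid"]           # same list with "state" removed

for subtype in ("solution-state NMR", "solid-state NMR"):
    print(subtype, matches_blacklist(subtype, old_keywords), matches_blacklist(subtype, new_keywords))
```

With "state" in the list, a "solution-state" subtype is rejected along with the intended targets; without it, only the remaining keywords decide.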
…ails Add detailed architecture notes (key functions per module), pipeline flow (6 stages), caching system documentation, offset correction clarification (not re-referencing), and updated data paths. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pandas str-typed columns reject list assignment. Cast to object first to fix CSV output crash on full dataset runs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Modern NMR experiments commonly measure only 4 backbone shift types (HN, N, CO, CA) without HA. Requiring 5 types unnecessarily excluded many high-quality modern datasets. Code change in previous commit. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Temperatures 1-14 K are physically impossible for liquid-state NMR (sample would be frozen). Values in this range are certainly Celsius. Only 3 conditions across 2 entries (bmr52889, bmr53214) are affected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
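The reasoning above amounts to a small unit heuristic. The sketch below assumes the correction converts the suspect values to Kelvin by adding 273.15; the commit text only states that such values are certainly Celsius, and the function name is hypothetical.

```python
CELSIUS_OFFSET = 273.15


def normalize_temperature_kelvin(value: float) -> float:
    """Treat reported values of 1-14 as Celsius and convert them to Kelvin."""
    if 1.0 <= value <= 14.0:
        # A liquid-state sample cannot be at 1-14 K, so the unit must be wrong.
        return value + CELSIUS_OFFSET
    return value


print(normalize_temperature_kelvin(10.0))
print(normalize_temperature_kelvin(298.0))
```

Values outside the 1-14 range pass through unchanged, so plausible Kelvin temperatures are never altered.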
Parse _Assembly.Paramagnetic and _Entity.Paramagnetic tags from BMRB NMR-STAR files. Exclude entries flagged as paramagnetic at tolerant+ tiers. Paramagnetic samples (iron proteins, lanthanide tags, etc.) cause 0.5-5+ ppm shift perturbations that make Z-score computation meaningless since POTENCI assumes diamagnetic conditions. Checks both assembly and entity level to catch 15 entries with contradictory Assembly=no/Entity=yes flags. ~183 entries affected. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
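Checking both levels can be sketched as follows. The example assumes the tag values arrive as "yes"/"no" strings (or `None` when a tag is absent), consistent with the Assembly=no/Entity=yes wording above; the function name and signature are hypothetical.

```python
from typing import Iterable, Optional


def is_paramagnetic(assembly_flag: Optional[str], entity_flags: Iterable[Optional[str]]) -> bool:
    """Flag an entry if the assembly or any entity is marked paramagnetic."""
    flags = [assembly_flag, *entity_flags]
    return any(flag is not None and flag.strip().lower() == "yes" for flag in flags)


# A contradictory entry: the assembly says "no" but one entity says "yes".
print(is_paramagnetic("no", ["no", "yes"]))
```

Using `any` over both levels means a single "yes" is enough to exclude the entry, which is what catches the contradictory Assembly=no/Entity=yes cases.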
The parallel_apply returns mixed types — cast to bool explicitly to avoid TypeError on pandas invert operation. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Pure refactoring and modernization of the entire codebase — no pipeline behavior changes. All regression tests pass against existing baselines.
What's included (12 commits):
- `setup.py` to `pyproject.toml` with uv
- `eval()` and dead code, clean up code structure, fix bugs, rename variables to descriptive names
- `os.path` with `pathlib`, rename cryptic variables across all modules
- `pipeline.md`, `potenci.md`, `filtering.md`; trim README; update authors/emails/repo URL
- `ValueError` raises

Stats: 41 files changed, +6892 / −3461 lines
Test plan
- `uv run pytest tests/ -v`: all 9 default tests pass
- `uv run pytest tests/ -v --run-slow`: all 13 tests pass (including full-dataset regression)
- `uv run ruff check trizod/ tests/`: no lint issues
- `uv run ruff format --check trizod/ tests/`: formatting clean

🤖 Generated with Claude Code