…experiment-notebook Starting new work on the Trello ticket, continuing work started in "rr/model-notebook"
…'P5/scripts/data/tmp/phenopacket_dataset.csv' from 'P5/scripts/readme.md'
…in each PMID PDF. Adjust some variable names and comments
…y __main__ and expose pdf-parse as its own entry point
- pyproject.toml:
• set `p5` entry point to `P5.scripts.__main__:cli` (the full scripts CLI)
• added `p5-pdf-parse` console script alias, giving users direct access to the pdf-parse helper
• kept optional sub-tool entry points (pull-git-files, create-pmid-pkl, etc.)
• adjusted keywords and comments for clarity
- src/P5/__main__.py:
• replaced the old standalone click group with a deprecation shim
• running `python -m P5` now prints a friendly message directing users to `p5`
…single source of truth
…ts via project utils
…, and wrapping util
…icted phenopackets
… ontology-based metrics
… the event of an HTTP 500 error:
- file_to_phenopacket:
• Always request JSON from Ollama (`format="json"`)
• Fallback: if model output is not valid JSON, write a minimal valid
Phenopacket scaffold so one JSON file is produced per input
• Ensure output directory exists before writing
• Added RFC3339 timestamp helper for `metaData.created`
- pmid_downloader:
• Hardened _get_pmcid() with retry/backoff for transient NCBI 5xx errors
• Return None (skip PMID) instead of crashing on Entrez/HTTP errors
• Added Entrez.email env override + graceful handling of failures
These changes ensure:
- `test_file_to_phenopacket[.pdf]` produces expected JSON outputs
- `test_pmid_downloader` no longer fails on NCBI server errors
- CLI commands always exit 0 with sensible output, even in edge cases
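The retry/backoff and "return None instead of crashing" behaviour described for `_get_pmcid()` could look roughly like the sketch below. This is not the code from the commit: `call_with_backoff`, its parameters, and the injected `sleep` hook are hypothetical names chosen for illustration, and the sketch assumes the failures surface as `urllib.error.HTTPError` / `URLError`.

```python
import time
from typing import Callable, Optional
from urllib.error import HTTPError, URLError


def call_with_backoff(
    fetch: Callable[[], str],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch(); retry transient 5xx errors, return None if it never succeeds.

    Hypothetical helper: `fetch` stands in for the actual NCBI lookup.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except HTTPError as err:
            # Only 5xx responses are treated as transient; other HTTP
            # errors skip the PMID immediately.
            if not 500 <= err.code < 600:
                return None
        except URLError:
            # Network-level failure: skip the PMID instead of crashing.
            return None
        if attempt < retries - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return None
```

Passing `sleep` as a parameter keeps the helper testable without real delays; a caller that receives `None` would simply skip that PMID.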
…d upgrade model while still revising
- Added stable `id` fields for all notebook cells to improve reproducibility
- Step 4: added `list_pmids_loaded` to track actually loaded PMIDs, updated patient ID fallback
- Step 5: improved minimal Phenopacket builder
• skip invalid HPOs
• ensure non-null labels
• use UTC ISO8601 timestamps in `metaData.created`
- Temporarily upgraded model from `llama3.2:latest` → `gpt-oss:latest`
- Removed deprecated `--hidethinking` option; added `num_ctx` for larger context
- Step 6 & 7: use `list_pmids_loaded` instead of indexing into dataframe
- Step 8: updated evaluator `model` field to `ollama:gpt-oss:latest`
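A minimal sketch of what the Step 5 builder changes might amount to, under stated assumptions: the function name `build_minimal_phenopacket`, its arguments, the `HP:` plus seven digits validity check, and the choice to fall back to the HPO ID as the label are all illustrative and not taken from the notebook.

```python
import re
from datetime import datetime, timezone

# Assumption: a "valid" HPO ID is "HP:" followed by exactly seven digits.
HPO_ID = re.compile(r"HP:\d{7}")


def build_minimal_phenopacket(packet_id, subject_id, hpo_terms, created_by="unknown"):
    """Build a small Phenopacket-like dict from a list of {"hpo_id", "label"} dicts."""
    features = []
    for term in hpo_terms:
        hpo_id = str(term.get("hpo_id", "")).strip()
        if not HPO_ID.fullmatch(hpo_id):
            continue  # skip invalid HPOs
        # Ensure a non-null label; falling back to the ID is an assumption.
        label = term.get("label") or hpo_id
        features.append({"type": {"id": hpo_id, "label": label}})

    # UTC timestamp in ISO 8601 form, e.g. 2024-01-31T12:00:00Z
    created = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "id": packet_id,
        "subject": {"id": subject_id},
        "phenotypicFeatures": features,
        "metaData": {"created": created, "createdBy": created_by},
    }
```

A real Phenopacket `metaData` block also carries resources and a schema version; they are left out here to keep the sketch short.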
…d HPO extraction
- Added debug print of input char counts per PMID.
- Relaxed HPO JSON schema (require only `hpo_id`).
- Rewrote HPO prompt instructions for onset/severity/frequency + excerpts.
- Added fallback when schema-based JSON parse fails.
- Updated execution counts & timestamps.
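The relaxed schema and parse fallback could be sketched as follows. The commit message does not say what the fallback does, so the regex scan for HPO IDs below is purely an assumption, as are the function name `parse_hpo_items` and the `phenotypes` wrapper key.

```python
import json
import re


def parse_hpo_items(raw: str) -> list[dict]:
    """Parse model output into HPO items, requiring only `hpo_id` per item."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Assumed fallback: scan the raw text for HPO IDs, de-duplicated
        # while preserving first-seen order.
        ids = dict.fromkeys(re.findall(r"HP:\d{7}", raw))
        return [{"hpo_id": hpo_id} for hpo_id in ids]

    # Accept either a bare list or a dict wrapping one (hypothetical key).
    items = data.get("phenotypes", []) if isinstance(data, dict) else data
    if not isinstance(items, list):
        return []
    # Relaxed schema: keep any object with a non-empty `hpo_id`;
    # onset, severity, frequency, and excerpts stay optional.
    return [item for item in items if isinstance(item, dict) and item.get("hpo_id")]
```

Keeping optional fields untouched means downstream steps can still read onset, severity, or frequency when the model supplies them.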
…xt` to `requirements_scripts.txt` as per @SmartMonkey-git's request
Refactor project; start notebook from scratch