Feature/vj-notebook by VarenyaJ · Pull Request #28 · VarenyaJ/P5

VarenyaJ · 2025-07-24T12:45:34Z

Refactor project; start notebook from scratch

…y __main__ and expose pdf-parse as its own entry point

- pyproject.toml:
  - set `p5` entry point to `P5.scripts.__main__:cli` (the full scripts CLI)
  - added `p5-pdf-parse` console script alias entry for users to run, giving direct access to the pdf-parse helper
  - kept optional sub-tool entry points (pull-git-files, create-pmid-pkl, etc.)
  - adjusted keywords and comments for clarity
- src/P5/__main__.py:
  - replaced the old standalone click group with a deprecation shim
  - running `python -m P5` now prints a friendly message directing users to `p5`
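For readers unfamiliar with the pattern, a deprecation shim of the kind described for `src/P5/__main__.py` could look roughly like the sketch below. This is not the code from the PR: the message wording and the function names `deprecation_message` and `main` are assumptions made for illustration, and the sketch uses plain `print` rather than click.

```python
import sys


def deprecation_message() -> str:
    """Return the notice shown to users of the old entry point.

    The wording is hypothetical; the PR only states that a friendly
    message directs users to the `p5` command.
    """
    return (
        "`python -m P5` is deprecated. "
        "Please run the `p5` command instead (for example, `p5 --help`)."
    )


def main() -> int:
    """Print the notice to stderr and return a zero exit code."""
    print(deprecation_message(), file=sys.stderr)
    return 0
```

In a real `__main__.py`, the module would typically end with `raise SystemExit(main())` under an `if __name__ == "__main__":` guard, so that `python -m P5` prints the notice and exits cleanly.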

… the event of an HTTP 500 error:

- file_to_phenopacket:
  • Always request JSON from Ollama (`format="json"`)
  • Fallback: if model output is not valid JSON, write a minimal valid Phenopacket scaffold so one JSON file is produced per input
  • Ensure output directory exists before writing
  • Added RFC3339 timestamp helper for `metaData.created`
- pmid_downloader:
  • Hardened _get_pmcid() with retry/backoff for transient NCBI 5xx errors
  • Return None (skip PMID) instead of crashing on Entrez/HTTP errors
  • Added Entrez.email env override + graceful handling of failures

These changes ensure:
- `test_file_to_phenopacket[.pdf]` produces expected JSON outputs
- `test_pmid_downloader` no longer fails on NCBI server errors
- CLI commands always exit 0 with sensible output, even in edge cases
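The retry/backoff behaviour described for `_get_pmcid()` can be illustrated with a generic helper. The sketch below is an assumption about the general shape, not the PR's implementation: `fetch_with_backoff`, its parameters, and the convention of reading an HTTP status from an exception's `code` attribute (as `urllib.error.HTTPError` provides) are all hypothetical.

```python
import time
from typing import Callable, Optional


def fetch_with_backoff(
    fetch: Callable[[], str],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call `fetch`, retrying transient 5xx failures with exponential backoff.

    Returns None instead of raising, so the caller can skip the PMID.
    """
    for attempt in range(retries):
        try:
            return fetch()
        except Exception as exc:  # broad on purpose: never crash the batch
            # Assumption: HTTP errors expose their status as `exc.code`.
            status = getattr(exc, "code", None)
            is_transient = isinstance(status, int) and 500 <= status < 600
            if not is_transient or attempt == retries - 1:
                return None
            # Wait base_delay, then 2x, then 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    return None
```

Passing `sleep` as a parameter keeps the helper testable without real delays; a test can supply `list.append` to record the waits instead of sleeping.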

…d upgrade model while still revising

- Added stable `id` fields for all notebook cells to improve reproducibility
- Step 4: added `list_pmids_loaded` to track actually loaded PMIDs, updated patient ID fallback
- Step 5: improved minimal Phenopacket builder
  - skip invalid HPOs
  - ensure non-null labels
  - use UTC ISO8601 timestamps in `metaData.created`
- Temporarily upgraded model from `llama3.2:latest` → `gpt-oss:latest`
- Removed deprecated `--hidethinking` option; added `num_ctx` for larger context
- Step 6 & 7: use `list_pmids_loaded` instead of indexing into dataframe
- Step 8: updated evaluator `model` field to `ollama:gpt-oss:latest`
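The Step 5 builder changes listed above (skip invalid HPOs, non-null labels, UTC timestamps) can be sketched as follows. This is an illustrative reconstruction, not the notebook's code: `build_minimal_phenopacket`, `utc_timestamp`, the `created_by` default, and the choice to fall back to the HPO ID when a label is missing are all assumptions. The JSON field names follow the GA4GH Phenopacket v2 schema.

```python
import re
from datetime import datetime, timezone
from typing import Any, Dict, List

# HPO identifiers have the form "HP:" followed by seven digits.
HPO_ID_PATTERN = re.compile(r"HP:\d{7}")


def utc_timestamp() -> str:
    """Return the current UTC time as an ISO 8601 / RFC 3339 string."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_minimal_phenopacket(
    patient_id: str,
    hpo_terms: List[Dict[str, Any]],
    created_by: str = "hypothetical-notebook",  # placeholder value
) -> Dict[str, Any]:
    """Build a minimal phenopacket dict from extracted HPO terms."""
    features = []
    for term in hpo_terms:
        hpo_id = str(term.get("hpo_id") or "").strip()
        if not HPO_ID_PATTERN.fullmatch(hpo_id):
            continue  # skip invalid or missing HPO IDs
        # Assumption: use the ID itself when the model gave no label.
        label = term.get("label") or hpo_id
        features.append({"type": {"id": hpo_id, "label": label}})
    return {
        "id": patient_id,
        "subject": {"id": patient_id},
        "phenotypicFeatures": features,
        "metaData": {
            "created": utc_timestamp(),
            "createdBy": created_by,
            "resources": [],
            "phenopacketSchemaVersion": "2.0",
        },
    }
```

Because invalid entries are dropped rather than raising, an input with no usable HPO terms still yields a structurally complete packet with an empty `phenotypicFeatures` list.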

…d HPO extraction

- Added debug print of input char counts per PMID.
- Relaxed HPO JSON schema (require only `hpo_id`).
- Rewrote HPO prompt instructions for onset/severity/frequency + excerpts.
- Added fallback when schema-based JSON parse fails.
- Updated execution counts & timestamps.
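One way to read "require only `hpo_id`" together with a parse fallback is shown below. The commit message does not say what the fallback does, so the regex scan here is purely an assumption, as are the function name `parse_hpo_output` and the optional `phenotypes` wrapper key.

```python
import json
import re
from typing import Any, Dict, List


def parse_hpo_output(raw: str) -> List[Dict[str, Any]]:
    """Extract HPO entries from raw model output.

    Entries are kept if they are objects with a string `hpo_id`; all
    other fields (onset, severity, frequency, excerpt) are optional.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        # Hypothetical fallback: scan the text for HPO-shaped IDs,
        # de-duplicating while preserving first-seen order.
        ids = dict.fromkeys(re.findall(r"HP:\d{7}", raw))
        return [{"hpo_id": hpo_id} for hpo_id in ids]

    # Accept either a bare list or a list under a hypothetical key.
    items = data.get("phenotypes", []) if isinstance(data, dict) else data
    if not isinstance(items, list):
        return []
    return [
        item
        for item in items
        if isinstance(item, dict) and isinstance(item.get("hpo_id"), str)
    ]
```

The relaxed check keeps partially filled entries, which suits a model that often omits onset or severity, at the cost of passing through unvalidated optional fields.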

SmartMonkey-git and others added 30 commits June 19, 2025 10:49

Push notebook

6aa6bd0

Notebook update

e09450c

Merge remote-tracking branch 'origin/rr/model-notebook' into feature/…

ed3b9f9

…experiment-notebook Starting new work on the Trello ticket, continuing work started in "rr/model-notebook"

Update README.md from main

45cdc5d

Installing jupyterlab via pip

02fd92d

start setting up jupyter notebook

f11c637

Update README.md from main

20843dd

Add warning to the scripts README

7ea31e6

Ignore Mac-specific files generated by Apple

b1341a0

Adding .idea/ files

ba4e183

Trying to work on setting up the python notebook but cannot generate …

052fa53

…'P5/scripts/data/tmp/phenopacket_dataset.csv' from 'P5/scripts/readme.md'

Fix Pycharm to show Project Root

4789f13

Clean up imports

7d14887

Writing things a little out of order, trying to lay the groundwork

1bb1118

Writing things a little out of order, trying to lay the groundwork. S…

5aa7894

…till not ready

Forgot to import Report class

49e0f6b

Setting up Steps 0 and 1 again

c685976

Rename some variables for clarity. Continue with steps 2-4

de9f1e5

Edited 2 debug print statements

1503109

Edited 2 debug print statements

3b1decc

Rewrite step 5

7a4399f

Write up a draft of Steps 6 and 7

4f88318

Write up a rough draft of Steps 6 and 7

b5a6a7a

Pass only three directories to 'create_phenopacket_dataset'

58c7c5a

Pass only three paths to 'create_phenopacket_dataset'

73fa772

Fix typo

9cf05bc

Add comment about potentially over-aggressive deduplication

8615b08

Still adjusting so this writes one phenopacket per patient described …

90d4a28

…in each PMID PDF. Adjust some variable names and comments

Add HPO-specific prompt and Phenopacket-specific prompt

6379020

Add missing quotation marks

2168178

VarenyaJ and others added 30 commits August 18, 2025 12:52

Revise package: switch to Hatch + scripts CLI; remove setuptools files

ddd359f

Add a Quickstart pip section to the README.md; revise other sections

98ef16b

update gitignore

c04af3e

Fix capitalization

672938a

Add TODO: reference template repo for how to use the requirements as …

8a98299

…single source of truth

feat(step1): load dataset CSV, validate columns, deduplicate PMIDs, a…

583ca71

…nd preview

chore(step2): enumerate phenopacket-store JSONs for visibility and sa…

91659c1

…nity

feat(step3): add docling PDF→text conversion with persistent cache

3bd8852

feat(step4): load clinical texts and validate ground-truth phenopacke…

7291ae3

…ts via project utils

feat(step5): add strict HPO extraction prompt, schema, robust parsing…

bb834cc

…, and wrapping util

test(step6): run single-case HPO extraction and build minimal predict…

4cfe6ec

…ed phenopacket

feat(step7): batch-extract HPOs, save raw LLM output and minimal pred…

514160d

…icted phenopackets

feat(step8): evaluate predictions via PhenotypeEvaluator and generate…

7efec12

… Report

feat(step9): persist final evaluation Report to JSON

dd891dc

docs(optional): add commented semantic similarity skeleton for future…

05ca70d

… ontology-based metrics

Finish adding skeleton code for revising the demonstration notebook

e8a029a

Finish adding skeleton code for revising the demonstration notebook

80d2de5

Add comments and docstrings while dealing with the HTTP 500 error

7bae929

Add comments and docstrings while dealing with the HTTP 500 error

99b8bdf

Downgrade from gpt-oss:latest

e16186c

(configuration) CI: Use Ruff instead of Black

f02c899

Ruff check and formatting

0ca28a9

Update .gitignore to skip the generated files

822522e

Move jupyterlab==4.4.4 and ipykernel==6.29.5 from `requirements.t…

08ee283

…xt` to `requirements_scripts.txt` as per @SmartMonkey-git's request

Rename and edit notes on Charité blocking conda

b6c71c6

Edit and rename the notes on conda blockages again

47b544b

VarenyaJ wants to merge 166 commits into main from feature/vj-notebook

VarenyaJ commented Jul 24, 2025

2 participants
