This pipeline uses the google/medgemma-27b-text-it model to classify cancer surgery procedure descriptions into purpose categories.
Create and activate a conda environment, then install this package in editable mode:

```bash
mamba create -n classifier python=3.10
mamba activate classifier
pip install -Ue .
```
- Agree to the terms of use at https://huggingface.co/google/medgemma-27b-text-it.
- Log in to the Hugging Face Hub by running `huggingface-cli login`. To create a token, click your profile picture in the upper right-hand corner, then Access Tokens > Create New Token > Read > Create Token.
- GPU with sufficient memory (recommended: 60GB+ VRAM, e.g. A100/H100)
- around 200GB of SLURM memory
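The memory requirements above might translate into a SLURM job script along the following lines. This is only a sketch: the job name, GPU request syntax, and time limit are placeholders, and the repo's actual `run_job.sh` may differ.

```shell
#!/bin/bash
#SBATCH --job-name=classifier   # placeholder job name
#SBATCH --gres=gpu:1            # one A100/H100-class GPU (site-specific syntax may vary)
#SBATCH --mem=200G              # ~200GB host memory, per the requirements above
#SBATCH --time=24:00:00         # placeholder wall-time limit

mamba activate classifier
python run_task.py "$1"         # task directory passed as the first argument
```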
- Run the testing pipeline. This runs the job config template in `/task_test`. The test also plots a performance curve to help determine the optimal batch size for the full run.

  ```bash
  python test.py
  ```

  *Runtime curve on Nvidia H100, bfloat16 without quantization.*
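The batch-size curve that the test produces can be approximated by timing inference at several candidate batch sizes. A minimal, model-agnostic sketch is below; `classify_batch` is a stand-in for the real model call, which the actual `test.py` may implement differently.

```python
import time

def time_batch_sizes(classify_batch, inputs, batch_sizes=(1, 2, 4, 8)):
    """Measure per-item latency of `classify_batch` at each candidate batch size.

    classify_batch: callable taking a list of strings and returning one
    label per input (stand-in for the real model inference call).
    Returns a dict mapping batch size -> seconds per item.
    """
    results = {}
    for bs in batch_sizes:
        start = time.perf_counter()
        for i in range(0, len(inputs), bs):
            classify_batch(inputs[i:i + bs])
        elapsed = time.perf_counter() - start
        results[bs] = elapsed / len(inputs)
    return results

if __name__ == "__main__":
    # Dummy classifier for illustration; swap in the real model call.
    dummy = lambda batch: ["other"] * len(batch)
    per_item = time_batch_sizes(dummy, ["procedure text"] * 32)
    best = min(per_item, key=per_item.get)  # batch size with lowest per-item latency
    print(best)
```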

- Run the full classification pipeline. For sample job templates, refer to `/task_surgery_classification` and `/task_radiation_site`:

  ```bash
  python run_task.py task_surgery_classification
  ```

- On the iris cluster:

  ```bash
  sbatch run_job.sh task_surgery_classification
  ```

Known limitations:

- In thinking mode, the model might fail to terminate gracefully with an answer.
- If the input string is too short, under-specified, or hard to parse, the model might fail to recognise it as relevant medical information; it can stall and ask for more information instead of performing the requested classification task.
- The model can hallucinate categories that are not user-specified. A second pass (refer to the refinement stage of this pipeline) typically resolves this.
It is always good practice to manually validate the model's output before using it elsewhere.
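Part of that validation can be automated by checking each predicted label against the user-specified category set and flagging anything else for the refinement pass. The category names below are illustrative only; the real ones come from your task config.

```python
def find_hallucinated(predictions, allowed):
    """Return indices of predictions whose label is not in the allowed set."""
    allowed = {a.lower() for a in allowed}
    return [i for i, p in enumerate(predictions) if p.strip().lower() not in allowed]

# Illustrative category set, not the pipeline's actual labels.
ALLOWED = {"curative", "palliative", "diagnostic", "reconstructive"}
preds = ["Curative", "exploratory", "palliative"]
print(find_hallucinated(preds, ALLOWED))  # [1] -> route this item to the second pass
```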
- Prompt tuning with https://github.com/stanfordnlp/dspy
