# Localizing Cells in Phase-Contrast Microscopy Images using Sparse and Noisy Center-Point Annotations

A collaboration between the Wellcome Sanger Institute in Cambridge, the Computational Cell Analytics Group at the University of Göttingen, and the Biomedical Computer Vision Lab at the University of Tartu.
Based on Benjamin Eckhardt's Bachelor's Thesis in Computer Science.
If you use this work, please credit the original author (Benjamin Eckhardt, 2024) and cite the Thesis.
Note: All instructions are relative to the repo root.
```sh
pip install git+https://github.com/beijn/cellnet
python cellnet/release.py [list of image (/folder) paths]
```

Will store the counts in `counts.json` and images with the predictions in `plots/`.
```python
from cellnet import init_model, count

model = init_model(<version>)
counts, plots = count(<list_of_image_file_descriptors>, model)
```

`counts` and `plots` are image-path indexed dictionaries.

`<version>` is the model version:
- `'latest'` will download and cache the latest compatible model to `~/.cache/cellnet`
- `None` will use whatever is already in the cache, or default to `'latest'`
- any other string will download that version from the GitHub releases

To pin the model version or authenticate the download:
- provide a manual version string to `init_model(<version>)`, or
- provide a `GITHUB_TOKEN` environment variable with a GitHub PAT with scope Contents: read (e.g. in `./_GITIGNORE/secrets.sh`)
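A minimal end-to-end sketch of the API above; the image paths are hypothetical, and only the names `init_model` and `count` come from this README:

```python
from cellnet import init_model, count

# Download (or reuse the cached copy of) the latest compatible model.
model = init_model('latest')

# Hypothetical example images; any list of image file paths works.
images = ['data/images/well_A1.png', 'data/images/well_A2.png']

counts, plots = count(images, model)

# Both result dictionaries are indexed by image path.
for path in images:
    print(path, counts[path])
```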
Workflow Overview:
```sh
git clone git@github.com:beijn/cellnet.git
cd cellnet
git switch draft  # optional: development branch
micromamba create -yf cellnet.yml || conda env create -yf cellnet.yml
micromamba activate cellnet || conda activate cellnet
bash bin/install  # editable pip install of the cellnet package
```

- Create and populate the `data` folder.
- Interactively run `pre_images.py` and `pre_points.py` to prepare the data for the model.
`pre_images.py` creates binary foreground/background masks in `data/cache/fgmasks` for the images in `data/images`; these help during training. Their purpose is to include background in the training data, whereas foreground regions without point annotations are excluded. The script currently uses an unlicensed third-party model, which will not adapt to new image modalities and should be replaced (see the sketch below). Please inspect the generated masks manually and update the `fgmask_status` column in `data/data_quality.csv` accordingly with `ok` or `BAD`.
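Since the third-party model should be replaced, here is a minimal sketch of a license-free stand-in based on Otsu thresholding from scikit-image; the directory paths follow the ones above, while the smoothing, minimum object size, and output format are assumptions:

```python
# Hypothetical license-free foreground mask generator using classical
# image processing instead of the unlicensed third-party model.
from pathlib import Path

import numpy as np
from skimage import filters, io, morphology

IMAGE_DIR = Path('data/images')        # assumes it contains only image files
MASK_DIR = Path('data/cache/fgmasks')
MASK_DIR.mkdir(parents=True, exist_ok=True)

for image_path in sorted(IMAGE_DIR.iterdir()):
    image = io.imread(image_path, as_gray=True)
    # Smooth, then separate foreground from background with Otsu's threshold.
    smoothed = filters.gaussian(image, sigma=2)
    mask = smoothed > filters.threshold_otsu(smoothed)
    # Drop speckles so the mask marks coherent foreground regions only.
    mask = morphology.remove_small_objects(mask, min_size=64)
    io.imsave(MASK_DIR / f'{image_path.stem}.png',
              mask.astype(np.uint8) * 255, check_contrast=False)
```

Whether brighter-than-background actually corresponds to foreground depends on the imaging modality, so the threshold direction should be verified against a few manually inspected masks.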
`pre_points.py` converts the Label Studio `<some name>..<ANNOTATOR>.json` point annotations in `data` to `[x, y, label-id]` arrays in `data/cache/points`. The annotators contributing to an image are saved to `data/cache/annotators.json`. Merging annotations of different annotators is not yet implemented; draft code for keeping only 'agreeing' annotations is in `utility/disagreeing-annotations.py`. Please manually update the `annotation_status` column in `data/data_quality.csv` accordingly with `empty`, `sparse`, `fully`, `MISSING`, or `CONFLICT`.
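For orientation, a sketch of what the conversion looks like for one export file, assuming the standard Label Studio keypoint export layout (percent coordinates with `original_width`/`original_height`); the label-to-id mapping is hypothetical and the project's actual field names may differ:

```python
import json

import numpy as np

# Hypothetical label-to-id mapping; the real mapping lives in the project setup.
LABEL_IDS = {'cell': 1, 'debris': 2}

def points_from_labelstudio(export_path):
    """Collect [x, y, label-id] rows from one Label Studio JSON export."""
    with open(export_path) as f:
        tasks = json.load(f)
    rows = []
    for task in tasks:
        for annotation in task.get('annotations', []):
            for result in annotation['result']:
                if result.get('type') != 'keypointlabels':
                    continue
                value = result['value']
                # Label Studio stores keypoints as percentages of the image size.
                x = value['x'] / 100 * result['original_width']
                y = value['y'] / 100 * result['original_height']
                rows.append([x, y, LABEL_IDS[value['keypointlabels'][0]]])
    return np.array(rows)
```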
Images with `BAD` masks that are neither `empty` nor `fully` annotated are excluded from training, as are images with `MISSING` or `CONFLICT` annotations.
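The exclusion rule as a short sketch over `data/data_quality.csv`, assuming only the two status columns named above; the repository's actual filtering code may differ:

```python
import pandas as pd

quality = pd.read_csv('data/data_quality.csv')

# BAD masks are only tolerable when the annotation covers everything or nothing.
bad_mask = (quality['fgmask_status'] == 'BAD') & \
           ~quality['annotation_status'].isin(['empty', 'fully'])
bad_points = quality['annotation_status'].isin(['MISSING', 'CONFLICT'])

train_images = quality[~(bad_mask | bad_points)]
```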
```sh
bin/submit-job [remote] sbatch|local|bsub <notebook> <experiment> <release-mode>
```

`bin/submit-job` creates a snapshot copy of the current repository state and submits a job to run the `<notebook>` with the specified `<experiment>` and `<release-mode>` settings. The resulting notebook and all its outputs are saved under `results/<notebook>/STARTDATE-<experiment>-<release-mode>`.

- if `remote` is given, the notebook will be run on the remote cluster
- `sbatch` will submit the notebook job to Slurm's sbatch
- `bsub` will submit the notebook job to LSF's bsub
- `local` will run the notebook in the current shell
- `<notebook>` should be given without the `.py` or `.ipynb` extension
- `<experiment>` selects the experiment settings defined in `train.py`
- `<release-mode>`: `draft` is for interactive sessions; `release` creates release artifacts; `crossval` performs cross-validation; everything else defaults to crossval and can therefore be used as a free-form tag
The actual job is defined in `bin/job-ipynb`. It activates a matching conda environment with the help of `bin/init_conda`, executes the notebook with the specified settings using jupytext, and afterwards removes from the snapshot copy all files that were not modified or created during the job, keeping only the results and release artifacts.
Slurm-cluster-specific settings are defined in the `#SBATCH` header of `bin/job-ipynb` and are overridden by the `SBATCH_ARGS` defined via `source bin/init_shell.sh`.
When modifying `train.py`, please consider parameterizing the new functionality via a setting in `train.CFG` and adding a corresponding experiment setting in `train.EXPERIMENTS` (see the sketch below).
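A hypothetical sketch of that pattern, assuming `CFG` holds defaults and `EXPERIMENTS` holds named overrides; the actual structures in `train.py` may look different:

```python
# In train.py -- hypothetical structure; the field names are illustrative.
CFG = dict(
    sigma=5.0,   # stddev of the Gaussian point heatmap, in pixels
    lr=1e-3,
    epochs=100,
)

# Each experiment overrides a subset of CFG; the rest keeps the defaults.
EXPERIMENTS = dict(
    default={},
    sigma={'sigma': 2.5},   # e.g. a tighter heatmap around each center point
)

def configure(experiment):
    return {**CFG, **EXPERIMENTS[experiment]}

cfg = configure('sigma')
```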
`bin/submit-job remote sbatch train sigma crossval` will run the `train` notebook with the `sigma` experiment settings in cross-validation mode via Slurm's sbatch on the remote cluster. `bin/submit-job local train default release` will run the `train` notebook with the `default` settings in the current shell and save the model release artifacts.
```sh
bin/release <path_to_model> [<version-tag>]
```

Will package the release artifacts for the model in the specified path and push them to the GitHub releases under the name `<model_api_version>-<version-tag>`.
- The model API version is drawn from `cellnet.__init__.__model_api_version__`.
- Pushing a release requires `gh`, the GitHub CLI, to be installed and authenticated.
`bin/release results/train/250229-246060-default-release` will create a release named `2-250229`. `bin/release results/train/250229-246061-xnorm_per_channel-release dynnorm` will create a release named `2-250229-dynnorm`.
Also refer to the Thesis.
- replace the method for generating the foreground masks in `pre_images.py` with something that can be easily adapted to new datasets
- merge agreeing/disagreeing annotations (see draft code in `utility/disagreeing-annotations.py`)
  - only include agreeing annotations
  - subtract neighborhoods of disagreeing annotations from the 24px-radius positive point mask
- switch to a compression-less image file format
- explore more sophisticated model architectures
- implement a better loss function: a count-based loss, or best, MESA from Lempitsky et al., 2010 (see the sketch after this list)
- a better heatmap with repulsive distributions to better distinguish individual cells (has to keep the norm property)
  - take inspiration from the force field of two electrons in physics? (retains the norm property)
  - inverse distance: https://davidborland.github.io/webpage/pdfs/MICCAI%20MOVI%202022.pdf (damages the nice Gaussian norm property)
- label smoothing?
- blob detection for discretized localization
- replace segmentation_models_pytorch with a better maintained and further developed model library
- switch from segmentation_models_pytorch's save/load to ONNX for superior portability, flexibility in decoupling training and deployment, and fewer release dependencies
- consider switching from the custom `bin/submit-job` and `bin/job-ipynb` with `experiments` configuration in the notebook to Hydra or similar for greater standardization
- consider switching from the custom `train.py` training loop to MosaicML's Composer or Lightning or similar for more low-hanging fruit in optimizing the training process
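As referenced in the loss-function item above, a minimal sketch of a count-based auxiliary loss in PyTorch. It assumes the model predicts a density heatmap whose per-image sum approximates the cell count (the norm property mentioned above); it is a simple stand-in, not the MESA distance from Lempitsky et al., 2010:

```python
import torch

def count_loss(pred_heatmap, target_heatmap):
    """Penalize the difference between predicted and annotated counts.

    pred_heatmap, target_heatmap: (batch, 1, H, W) density maps whose sums
    approximate the number of cells per image.
    """
    pred_counts = pred_heatmap.sum(dim=(1, 2, 3))
    true_counts = target_heatmap.sum(dim=(1, 2, 3))
    return torch.mean((pred_counts - true_counts) ** 2)

# Hypothetical combination with a per-pixel loss during training:
# loss = pixel_loss(pred, target) + 0.01 * count_loss(pred, target)
```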