Cellnet. The solution for automated cell counting with low cost and high throughput. Developed for the Wellcome Sanger Institute together with the Universities of Tartu and Göttingen.

Cell Counting Collaboration

Collaboration between the Wellcome Sanger Institute in Cambridge, the Computational Cell Analytics Group at the University of Göttingen and the Biomedical Computer Vision Lab at the University of Tartu for

Localizing Cells in Phase-Contrast Microscopy Images using Sparse and Noisy Center-Point Annotations

Based on Benjamin Eckhardt's Bachelor's Thesis in Computer Science.

If you use this work, please credit the original author (Benjamin Eckhardt, 2024) and cite the Thesis.

CellNet Documentation

Note: All instructions are relative to the repo root.

Using the Release

Installation

pip install git+https://github.com/beijn/cellnet

Usage via Command Line

python cellnet/release.py [list of image (/folder) paths]

Stores the counts in counts.json and images overlaid with the predictions in plots/.

Usage as a Python Module

from cellnet import init_model, count
model = init_model(<version>)
counts, plots = count(<list_of_image_file_descriptors>, model)
  • counts and plots are image-path indexed dictionaries.
  • <version>: the model version
    • 'latest': downloads and caches the latest compatible model to ~/.cache/cellnet
    • None: uses whatever is already in the cache, or defaults to 'latest'
    • any other string: downloads that version from the GitHub releases
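The version semantics above can be sketched as a small resolution function. This is an illustration only: `resolve_version`, `cached_version`, and `fetch_latest_version` are hypothetical names, not part of the actual cellnet API.

```python
from typing import Callable, Optional

def resolve_version(version: Optional[str],
                    cached_version: Optional[str],
                    fetch_latest_version: Callable[[], str]) -> str:
    """Resolve which model version to load, mirroring the rules above.

    - 'latest'     -> query the latest compatible release
    - None         -> reuse whatever is cached, else fall back to 'latest'
    - other string -> treat it as an explicit release version
    """
    if version is None:
        if cached_version is not None:
            return cached_version      # use the model already in ~/.cache/cellnet
        version = 'latest'
    if version == 'latest':
        return fetch_latest_version()  # e.g. via the GitHub releases API
    return version                     # explicit version string
```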

Troubleshooting

Can't automatically determine latest model version due to GitHub API rate limiting

  • provide a manual version string to init_model(<version>)
  • or provide a GITHUB_TOKEN environment variable containing a GitHub PAT with the Contents: read scope (e.g. in ./_GITIGNORE/secrets.sh)

Training and Releasing

Workflow Overview:

  1. Setup
  2. Data Setup
  3. Training
  4. Releasing

Setup

git clone git@github.com:beijn/cellnet.git
cd cellnet
git switch draft  # optional: development branch
micromamba create -yf cellnet.yml || conda env create -yf cellnet.yml
micromamba activate cellnet || conda activate cellnet
bash bin/install  # editable pip install of the cellnet package

Data Setup

pre_images.py creates, for the images in data/images, binary foreground/background masks in data/cache/fgmasks that help during training. It currently uses an unlicensed third-party model, which will not adapt to new image modalities and should be replaced. The masks' purpose is to include background in the training data, whereas foreground regions without point annotations are excluded. Please inspect the generated masks manually and update the fgmask_status column in data/data_quality.csv accordingly with ok or BAD.

pre_points.py converts the label-studio <some name>..<ANNOTATOR>.json point annotations in data to [x,y,label-id] arrays in data/cache/points. The annotators contributing to an image are saved to data/cache/annotators.json. Merging annotations from different annotators is currently not implemented; draft code for filtering only 'agreeing' annotations is in utility/disagreeing-annotations.py. Please manually update the annotation_status column in data/data_quality.csv accordingly with empty, sparse, fully, MISSING or CONFLICT.

Images with BAD masks that are neither empty nor fully annotated are excluded from training, as are images with MISSING or CONFLICTing annotations.
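These exclusion rules boil down to a single predicate. The sketch below is illustrative; the actual filtering lives in the training code, and `include_in_training` is a made-up name.

```python
def include_in_training(fgmask_status: str, annotation_status: str) -> bool:
    """Apply the data-quality rules from data/data_quality.csv.

    An image is excluded when its annotations are MISSING or conflicting,
    and also when its foreground mask is BAD while its annotations are only
    partial (neither 'empty' nor 'fully').
    """
    if annotation_status in ('MISSING', 'CONFLICT'):
        return False
    if fgmask_status == 'BAD' and annotation_status not in ('empty', 'fully'):
        return False
    return True
```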

Training

bin/submit-job [remote] sbatch|local|bsub <notebook> <experiment> <release-mode>

bin/submit-job creates a snapshot copy of the current repository state and submits a job to run the <notebook> with the specified <experiment> and <release-mode> settings. The resulting notebook and all its outputs are saved under results/<notebook>/STARTDATE-<experiment>-<release-mode>.

  • if remote, the notebook will be run on the remote cluster
  • sbatch will submit a job to run the notebook to slurm's sbatch
  • bsub will submit a job to run the notebook to LSF's bsub
  • local will run the notebook in the current shell
  • <notebook> should be without the .py or .ipynb extension
  • <experiment> defines experiment settings in train.py
  • <release-mode>: draft: for interactive sessions; release: create release artifacts; crossval: perform cross-validation; everything else defaults to crossval and can therefore be used as a free-form tag

The actual job is defined in bin/job-ipynb. It activates a matching conda environment with the help of bin/init_conda, executes the notebook with the specified settings using jupytext, and afterwards removes from the snapshot copy all files that were not modified or created during the job, keeping only results and release artifacts.

Slurm-cluster specific settings are defined in the #SBATCH header of bin/job-ipynb and are overwritten by the SBATCH_ARGS defined via source bin/init_shell.sh.

When modifying train.py, please consider parameterizing new functionality via a setting in train.CFG and adding a corresponding experiment setting in train.EXPERIMENTS.
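The suggested pattern can be sketched as follows. The setting names below (`lr`, `sigma`, `epochs`) are made up for illustration; the real keys live in train.CFG and train.EXPERIMENTS.

```python
# Base configuration: every tunable aspect of training gets a named setting.
CFG = dict(
    lr=1e-3,
    sigma=8,      # hypothetical heatmap-spread setting
    epochs=100,
)

# Experiments override only the settings they vary.
EXPERIMENTS = {
    'default': {},
    'sigma': {'sigma': 16},  # hypothetical experiment varying the spread
}

def configure(experiment: str) -> dict:
    """Merge an experiment's overrides into a copy of the base CFG."""
    return {**CFG, **EXPERIMENTS[experiment]}
```

This keeps every experiment reproducible from a single name, which is exactly what bin/submit-job passes through as <experiment>.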

Examples

  • bin/submit-job remote sbatch train sigma crossval will run the train notebook with the sigma experiment settings in cross-validation mode with slurm's sbatch on the remote cluster.
  • bin/submit-job local train default release will run the train notebook with the default settings in the current shell and save the model release artifacts.

Releasing

bin/release <path_to_model> [<version-tag>]

Will package the release artifacts for the model in the specified path and push it to the GitHub releases with the name <model_api_version>-<version-tag>.

  • The model API version is drawn from cellnet.__init__.__model_api_version__.
  • Pushing a release requires gh, the GitHub CLI to be installed and authenticated.

Examples

  • bin/release results/train/250229-246060-default-release will create a release named 2-250229
  • bin/release results/train/250229-246061-xnorm_per_channel-release dynnorm will create a release named 2-250229-dynnorm
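Reading the naming convention off the two examples above, the release name could be derived roughly like this (a hypothetical Python sketch; bin/release itself is a shell script):

```python
from pathlib import Path

def release_name(model_path: str, model_api_version: str,
                 version_tag: str = "") -> str:
    """Derive the release name from the results directory and optional tag.

    The STARTDATE prefix of the results directory supplies the date part,
    so results/train/250229-...-release with API version 2 yields 2-250229.
    """
    startdate = Path(model_path).name.split('-')[0]  # e.g. '250229'
    name = f"{model_api_version}-{startdate}"
    if version_tag:
        name += f"-{version_tag}"
    return name
```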

Future Directions

Also refer to the Thesis.

Machine Learning Research

  • replace the method to generate the foreground masks in pre_images.py with something that can be easily adapted to new datasets.
  • merge agreeing/disagreeing annotations (see draft code in utility/disagreeing-annotations.py)
    • only include annotations on which annotators agree
    • subtract neighborhoods of disagreeing annotations from the 24px radius positive point mask
  • compression-less image file format
  • explore more sophisticated model architectures
  • implement a better loss function, e.g. a count-based loss, or best: MESA from Lempitsky et al., 2010
  • better heatmap with repulsive distributions to better distinguish individual cells (has to keep norm property)
  • label smoothing?
  • blob detection for discretized localization
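The disagreement-subtraction idea from the list above could be prototyped roughly as below. This is a pure-Python sketch under stated assumptions: points are (x, y) pixel pairs, the mask is represented as a set of pixels (in practice it would be a dense array), and the 24 px radius comes from the text.

```python
def point_mask(points, height, width, radius=24):
    """Pixels within `radius` of any point (a dense boolean mask in practice)."""
    r2 = radius * radius
    return {(x, y)
            for px, py in points
            for x in range(max(0, px - radius), min(width, px + radius + 1))
            for y in range(max(0, py - radius), min(height, py + radius + 1))
            if (x - px) ** 2 + (y - py) ** 2 <= r2}

def training_mask(agreed, disputed, height, width, radius=24):
    """Positive mask around agreed points, minus neighborhoods of disputed ones."""
    return (point_mask(agreed, height, width, radius)
            - point_mask(disputed, height, width, radius))
```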

Software Engineering

  • replace segmentation_models_pytorch with a better maintained and further developed model library
  • switch from segmentation_models_pytorch's save/load to ONNX for superior portability, flexibility in decoupling training from deployment, and fewer release dependencies
  • consider switching from the custom bin/submit-job and bin/job-ipynb with experiments configuration in the notebook to Hydra or so for greater standardization
  • consider switching from the custom train.py training loop to MosaicML's Composer or Lightning for more low-hanging fruit in optimizing the training process
