Gentl

The source code for Gentl (GENeTic aLgorithm for predicting stage from medical scans of patients with cancer) [access preprint].

About

This is a repository that contains information on how to reproduce results corresponding to the bladder cancer case study reported in Paper title.

Data

Description

As described in our paper, the data used for our analyses comprised a total of 100 CT scans of the bladder, each from a patient with bladder cancer.
Disease: urothelial carcinoma of the bladder
Stages: Ta, Tis, T0, T1, T2, T3, T4
Stage annotation technique: Performed manually by radiologists

For more details, interested readers are directed to the Dataset section of the paper.

Availability

Data will be made available upon reasonable request to the corresponding author, Suryadipto Sarkar (contact details below).

Data preprocessing

Step 1: Segmenting bladder region using ImageJ

Step 2: Segmenting cancer ROI using binary mask

Step 3: Segmenting healthy ROIs using sliding window

Feature extraction using Gray level co-occurrence matrix (GLCM)

The following five GLCM features were extracted from the cancer ROI, as well as healthy ROIs from the same patient:

Dissimilarity
- Measures how different neighboring pixel values are.
- Assumptions:
  - Cancer region – high dissimilarity
  - Healthy region – low dissimilarity
Correlation
- Measures the linear dependency between pixel intensities in a given direction.
- Assumptions:
  - Cancer region – low correlation
  - Healthy region – high correlation
Energy
- Measures texture uniformity or repetition.
- Assumptions:
  - Cancer region – low energy
  - Healthy region – high energy
Contrast
- Measures the intensity variation between neighboring pixels in an image.
- Assumptions:
  - Cancer region – high contrast
  - Healthy region – low contrast
Homogeneity
- Measures whether neighboring pixels in an image have similar intensity values.
- Assumptions:
  - Cancer region – low homogeneity
  - Healthy region – high homogeneity

Each feature was computed under $20$ configurations ($4$ angles: $\{0, \frac{\pi}{4}, \frac{\pi}{2}, \frac{3\pi}{4}\}$; $5$ distances: $\{1, 2, 3, 4, 5\}$ pixels).
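A minimal, dependency-free sketch of how the five features could be computed over the 20 angle and distance configurations is shown below. It uses the standard GLCM feature definitions (energy taken as the square root of the sum of squared entries); the function names `glcm`, `glcm_features` and `extract_all`, the offset convention, and the assumption that the image is already quantized to integers in `[0, levels)` are illustrative and not taken from the repository, which may instead rely on a library such as scikit-image.

```python
import math
from itertools import product

ANGLES = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]
DISTANCES = [1, 2, 3, 4, 5]


def glcm(image, distance, angle, levels):
    """Normalized (non-symmetric) co-occurrence matrix for one offset.

    Assumption: `image` is a list of rows of integers in [0, levels).
    """
    d_row = round(math.sin(angle) * distance)
    d_col = round(math.cos(angle) * distance)
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    if total == 0:  # offset larger than the image: no pixel pairs
        return counts
    return [[n / total for n in row] for row in counts]


def glcm_features(p):
    """Compute the five texture features from a normalized GLCM `p`."""
    levels = range(len(p))
    cells = [(i, j, p[i][j]) for i in levels for j in levels]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum(v * (i - mu_i) ** 2 for i, _, v in cells))
    sd_j = math.sqrt(sum(v * (j - mu_j) ** 2 for _, j, v in cells))
    cov = sum(v * (i - mu_i) * (j - mu_j) for i, j, v in cells)
    return {
        "dissimilarity": sum(v * abs(i - j) for i, j, v in cells),
        # Assumption: a perfectly uniform texture is given correlation 1.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 1.0,
        "energy": math.sqrt(sum(v * v for _, _, v in cells)),
        "contrast": sum(v * (i - j) ** 2 for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
    }


def extract_all(image, levels):
    """Features for all 20 (distance, angle) configurations."""
    return {
        (d, a): glcm_features(glcm(image, d, a, levels))
        for d, a in product(DISTANCES, ANGLES)
    }
```

Calling `extract_all(roi, levels)` on a quantized ROI returns a dictionary with 20 entries, one per configuration, each holding the five feature values.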

Feature binarization

Binarization is performed on the cancer ROI by fitting a bimodal (two-component) Gaussian mixture model (GMM) using the sklearn.mixture module. Each feature value is assigned $0$ if it is closer in Euclidean distance to the lower mean ($\mu_1$), and $1$ if it is closer to the higher mean ($\mu_2$).
All healthy ROI feature values from the same image sample are assigned $0$ if they are closer in Euclidean distance to the lower mean ($\mu_1$) obtained from the cancer ROI above, and $1$ if they are closer to $\mu_2$.

Note: Bimodal GMM fitting is done only once per image, on the cancer ROI. Healthy ROIs are binarized using the means obtained from that fit for the same image sample.
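The nearest-mean assignment described here can be sketched as follows. The two means are taken as given (in practice they would come from the two-component GMM fitted on the cancer ROI); the function name `binarize`, the example values, and the tie-breaking rule (ties go to $0$) are assumptions made for illustration only.

```python
def binarize(values, mu_low, mu_high):
    """Return 0 for values nearer the lower mean, 1 for values nearer the higher mean.

    For scalar features the Euclidean distance reduces to an absolute difference.
    Assumption: a value exactly halfway between the means is assigned 0.
    """
    if mu_low > mu_high:
        mu_low, mu_high = mu_high, mu_low
    return [0 if abs(v - mu_low) <= abs(v - mu_high) else 1 for v in values]


# Hypothetical means from a two-component fit on one cancer ROI.
mu_1, mu_2 = 0.2, 0.8

# The same pair of means is reused for the healthy ROIs of that image.
cancer_bits = binarize([0.1, 0.25, 0.7, 0.9], mu_1, mu_2)   # [0, 0, 1, 1]
healthy_bits = binarize([0.3, 0.55, 0.45], mu_1, mu_2)      # [0, 1, 0]
```

Reusing the cancer-derived means for the healthy ROIs keeps both sets of binary features on the same scale, which is what allows them to be compared gene by gene later on.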

Genetic algorithm

General information about our implementation of the algorithm

We perform the genetic algorithm on each sample image separately.

An overview of the terms gene, chromosome and population

Algorithmic workflow

Step 1: Population initialization

The initial population ($P$) comprises binarized GLCM features extracted from the healthy ROIs.
Reported results include population sizes $|P| \in \{10, 20, 30, 40, 50\}$.

Step 2: Parent selection by fitness evaluation

Fitness metric: Euclidean or absolute distance to the target.
- In our implementation, the target is the binarized feature list from the cancer ROI.
Parent selection rate: 50% of the population at the end of iteration $i$ is retained as parents for iteration $i+1$. The list of selected parents therefore contains the 50% of chromosomes closest to the target sequence.
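As a rough illustration of this step, the sketch below ranks chromosomes by distance to the target and keeps the closest half. The function names and the `rate` parameter are hypothetical and do not necessarily match the scripts in the gentl package.

```python
import math


def absolute_distance(chromosome, target):
    """Sum of absolute gene-wise differences to the target."""
    return sum(abs(g - t) for g, t in zip(chromosome, target))


def euclidean_distance(chromosome, target):
    """Euclidean distance between a chromosome and the target."""
    return math.sqrt(sum((g - t) ** 2 for g, t in zip(chromosome, target)))


def select_parents(population, target, distance=absolute_distance, rate=0.5):
    """Keep the fraction `rate` of chromosomes closest to the target."""
    ranked = sorted(population, key=lambda c: distance(c, target))
    keep = max(1, int(len(ranked) * rate))  # always keep at least one parent
    return ranked[:keep]


target = [1, 1, 0, 1]
population = [[0, 0, 1, 0], [1, 1, 0, 1], [1, 0, 0, 1], [0, 0, 0, 0]]
parents = select_parents(population, target)  # [[1, 1, 0, 1], [1, 0, 0, 1]]
```

For binary chromosomes the two metrics produce the same ranking, since the squared and absolute gene-wise differences coincide.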

Step 3: Crossover (initial offspring generation)

For crossover between two parents:
- The first parent ($p_1$) is always chosen from the top 50% of chromosomes (that is, those with the smallest absolute distance to the target sequence).
- The second parent ($p_2$) is chosen from the initial population at each iteration.
Random portions of parents $p_1$ and $p_2$ constitute the respective offspring, with at least one gene taken from each parent in $\{p_1, p_2\}$.
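One way to realize "random portions with at least one gene from each parent" is a random per-gene mask that is corrected if it happens to select a single parent only. This is a sketch under that assumption; the repository may use a different scheme (for example, a random cut point), and the name `crossover` and the `rng` parameter are illustrative.

```python
import random


def crossover(p1, p2, rng=random):
    """Build an offspring from random genes of p1 and p2.

    Guarantees that at least one position is copied from each parent.
    """
    n = len(p1)
    if n != len(p2) or n < 2:
        raise ValueError("parents must have equal length of at least 2")
    # True -> take the gene from p1, False -> take it from p2.
    mask = [rng.random() < 0.5 for _ in range(n)]
    if all(mask) or not any(mask):
        # The mask picked one parent only: flip one random position.
        i = rng.randrange(n)
        mask[i] = not mask[i]
    return [a if from_p1 else b for a, b, from_p1 in zip(p1, p2, mask)]


child = crossover([0, 0, 0, 0, 0, 0], [1, 1, 1, 1, 1, 1], random.Random(0))
```

Passing a seeded `random.Random` instance makes individual runs reproducible, which is convenient when comparing results across random seeds.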

Step 4: Mutation (final offspring generation)

The initial offspring $\overline{o_{1,2}}$, generated from parents $p_1$ and $p_2$ in Step 3 (crossover), undergoes mutation to produce the final offspring $o_{1,2}$.
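The mutation operator is not spelled out here, so the sketch below assumes the common choice for binary chromosomes: each gene is flipped independently with probability `mutation_rate`. The function name and parameters are hypothetical.

```python
import random


def mutate(offspring, mutation_rate, rng=random):
    """Flip each binary gene independently with probability `mutation_rate`."""
    return [1 - g if rng.random() < mutation_rate else g for g in offspring]


initial_offspring = [0, 1, 1, 0, 1]
final_offspring = mutate(initial_offspring, 0.05, random.Random(42))
```

With a rate of 0 the offspring is returned unchanged, and with a rate of 1 every gene is flipped; intermediate rates trade off exploration against stability.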

Step 5: Replacement

In this step, we replace the worst-performing individuals in the current population with new offspring, retaining the better-performing individuals.
In the script /gentl/_ga_step5_replacement.py:
- Input parameters:
  - population: The current population.
  - new_generation: The new generation of chromosomes.
  - goal: The target sequence.
- Returns:
  - best_individuals: The updated population containing the best individuals.
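A minimal sketch of a replacement step with the inputs and output listed above might look like this. It assumes absolute distance as the fitness metric and a fixed population size; the function name `replace_population` is hypothetical, and the actual code in /gentl/_ga_step5_replacement.py may differ.

```python
def absolute_distance(chromosome, goal):
    """Sum of absolute gene-wise differences to the goal."""
    return sum(abs(g - t) for g, t in zip(chromosome, goal))


def replace_population(population, new_generation, goal):
    """Merge old and new chromosomes and keep the best len(population)."""
    candidates = population + new_generation
    ranked = sorted(candidates, key=lambda c: absolute_distance(c, goal))
    best_individuals = ranked[:len(population)]
    return best_individuals


population = [[0, 0, 0], [1, 1, 0]]
new_generation = [[1, 1, 1], [0, 1, 0]]
goal = [1, 1, 1]
best = replace_population(population, new_generation, goal)  # [[1, 1, 1], [1, 1, 0]]
```

Because the merged list is ranked as a whole, an offspring only enters the population if it is at least as close to the goal as the individual it displaces.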

Installation

Create a virtual environment and install the dependencies from requirements.txt:

python -m venv gentlvenv
gentlvenv\Scripts\activate # Windows
source gentlvenv/bin/activate # Mac/Linux
pip install -r requirements.txt

Key metrics for cancer staging

Using the genetic algorithm (GA), we compute three key metrics to quantify feature similarity between non-cancer and cancer regions, which are essential for cancer staging:

average_generation:
The average number of iterations required for GA convergence (averaged over 20 runs).
- Lower values indicate faster convergence and higher similarity between non-cancer and cancer features.
average_best_distance:
The average distance between the best solution (most similar to cancer) and the target cancer feature (final generation, averaged over 20 runs).
- Lower values suggest that some non-cancer regions closely resemble cancer regions.
average_mean_distance:
The average distance between the whole population and the target (final generation, averaged over 20 runs).
- Reflects how well the entire population approximates cancer-like features, indicating diversity and convergence quality.
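To make the three definitions concrete, the sketch below aggregates them from a list of per-run records. The record layout (a dictionary with `generations` and `final_population` keys) and the function names are assumptions made for illustration; they are not the repository's data format.

```python
from statistics import mean


def absolute_distance(chromosome, target):
    """Sum of absolute gene-wise differences to the target."""
    return sum(abs(g - t) for g, t in zip(chromosome, target))


def summarize_runs(runs, target, distance=absolute_distance):
    """Average the per-run statistics over all repeated GA runs."""
    best, avg = [], []
    for run in runs:
        dists = [distance(c, target) for c in run["final_population"]]
        best.append(min(dists))   # closest individual in the final generation
        avg.append(mean(dists))   # whole final population
    return {
        "average_generation": mean(run["generations"] for run in runs),
        "average_best_distance": mean(best),
        "average_mean_distance": mean(avg),
    }


# Two hypothetical runs (the reported results average over 20).
runs = [
    {"generations": 4, "final_population": [[1, 1], [0, 1]]},
    {"generations": 6, "final_population": [[0, 0], [1, 0]]},
]
summary = summarize_runs(runs, target=[1, 1])
```

For the two toy runs above, the summary gives an average of 5 generations, an average best distance of 0.5 and an average mean distance of 1.0.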

All processed results are stored in the folder: glcm_bladder_average_gentl_results/

This folder contains five subfolders, corresponding to experiments with different numbers of healthy ROIs.

Robustness tests

We designed tests to evaluate whether GA remains stable and reliable under different settings:

Random initialization test:
Run GA under 10 different random seeds, each repeated 20 times, to assess whether convergence speed and time are consistent across random initializations.
Mutation rate test:
Test GA performance under various mutation rates (0.001, 0.05, 0.1, 0.15, 0.9) to analyze how mutation influences diversity and convergence.
Population capacity (Np) test:
Examine how different offspring population sizes (25, 26, 28, 30, 40, 50, 60) affect GA's ability to avoid local optima and achieve stable solutions.

These tests ensure that GA performs consistently without being overly sensitive to random factors or hyperparameters.

Scalability tests

We analyzed how GA scales when facing larger or more complex feature sets:

Chromosome Length Test:
Vary chromosome lengths (10 to 70) to examine how problem size impacts convergence time and iterations.
Max Generation Limit Test:
Vary the maximum number of allowed generations (35 to 80) to confirm whether GA typically converges before hitting the limit.
Population Size Test:
Vary population sizes (20, 40, 60, 80, 100) to evaluate trade-offs between diversity and runtime efficiency.

These tests validate that GA remains feasible and efficient even as the complexity of the feature space increases.

Citing the work

MLA

Will be made available upon publication.

APA

Will be made available upon publication.

BibTeX

Will be made available upon publication.

Contact

✉  suryadipto.sarkar@fau.de
✉  ssarka34@asu.edu
✉  ssarkarmanipal@gmail.com

Impressum

Suryadipto Sarkar ("Surya"), MS

PhD Candidate
Biomedical Network Science Lab
Department of Artificial Intelligence in Biomedical Engineering (AIBE)
Friedrich-Alexander University Erlangen-Nürnberg (FAU)
Werner von Siemens Strasse
91052 Erlangen

MS in CEN from Arizona State University, AZ, USA.
B.Tech in ECE from MIT Manipal, KA, India.
