ConsisTNet: A Spatio-Temporal Approach for Consistent Anatomical Localization in Endoscopic Pituitary Surgery
Automated localization of critical anatomical structures in endoscopic pituitary surgery is crucial for enhancing patient safety and surgical outcomes. While deep learning models have shown promise in this task, their predictions often suffer from frame-to-frame inconsistency. This study addresses this issue by proposing ConsisTNet, a novel spatio-temporal model designed to improve prediction stability. ConsisTNet leverages spatio-temporal features extracted from consecutive frames to provide both temporally and spatially consistent predictions, addressing the limitations of single-frame approaches. We employ a semi-supervised strategy, utilizing ground-truth label tracking for pseudo-label generation through label propagation. Consistency is assessed by comparing predictions across consecutive frames using predicted label tracking. The model is optimized and accelerated using TensorRT for real-time intraoperative guidance. Compared to previous state-of-the-art models, ConsisTNet significantly improves prediction consistency across video frames while maintaining high accuracy in segmentation and landmark detection. Specifically, segmentation consistency is improved by 4.56% and 9.45% in IoU for the two segmentation regions, and landmark detection consistency is enhanced with a 43.86% reduction in mean distance error. The accelerated model achieves an inference speed of 202 frames per second (FPS) with 16-bit floating point (FP16) precision, enabling real-time intraoperative guidance. ConsisTNet demonstrates significant improvements in spatio-temporal consistency of anatomical localization during endoscopic pituitary surgery, providing more stable and reliable real-time surgical assistance.
(a) Temporal sequence of critical anatomical structures during the sellar phase of eTSA. (b) An example of prediction inconsistency (from PitSurgRT) between video frames I_{t-1} and I_t, with inconsistent predictions highlighted by white boxes.
- A novel network architecture, ConsisTNet, that integrates spatio-temporal information to reduce prediction volatility during eTSA.
- A pseudo-label generation method based on CoTracker2 for temporal learning and consistency evaluation in the absence of ground-truth data (see the sketch after this list).
- A detailed analysis of the impact of temporal learning on reducing prediction variability in pituitary surgical video sequences.
- A real-time implementation of ConsisTNet to meet the performance requirements of intraoperative guidance.
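As a rough illustration of the CoTracker2-based pseudo-label generation, the sketch below propagates landmark annotations from a labelled frame to its neighbours by point tracking. The torch.hub entry point follows CoTracker2's published usage; the function name, clip shapes, and default arguments are illustrative assumptions and not the exact training pipeline.

```python
# Hedged sketch: propagating annotated landmark coordinates to neighbouring
# frames with CoTracker2 to obtain pseudo-labels. Names and shapes are
# illustrative, not the exact pipeline used in this repository.
import torch

# Load the CoTracker2 predictor from torch.hub (downloads weights on first run).
cotracker = torch.hub.load("facebookresearch/co-tracker", "cotracker2").eval()

def propagate_landmarks(frames, landmarks_xy, annotated_idx=0):
    """Track annotated landmarks through a short clip.

    frames:       float tensor of shape (T, 3, H, W)
    landmarks_xy: float tensor of shape (N, 2) with (x, y) pixel coordinates
                  annotated on frame `annotated_idx`
    returns:      (T, N, 2) tracked coordinates and (T, N) visibility mask
    """
    video = frames.unsqueeze(0)                              # (1, T, 3, H, W)
    t_col = torch.full((landmarks_xy.shape[0], 1), float(annotated_idx))
    queries = torch.cat([t_col, landmarks_xy], dim=1)        # (N, 3): (t, x, y)
    with torch.no_grad():
        tracks, visibility = cotracker(video, queries=queries.unsqueeze(0))
    return tracks[0], visibility[0]                          # per-frame pseudo-labels
```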
The ConsisTNet architecture comprises an HRNet backbone, a temporal module, and dual heads for segmentation and landmark detection. CoTracker2-generated pseudo-labels are used during training to enhance spatio-temporal learning.
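The following is a minimal, illustrative PyTorch sketch of that layout, assuming per-frame backbone features are fused by a simple temporal convolution before the two heads; the actual temporal module, head designs, and channel/class counts in the paper may differ.

```python
# Illustrative sketch of the ConsisTNet layout: an HRNet backbone, a temporal
# module, and dual heads for segmentation and landmark heatmaps. The backbone
# stub and the 3D-convolutional temporal fusion are placeholders.
import torch
import torch.nn as nn

class ConsisTNetSketch(nn.Module):
    def __init__(self, backbone, feat_ch=48, n_classes=3, n_landmarks=4, clip_len=3):
        super().__init__()
        self.backbone = backbone                      # e.g. an HRNet feature extractor
        self.temporal = nn.Conv3d(feat_ch, feat_ch,   # fuses features across the clip
                                  kernel_size=(clip_len, 3, 3),
                                  padding=(0, 1, 1))
        self.seg_head = nn.Conv2d(feat_ch, n_classes, kernel_size=1)
        self.lmk_head = nn.Conv2d(feat_ch, n_landmarks, kernel_size=1)

    def forward(self, clip):                          # clip: (B, clip_len, 3, H, W)
        b, t = clip.shape[:2]
        feats = self.backbone(clip.flatten(0, 1))     # (B*T, C, h, w) per-frame features
        feats = feats.unflatten(0, (b, t)).permute(0, 2, 1, 3, 4)  # (B, C, T, h, w)
        fused = self.temporal(feats).squeeze(2)       # (B, C, h, w) for the current frame
        return self.seg_head(fused), self.lmk_head(fused)
```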
The proposed method is implemented with PyTorch 1.12.0 and Python 3.8.17.
To install PyTorch, please refer to the official PyTorch website (https://pytorch.org).
In our experiments, we installed the following PyTorch version for GPU:
pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu116
The 'PituVideo_train_val.py' script runs the training and validation process.
The 'PituVideo_test_consist_eval.py' script evaluates prediction consistency, as sketched below.
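For reference, a minimal sketch of the consistency measures described in the abstract: the prediction from frame t-1 is propagated to frame t (e.g. via predicted label tracking) and compared against the frame-t prediction, using IoU for segmentation masks and mean distance error for landmarks. The function names, and the assumption that propagation has already been applied, are illustrative and not taken from the script itself.

```python
# Hedged sketch of frame-to-frame consistency metrics: IoU between a propagated
# previous-frame mask and the current prediction, and mean distance error
# between propagated and currently predicted landmarks.
import numpy as np

def mask_consistency_iou(propagated_mask, current_mask):
    """IoU between the frame t-1 prediction propagated to frame t and the
    prediction made directly on frame t (both boolean arrays of shape (H, W))."""
    intersection = np.logical_and(propagated_mask, current_mask).sum()
    union = np.logical_or(propagated_mask, current_mask).sum()
    return intersection / union if union > 0 else 1.0

def landmark_consistency_mde(propagated_xy, current_xy):
    """Mean Euclidean distance (pixels) between propagated landmark positions
    and the landmarks predicted on frame t; both arrays have shape (N, 2)."""
    return float(np.linalg.norm(propagated_xy - current_xy, axis=1).mean())
```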
Qualitative comparison of prediction performance and consistency between the proposed method and the compared methods. The IoU for the region of interest, the sella, is displayed in the bottom right of each image.
If any part of our paper or repository is helpful to your work, please cite it as:
@article{mao2025consistnet,
title={ConsisTNet: a spatio-temporal approach for consistent anatomical localization in endoscopic pituitary surgery},
author={Mao, Zhehua and Das, Adrito and Khan, Danyal Z and Williams, Simon C and Hanrahan, John G and Stoyanov, Danail and Marcus, Hani J and Bano, Sophia},
journal={International Journal of Computer Assisted Radiology and Surgery},
pages={1--10},
year={2025},
publisher={Springer}
}
This repository is largely based on HRNet-Semantic-Segmentation and CoTracker. We thank the authors for their excellent work.
If you have any questions or suggestions about this paper, feel free to contact me (z.mao@ucl.ac.uk).
