Saliency-GFT: A Novel Approach to Vocal Condition Classification

This repository contains the official implementation of Saliency-GFT, a novel, saliency-driven Gradient-based Feature Tailoring (GFT) architecture for classifying laryngeal and neurological vocal conditions from mel-spectrogram images. Our proposed method achieves state-of-the-art results on our dataset, with the best model reaching 85.14% test accuracy.


📖 Abstract

The classification of vocal pathologies from audio signals is a critical task in medical diagnostics. This project introduces a novel two-pass, saliency-driven training mechanism for Vision Transformers called Saliency-GFT. Unlike previous GFT methods that rely on the internal spatial structure of attention maps, our approach uses the true loss gradient ($\partial L / \partial \text{attn}$) to identify and prune the least salient patch tokens. This forces the model to focus on the most discriminative regions of the mel-spectrogram. Through comprehensive benchmarking, we demonstrate that our method consistently outperforms standard baselines. Furthermore, we present a novel hybrid backbone fusing CoAtNet and DINOv2, which, when combined with our Saliency-GFT method, achieves the highest performance, highlighting the benefits of both our training methodology and architectural design.
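
The gradient signal described above can be illustrated with a toy example. The sketch below builds a single-head attention map over patch tokens, retains the gradient of a loss with respect to that map via `retain_grad()`, and ranks tokens by its magnitude. The shapes, the 0.7 keep ratio, and the squared-output stand-in loss are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, N, D = 2, 16, 32                      # batch, patch tokens, embed dim
x = torch.randn(B, N, D, requires_grad=True)

# Toy single-head self-attention over the patch tokens.
attn = F.softmax(x @ x.transpose(-2, -1) / D ** 0.5, dim=-1)  # (B, N, N)
attn.retain_grad()                       # keep grad on this non-leaf tensor
out = attn @ x                           # attention output
loss = out.pow(2).mean()                 # stand-in for the task loss
loss.backward()

# Saliency of each key token: mean |dL/d attn| over the query axis.
saliency = attn.grad.abs().mean(dim=1)   # (B, N)
keep_idx = saliency.topk(int(0.7 * N), dim=1).indices  # most salient tokens
```

Tokens whose indices do not appear in `keep_idx` would be pruned before the second pass.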


✨ Key Features

  • Saliency-Driven Patch Selection (GALA+PPS): A novel two-pass training mechanism that uses back-propagated loss gradients to intelligently prune and re-weight transformer patch tokens.
  • Hybrid Backbone Architecture: An experimental model that successfully infuses CoAtNet blocks into a DINOv2 backbone to demonstrate architectural synergy.
  • Comprehensive Benchmarking: Rigorous comparison against a reference GFT implementation and other strong transformer baselines.
  • Reproducible Results: All code for training and evaluation is provided to ensure full reproducibility of our findings.
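
The prune-and-re-weight step behind GALA+PPS can be sketched as a standalone function. The function name, the default keep ratio, and the softmax re-weighting of the surviving tokens are illustrative assumptions, not the repository's exact implementation.

```python
import torch

def gradient_prune_tokens(tokens, attn_grad, keep_ratio=0.7):
    """Prune the least salient patch tokens and re-weight the survivors.

    tokens:    (B, N, D) patch embeddings (CLS token excluded)
    attn_grad: (B, H, N, N) gradient of the loss w.r.t. the attention maps
    Saliency of key token j = mean |dL/d attn[..., j]| over heads and queries.
    """
    saliency = attn_grad.abs().mean(dim=(1, 2))           # (B, N)
    k = max(1, int(keep_ratio * tokens.shape[1]))
    top = saliency.topk(k, dim=1)
    idx = top.indices.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])
    kept = torch.gather(tokens, 1, idx)                   # (B, k, D)
    # Re-weight survivors by normalized saliency (illustrative choice).
    weights = torch.softmax(top.values, dim=1).unsqueeze(-1)
    return kept * (1.0 + weights)
```

In the two-pass scheme, `attn_grad` would come from the first (probe) backward pass, and the pruned tokens feed the second pass that actually updates the weights.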

📊 Results

Our experiments show a clear performance hierarchy, with our proposed models significantly outperforming the baselines.

| Model Architecture | Backbone | Test Accuracy | Weighted F1-Score |
| --- | --- | --- | --- |
| Saliency-GFT (Our Hybrid) | CoAtNet + DINOv2 | 85.14% | 0.8511 |
| Saliency-GFT (Our GALA+PPS) | DINOv2-small | 83.78% | 0.8382 |
| Standalone CoAtNet (Baseline) | CoAtNet | 79.73% | 0.7923 |
| Reference GFT (Baseline) | ViT-B/16 | 78.38% | 0.7821 |

⚙️ Setup & Installation

To set up the environment, please follow these steps:

  1. Clone the repository:

    git clone https://github.com/Sree14hari/DinoNet-Gradient-Focal_transformer.git
    cd DinoNet-Gradient-Focal_transformer
  2. Create a Python virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the required packages:

    pip install -r requirements.txt

    A requirements.txt file should contain:

    torch
    torchvision
    timm
    scikit-learn
    numpy
    matplotlib
    seaborn
    tqdm
    transformers
    
  4. Dataset: Organize your melspectrograms_dataset directory with the following structure:

    melspectrograms_dataset/
    ├── train/
    │   ├── Dysarthia/
    │   │   └── ...
    │   └── Laryngitis/
    │       └── ...
    ├── validation/
    │   └── ...
    └── test/
        └── ...
    

📝 License

This project is licensed under the MIT License. See the LICENSE file for details.
