chumingqian/EzhouNet

1. Introduction

🎉 Welcome to EzhouNet, a framework based on graph neural networks and anchor intervals for respiratory sound event detection.

This repository provides an end-to-end deep learning method for sound event detection (SED).
We focus on respiratory sound events, and the idea was inspired by anchor boxes in computer vision.

Instead of using frame-level post-processing, we directly learn event intervals by:

  • Generating anchor intervals with
    desed_task/dataio/datasets_resp_v9_8_7.py → RespiraGnnSet(Dataset).generate_anchor_intervals
  • Refining interval offsets with
    desed_task/nnet/EzhouNet_v9_7_9.py → GraphRespiratory(nn.Module).Interval_Refine
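The anchor-interval idea above can be pictured with a minimal sketch: tile intervals of several widths over the clip, much like anchor boxes tile an image. This is a hypothetical illustration, not the actual logic in RespiraGnnSet.generate_anchor_intervals, whose widths and strides may differ.

```python
# Hypothetical sketch of multi-scale anchor interval generation.
# Widths and stride ratio are illustrative assumptions, not the repo's values.

def generate_anchor_intervals(duration, widths=(0.2, 0.5, 1.0), stride_ratio=0.5):
    """Tile (start, end) anchor intervals of several widths over [0, duration]."""
    anchors = []
    for w in widths:
        stride = w * stride_ratio
        start = 0.0
        while start + w <= duration:
            anchors.append((start, start + w))
            start += stride
    return anchors

anchors = generate_anchor_intervals(3.0)  # e.g. a 3-second clip
```

Each anchor then only needs small start/end (or center/width) offsets to snap onto a true event, which is what the refinement head learns.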

⚠ Please note: while this method has proven effective for sound event detection, it is not yet ready for clinical use in the respiratory sound detection scenario. This repo serves as a reference implementation for researchers. The original design principles are detailed in our paper, though many modules have since been updated.


2. Getting Started

  1. Install the evaluation functions following the steps in DESED_task.
    These will be used to compute sound event detection metrics.
  2. Set up your environment:
    • python=3.8
    • pytorch=1.13.1
    • pytorch-lightning=2.2.5
    • torch_geometric=2.5.2
    • Install dependencies:
      pip install -r requirements.txt
  3. Prepare your dataset.
    • For respiratory sounds, we used SPRsound and HF Lung V1.

3. Training

cd into this path:

/Respira_SED_LGNN/recipes/dcase2023_task4_baseline/

3.1 Learn start & end offsets of anchor intervals

Set requires_grad=True or False to control whether the offset bins are learnable:

# Offset bins per scale: flip requires_grad to make them learnable.
self.start_weight_params = nn.ParameterList([
    nn.Parameter(torch.linspace(-1.50, 1.50, dist_bins_list[i]), requires_grad=False)
    for i in range(self.num_scales)
])
self.end_weight_params = nn.ParameterList([
    nn.Parameter(torch.linspace(-1.50, 1.50, dist_bins_list[i]), requires_grad=False)
    for i in range(self.num_scales)
])

Then run:

python train_respiratory_lab9_8_6.py
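One plausible reading of these bins, shown here as an assumption rather than the repo's exact decode, is a discrete offset vocabulary: each bin is a candidate offset in seconds, and a softmax over per-anchor bin logits yields the expected offset.

```python
import numpy as np

# Hypothetical use of the offset bins: softmax-weighted expectation over bins.
bins = np.linspace(-1.50, 1.50, 11)   # mirrors start_weight_params at one scale
logits = np.zeros((4, 11))            # (num_anchors, num_bins), from the network
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
offsets = (probs * bins).sum(axis=-1) # (num_anchors,) expected start offsets, seconds
```

With uniform logits the expectation over the symmetric bins is zero, i.e. the anchor is left unchanged until the network learns to push probability mass toward a nonzero offset.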

3.2 YOLO-style learning of center & width offsets

a_w = (ends - starts).clamp(min=1e-6)  # anchor width, seconds
a_c = 0.5 * (starts + ends)            # anchor center

pred_centers = a_c + t_c_pred * a_w                               # shift center by predicted offset
pred_widths = a_w * torch.exp(t_w_pred.clamp(min=-6.0, max=6.0))  # rescale width; clamp for stability

s = (pred_centers - 0.5 * pred_widths).clamp(min=0.0, max=float(audio_len))
e = (pred_centers + 0.5 * pred_widths).clamp(min=0.0, max=float(audio_len))

Then run:

python train_respiratory_lab10_1_2.py
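The decode above can be checked on a single hypothetical anchor. For an anchor spanning 1.0–2.0 s with predicted offsets t_c = 0.2 and t_w = 0.0, the interval shifts right by 0.2 s while keeping its width:

```python
import math

# Worked scalar example of the YOLO-style decode (values are illustrative).
start, end, audio_len = 1.0, 2.0, 10.0
t_c, t_w = 0.2, 0.0

a_w = max(end - start, 1e-6)                   # anchor width: 1.0 s
a_c = 0.5 * (start + end)                      # anchor center: 1.5 s
c = a_c + t_c * a_w                            # predicted center: 1.7 s
w = a_w * math.exp(max(-6.0, min(6.0, t_w)))   # predicted width: 1.0 s
s = min(max(c - 0.5 * w, 0.0), audio_len)      # predicted start: 1.2 s
e = min(max(c + 0.5 * w, 0.0), audio_len)      # predicted end: 2.2 s
```

The exponential on t_w keeps predicted widths positive, and the clamp on t_w prevents extreme rescaling early in training.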

3.3 Combine both methods

Mixing center-offset and start/end-offset learning improves detection performance.

python train_respiratory_lab10_1_3.py

Here are some reference results at different confidence thresholds:

Using confidence threshold: conf=0.501
Category-specific NMS IoU thresholds:
  Stridor: 0.5
  Wheeze: 0.4
  Crackle: 0.15
  Rhonchi: 0.3
	 call the compute event based metrics  

the Event based overall   f score: 0.19796610169491524, 	 error rate : 2.3084479371316307

the Event based class wise average f score: 0.1848416711564406,	 error rate : 2.966633604392851

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec     | ER       Del      Ins     |
    ------------ | -----   ----- | ------   ------   ------  | ------   ------   ------  |
    Rhonchi      | 29      94    | 14.6%    9.6%     31.0%   | 3.62     0.69     2.93    |
    Stridor      | 5       18    | 17.4%    11.1%    40.0%   | 3.80     0.60     3.20    |
    Crackle      | 287     496   | 17.4%    13.7%    23.7%   | 2.25     0.76     1.49    |
    Wheeze       | 188     358   | 24.5%    18.7%    35.6%   | 2.19     0.64     1.55    |

Using confidence threshold: conf=0.65
Category-specific NMS IoU thresholds:
  Stridor: 0.5
  Wheeze: 0.4
  Crackle: 0.15
  Rhonchi: 0.3
	 call the compute event based metrics  

the Event based overall   f score: 0.20275862068965514, 	 error rate : 2.257367387033399

the Event based class wise average f score: 0.19081504850632036,	 error rate : 2.8438182069170024

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec     | ER       Del      Ins     |
    ------------ | -----   ----- | ------   ------   ------  | ------   ------   ------  |
    Rhonchi      | 29      88    | 15.4%    10.2%    31.0%   | 3.41     0.69     2.72    |
    Stridor      | 5       17    | 18.2%    11.8%    40.0%   | 3.60     0.60     3.00    |
    Crackle      | 287     486   | 17.9%    14.2%    24.0%   | 2.21     0.76     1.45    |
    Wheeze       | 188     350   | 24.9%    19.1%    35.6%   | 2.15     0.64     1.51    |

Using confidence threshold: conf=0.8
Category-specific NMS IoU thresholds:
  Stridor: 0.5
  Wheeze: 0.4
  Crackle: 0.15
  Rhonchi: 0.3
	 call the compute event based metrics  

the Event based overall   f score: 0.20889202540578686, 	 error rate : 2.1886051080550097

the Event based class wise average f score: 0.20316344967739192,	 error rate : 2.6116479647528896

  Class-wise metrics
  ======================================
    Event label  | Nref    Nsys  | F        Pre      Rec     | ER       Del      Ins     |
    ------------ | -----   ----- | ------   ------   ------  | ------   ------   ------  |
    Rhonchi      | 29      82    | 16.2%    11.0%    31.0%   | 3.21     0.69     2.52    |
    Stridor      | 5       14    | 21.1%    14.3%    40.0%   | 3.00     0.60     2.40    |
    Crackle      | 287     479   | 18.3%    14.6%    24.4%   | 2.18     0.76     1.43    |
    Wheeze       | 188     333   | 25.7%    20.1%    35.6%   | 2.06     0.64     1.41    |
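The category-specific NMS reported in the logs above can be sketched as plain 1-D interval NMS: within each class, keep the highest-confidence interval and drop any remaining interval whose IoU with a kept one exceeds that class's threshold. This is a minimal sketch of the technique, not the repo's exact implementation.

```python
# Minimal 1-D non-maximum suppression over (start, end, confidence) intervals.

def interval_iou(a, b):
    """IoU of two (start, end, ...) intervals, using the first two fields."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def nms_1d(events, iou_thr):
    """events: list of (start, end, confidence) for one class."""
    events = sorted(events, key=lambda ev: ev[2], reverse=True)
    kept = []
    for ev in events:
        if all(interval_iou(ev, k) <= iou_thr for k in kept):
            kept.append(ev)
    return kept

# e.g. Wheeze candidates with the 0.4 IoU threshold used above
kept = nms_1d([(0.0, 1.0, 0.9), (0.2, 1.1, 0.8), (2.0, 3.0, 0.7)], 0.4)
```

The per-class thresholds make sense here: short, dense events like Crackle need an aggressive threshold (0.15), while longer events like Stridor tolerate more overlap (0.5).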

4. Further Steps

If you’d like to improve upon this work, here are some suggestions:

  1. Avoid group cyclic slicing of spectrogram feature maps. While useful for grouped feature extraction, it makes quantization & deployment difficult.
  2. Try alternative respiratory features: spectrograms, MFCCs, energy, or statistical features (see the paper Benchmarking of eight RNN variants for breath phase and adventitious sound detection on hf_lung_v1).
  3. Explore advanced multi-scale graph convolution modules for node updates.
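For suggestion 2, even simple energy and statistical descriptors can serve as lightweight alternatives to spectrogram inputs. The sketch below is ours, not code from this repo; the function name and parameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical lightweight features: per-frame log-energy plus
# global mean / std / skewness statistics of the waveform.

def frame_features(x, frame_len=1024, hop=512):
    """Return per-frame log-energy and a small statistical summary of x."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    log_energy = np.array([np.log((f ** 2).sum() + 1e-10) for f in frames])
    mu, sigma = x.mean(), x.std() + 1e-10
    skew = (((x - mu) / sigma) ** 3).mean()   # third standardized moment
    return log_energy, np.array([mu, sigma, skew])

# 1 s of a 440 Hz tone at 16 kHz as a stand-in for a respiratory recording
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000).astype(np.float32)
log_e, stats = frame_features(x)
```

Such features are cheap to compute and quantization-friendly, which also connects back to suggestion 1.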

5. Inspiration

The idea came from a biomedical conference at the university, where I saw graph neural networks being widely applied to biosignals. That's how EzhouNet was born, named after the city of Ezhou.

During my research time in Ezhou, a friend there, Kun, took me to visit Liangzi Lake. He said:

“People know Wuhan’s East Lake, but few know Liangzi Lake in Ezhou.” It truly is an ecological gem. 🌿🌊


Feel free to fork, experiment, and improve on this work. If you like it, give it a star.

Happy coding, and good luck with your projects! 🚀


About

A reference implementation of end-to-end sound event detection, for respiratory sound event detection.
