Add audio spectrogram processing utilities for YOLO audio classification #70
Implements the audio-to-spectrogram-to-video workflow from the provided Colab notebook for ESC-50 and custom audio classification tasks.
Core Module
node/InputNode/audio_processing.py (436 lines)
- chunk_audio_wav_or_mp3() - Sliding window audio chunking (configurable duration/step)
- fourier_transformation() - STFT with windowing using stride tricks
- make_logscale() - Logarithmic frequency binning for better low-freq resolution
- plot_spectrogram() - Audio → spectrogram image with configurable colormaps
- process_chunks_to_spectrograms() - Batch processing for audio folders
- create_video_from_spectrograms() - Spectrogram sequence → MP4 with temporal alignment
- create_video_with_audio_sync() - Optional ffmpeg audio track merging
- annotate_image_with_classification() - Overlay top-N predictions with styled text
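For the single-file path, a minimal sketch is shown below. The plot_spectrogram() signature is taken from the notebook code quoted under Original prompt; the chunk_audio_wav_or_mp3() argument names are assumptions, not the module's confirmed API.

```python
# Minimal sketch, assuming node/ is an importable package.
# plot_spectrogram()'s signature matches the notebook code quoted below;
# the chunk_audio_wav_or_mp3() keyword names are hypothetical.
from node.InputNode.audio_processing import chunk_audio_wav_or_mp3, plot_spectrogram

# Render one spectrogram image from a single WAV file
plot_spectrogram("audio/example.wav", plotpath="spectrograms/example.jpg", colormap="jet")

# Split a longer recording into overlapping windows before converting each chunk
chunks = chunk_audio_wav_or_mp3(
    "audio/long_recording.mp3",
    chunk_duration=5.0,  # hypothetical: window length in seconds
    step=1.0,            # hypothetical: hop between windows in seconds
)
```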
Testing

- tests/test_audio_processing.py - 9 tests covering all functions + full workflow integration
Documentation

- AUDIO_SPECTROGRAM_GUIDE.md - API reference, workflow examples (ESC-50, YOLO, custom)
- examples/simple_audio_spectrogram_example.py - Self-contained working demo

Usage
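A sketch of the end-to-end audio → spectrogram → video flow, assembled from the function names listed under Core Module. Every argument name below is an assumption; check node/InputNode/audio_processing.py for the real signatures.

```python
# Hedged end-to-end sketch: function names come from the Core Module list,
# but every keyword argument here is hypothetical.
from node.InputNode.audio_processing import (
    process_chunks_to_spectrograms,
    create_video_from_spectrograms,
    create_video_with_audio_sync,
    annotate_image_with_classification,
)

# 1. Convert a folder of audio chunks into spectrogram images
process_chunks_to_spectrograms("chunks/", output_dir="spectrograms/")

# 2. Stitch the spectrogram sequence into a temporally aligned MP4
video_path = create_video_from_spectrograms("spectrograms/", output_path="spectrograms.mp4")

# 3. Optionally merge the original audio track back in via ffmpeg
create_video_with_audio_sync(video_path, audio_path="original.wav",
                             output_path="spectrograms_with_audio.mp4")

# 4. Overlay the top-N predictions from a classifier on one frame
annotate_image_with_classification(
    "spectrograms/chunk_000.jpg",
    predictions=[("dog", 0.91), ("rain", 0.05)],  # hypothetical prediction format
)
```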
Technical Details
Dependencies
Already in requirements.txt: librosa, matplotlib, soundfile

Original prompt
Use this code for developing the spectrogram:

# -*- coding: utf-8 -*-
"""AudioTrain_LAMAAZ_1M.ipynb
Automatically generated by Colab.
Original file is located at
https://colab.research.google.com/drive/1AgWLLSACNAYYiBZyu414xvq2ri3q2IcW
DATA DOWNLOAD
"""
! wget https://github.com/karoldvl/ESC-50/archive/master.zip
"""https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-09-23--yolo8-listen/2023-09-23/"""
! unzip master.zip
"""## IMPORT"""
import numpy as np
from matplotlib import pyplot as plt
from numpy.lib import stride_tricks
import os
import pandas as pd
import scipy.io.wavfile as wav
esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')
esc50_df.head()
esc50_df['category'].value_counts()
def fourier_transformation(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))
def make_logscale(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)
import os
import pandas as pd
import scipy.io.wavfile as wav
import numpy as np
import matplotlib.pyplot as plt
# Your plot_spectrogram function
def plot_spectrogram(location, plotpath=None, binsize=2**10, colormap="jet"):
    samplerate, samples = wav.read(location)
    s = fourier_transformation(samples, binsize)
    sshow, freq = make_logscale(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6)  # amplitude to decibel
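The three helper functions above are cut off in the quoted prompt. Below is a minimal completion sketch of the same approach (STFT via numpy stride tricks, logarithmic frequency re-binning, then an amplitude-to-dB image), following the linked mpolinowski write-up rather than the notebook's exact code.

```python
# Hedged completion sketch -- follows the STFT/log-scale approach from the linked
# mpolinowski post, not necessarily the notebook's exact (truncated) implementation.
import numpy as np
from numpy.lib import stride_tricks
from matplotlib import pyplot as plt
import scipy.io.wavfile as wav

def fourier_transformation(sig, frameSize, overlapFac=0.5, window=np.hanning):
    """Short-time Fourier transform of `sig`, framed with stride tricks."""
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))
    # pad so the first window is centred on sample 0 and the last frame is full
    samples = np.append(np.zeros(int(np.floor(frameSize / 2.0))), sig)
    cols = int(np.ceil((len(samples) - frameSize) / float(hopSize))) + 1
    samples = np.append(samples, np.zeros(frameSize))
    frames = stride_tricks.as_strided(
        samples,
        shape=(cols, frameSize),
        strides=(samples.strides[0] * hopSize, samples.strides[0]),
    ).copy()
    frames *= win
    return np.fft.rfft(frames)

def make_logscale(spec, sr=44100, factor=20.):
    """Re-bin the linear frequency axis of `spec` onto a coarser, log-like scale."""
    timebins, freqbins = np.shape(spec)
    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins - 1) / max(scale)
    scale = np.unique(np.round(scale)).astype(int)
    # sum adjacent frequency columns into the new, coarser bins
    newspec = np.zeros([timebins, len(scale)], dtype=np.complex128)
    for i in range(len(scale)):
        hi = None if i == len(scale) - 1 else scale[i + 1]
        newspec[:, i] = np.sum(spec[:, scale[i]:hi], axis=1)
    # centre frequency of each new bin, useful for axis labelling
    allfreqs = np.abs(np.fft.fftfreq(freqbins * 2, 1. / sr)[:freqbins + 1])
    freqs = [np.mean(allfreqs[scale[i]:(None if i == len(scale) - 1 else scale[i + 1])])
             for i in range(len(scale))]
    return newspec, freqs

def plot_spectrogram(location, plotpath=None, binsize=2**10, colormap="jet"):
    """Render a log-scaled dB spectrogram of a WAV file to screen or to `plotpath`."""
    samplerate, samples = wav.read(location)
    s = fourier_transformation(samples, binsize)
    sshow, freq = make_logscale(s, factor=1.0, sr=samplerate)
    ims = 20. * np.log10(np.abs(sshow) / 10e-6)  # amplitude to decibel
    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto",
               cmap=colormap, interpolation="none")
    plt.axis("off")
    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight", pad_inches=0)
    else:
        plt.show()
    plt.close()
    return ims
```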
# Load the CSV
esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')
# Create the category folders
spectrogram_root = '/content/ESC-50-master/spectrogram'
os.makedirs(spectrogram_root, exist_ok=True)
for cat in esc50_df['category'].unique():
    os.makedirs(os.path.join(spectrogram_root, cat), exist_ok=True)
# Generate all the spectrograms
for i, row in esc50_df.iterrows():
    filename = row['filename']
    category = row['category']
    audio_path = os.path.join('/content/ESC-50-master/audio', filename)
    save_path = os.path.join(spectrogram_root, category, filename.replace('.wav', '.jpg'))
    plot_spectrogram(audio_path, plotpath=save_path)
plot = plot_spectrogram('/content/ESC-50-master/audio/' + esc50_df[esc50_df['category'] == 'crow']['filename'].iloc[0])
conversion = []
for i in range(len(esc50_df.index)):
    filename = esc50_df['filename'].iloc[i]
    location = '/content/ESC-50-master/audio/' + filename
    category = esc50_df['category'].iloc[i]
    catpath = '/content/ESC-50-master/spectrogram/' + category
    filepath = catpath + '/' + filename[:-4] + '.jpg'
    conversion.append(plot_spectrogram(location, plotpath=filepath))

conversion[0]
!pip install split-folders
import splitfolders
input_folder = '/content/ESC-50-master/spectrogram'
output = 'data'
splitfolders.ratio(input_folder, output=output, seed=42, ratio=(.8, .2))
testing = [
    'data/test/helicopter.wav',
    'data/test/cat.wav'
]
! pip...
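The prompt is cut off at this point. If the remaining cells follow the linked post (installing ultralytics and training a YOLOv8 classifier on the split spectrogram folders), a hedged sketch of that step looks like this; it is a reconstruction, not the notebook's actual code.

```python
# Hedged sketch of the truncated final step, following the linked mpolinowski post:
# train a YOLOv8 classification model on the spectrogram folders produced by splitfolders.
# Requires: pip install ultralytics
from ultralytics import YOLO

model = YOLO("yolov8n-cls.pt")                  # small pretrained classification model
model.train(data="data", epochs=20, imgsz=640)  # 'data' holds data/train and data/val

# Classify a new spectrogram image (hypothetical example path)
results = model("data/val/crow/example.jpg")
print(results[0].names[results[0].probs.top1], results[0].probs.top1conf)
```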