Conversation

Copilot AI commented Nov 8, 2025

Implements the audio-to-spectrogram-to-video workflow from the provided Colab notebook for ESC-50 and custom audio classification tasks.

Core Module

node/InputNode/audio_processing.py (436 lines)

  • chunk_audio_wav_or_mp3() - Sliding window audio chunking (configurable duration/step)
  • fourier_transformation() - STFT with windowing using stride tricks
  • make_logscale() - Logarithmic frequency binning for better low-freq resolution
  • plot_spectrogram() - Audio → spectrogram image with configurable colormaps
  • process_chunks_to_spectrograms() - Batch processing for audio folders
  • create_video_from_spectrograms() - Spectrogram sequence → MP4 with temporal alignment
  • create_video_with_audio_sync() - Optional ffmpeg audio track merging
  • annotate_image_with_classification() - Overlay top-N predictions with styled text
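The chunking step at the top of the list is essentially a sliding window over the sample array. The sketch below is a minimal NumPy illustration, not the module's actual implementation (`chunk_audio_wav_or_mp3()` additionally handles file I/O and MP3 decoding; the helper name here is illustrative):

```python
import numpy as np

def chunk_signal(samples, sr, chunk_duration=5.0, step_duration=0.25):
    """Slide a window of chunk_duration seconds over the signal,
    advancing by step_duration seconds each time."""
    chunk_len = int(chunk_duration * sr)
    step_len = int(step_duration * sr)
    return [samples[start:start + chunk_len]
            for start in range(0, len(samples) - chunk_len + 1, step_len)]

# 6 s of silence at 8 kHz -> 5 overlapping 5 s windows, 0.25 s apart
sr = 8000
chunks = chunk_signal(np.zeros(6 * sr), sr)
```

Each chunk later becomes one spectrogram image, which is why the step duration also sets the video frame rate.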

Testing

tests/test_audio_processing.py - 9 tests covering all functions + full workflow integration

Documentation

  • AUDIO_SPECTROGRAM_GUIDE.md - API reference, workflow examples (ESC-50, YOLO, custom)
  • examples/simple_audio_spectrogram_example.py - Self-contained working demo

Usage

from node.InputNode.audio_processing import *

# ESC-50 workflow
chunk_audio_wav_or_mp3("audio.wav", "chunks/", chunk_duration=5.0, step_duration=0.25)
process_chunks_to_spectrograms("chunks/", "spectrograms/")
create_video_from_spectrograms("spectrograms/", "output.mp4", fps=4)

# With YOLO classification annotations
predictions = [("Dog", 0.95), ("Cat", 0.03), ("Bird", 0.01)]
annotate_image_with_classification("spec.png", "annotated.png", predictions)

Technical Details

  • STFT: Hanning window, 50% overlap, stride-based frame extraction
  • Frequency scaling: Logarithmic binning (factor=20) for perceptual uniformity
  • Video timing: 0.25s per chunk display matches audio step duration
  • Colormaps: jet, inferno, viridis, magma, plasma (matplotlib)
  • Audio sync: ffmpeg subprocess with AAC encoding
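These numbers fit together as simple arithmetic (parameter values taken from the bullets above; `n_frames` is approximate, since the module pads the signal edges):

```python
frame_size = 1024                      # binsize = 2**10
overlap = 0.5                          # Hanning window, 50% overlap
hop = int(frame_size - overlap * frame_size)        # samples between frames

sr = 44100                             # ESC-50 sample rate
chunk_samples = int(5.0 * sr)          # one 5 s audio chunk
n_frames = 1 + (chunk_samples - frame_size) // hop  # STFT frames per chunk

fps = 1 / 0.25                         # one spectrogram per 0.25 s step
```

With these values, `hop` is 512 samples and `fps` comes out to 4, matching the `fps=4` in the usage example above.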

Dependencies

Already in requirements.txt: librosa, matplotlib, soundfile

Original prompt

use this code for developing the spectrogram: # -*- coding: utf-8 -*-
"""AudioTrain_LAMAAZ_1M.ipynb

Automatically generated by Colab.

Original file is located at
https://colab.research.google.com/drive/1AgWLLSACNAYYiBZyu414xvq2ri3q2IcW

DATA DOWNLOAD

"""

! wget https://github.com/karoldvl/ESC-50/archive/master.zip

"""https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-09-23--yolo8-listen/2023-09-23/"""

! unzip master.zip

"""## IMPORT"""

import numpy as np
from matplotlib import pyplot as plt
from numpy.lib import stride_tricks
import os
import pandas as pd
import scipy.io.wavfile as wav

esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')
esc50_df.head()

esc50_df['category'].value_counts()

def fourier_transformation(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = np.ceil((len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(
        samples,
        shape=(int(cols), frameSize),
        strides=(samples.strides[0]*hopSize, samples.strides[0])
    ).copy()
    frames *= win

    return np.fft.rfft(frames)
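A quick way to sanity-check this STFT is a pure tone: framing a one-second 440 Hz sine the same way should put the spectral peak near bin 440·frameSize/sr. The self-contained sketch below redoes the framing with a plain loop instead of stride tricks and skips the edge zero-padding:

```python
import numpy as np

sr, frame_size, hop = 8000, 1024, 512
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)            # 1 s of a 440 Hz tone

# frame + window + rfft, as in fourier_transformation (no edge padding here)
n_frames = 1 + (len(sig) - frame_size) // hop
win = np.hanning(frame_size)
frames = np.stack([sig[i * hop:i * hop + frame_size] * win
                   for i in range(n_frames)])
spec = np.fft.rfft(frames)                   # (n_frames, frame_size // 2 + 1)

peak_bin = int(np.abs(spec[0]).argmax())
peak_hz = peak_bin * sr / frame_size         # lands within one bin of 440 Hz
```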

def make_logscale(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)

    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale))

    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):int(scale[i+1])], axis=1)

    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[int(scale[i]):])]
        else:
            freqs += [np.mean(allfreqs[int(scale[i]):int(scale[i+1])])]

    return newspec, freqs
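The bin-edge computation in make_logscale is easiest to see in isolation. With the default factor=20, the lowest output bins each cover a single linear FFT bin while the highest bins merge many of them, which is the "better low-freq resolution" mentioned earlier. This sketch reproduces just the edge computation:

```python
import numpy as np

freqbins, factor = 513, 20.0               # rfft bin count for frameSize=1024
scale = np.linspace(0, 1, freqbins) ** factor
scale *= (freqbins - 1) / scale.max()
scale = np.unique(np.round(scale))         # edges of the new frequency bins

low_widths = np.diff(scale[:5])            # widths of the lowest bins (narrow)
high_widths = np.diff(scale[-5:])          # widths of the highest bins (wide)
```

Note that plot_spectrogram below calls make_logscale with factor=1.0, which makes the edges evenly spaced (effectively linear) rather than log-like.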

import os
import pandas as pd
import scipy.io.wavfile as wav
import numpy as np
import matplotlib.pyplot as plt

Your plot_spectrogram function

def plot_spectrogram(location, plotpath=None, binsize=2**10, colormap="jet"):
    samplerate, samples = wav.read(location)
    s = fourier_transformation(samples, binsize)
    sshow, freq = make_logscale(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6)  # amplitude to decibel

    timebins, freqbins = np.shape(ims)
    #print("timebins:", timebins, "freqbins:", freqbins)

    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()
    plt.clf()

    return ims

Load the CSV

esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')

Create the folders

spectrogram_root = '/content/ESC-50-master/spectrogram'
os.makedirs(spectrogram_root, exist_ok=True)

for cat in esc50_df['category'].unique():
    os.makedirs(os.path.join(spectrogram_root, cat), exist_ok=True)

Generate all the spectrograms

for i, row in esc50_df.iterrows():
    filename = row['filename']
    category = row['category']
    audio_path = os.path.join('/content/ESC-50-master/audio', filename)
    save_path = os.path.join(spectrogram_root, category, filename.replace('.wav', '.jpg'))

    try:
        plot_spectrogram(audio_path, plotpath=save_path)
    except Exception as e:
        print(f"Error with (unknown): {e}")

plot = plot_spectrogram('/content/ESC-50-master/audio/' + esc50_df[esc50_df['category'] == 'crow']['filename'].iloc[0])

conversion = []

for i in range(len(esc50_df.index)):
    filename = esc50_df['filename'].iloc[i]
    location = '/content/ESC-50-master/audio/' + filename  # trailing slash was missing
    category = esc50_df['category'].iloc[i]
    catpath = '/content/ESC-50-master/spectrogram/' + category
    filepath = catpath + '/' + filename[:-4] + '.jpg'

    conversion.append((location, filepath))  # tuple, not a set, to keep the pair ordered

conversion[0]

!pip install split-folders
import splitfolders

input_folder = '/content/ESC-50-master/spectrogram'
output = 'data'
splitfolders.ratio(input_folder, output=output, seed=42, ratio=(.8, .2))

testing = [
    'data/test/helicopter.wav',
    'data/test/cat.wav'
]

! pip...



Delete 41 development artifact markdown files
Copilot AI and others added 3 commits November 8, 2025 09:23
- Add audio_processing.py module with complete workflow functions
- Implement audio chunking with sliding windows
- Add spectrogram generation and batch processing
- Implement video creation from spectrogram sequences
- Add image annotation with classification results
- Include audio-video synchronization support
- Add comprehensive test suite (9 tests, all passing)
- Create demo script with usage examples
- Add detailed documentation guide

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
- Update README.md with audio processing requirements and documentation links
- Add simple_audio_spectrogram_example.py demonstrating the workflow
- Create examples/ directory for code samples
- Example creates 3-second audio with A-C-E notes and processes it
- All functionality tested and working

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
- Create comprehensive implementation summary document
- Document all features, technical details, and testing results
- Include usage examples and performance characteristics
- Document known limitations and future integration plans
- Complete project documentation

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
Copilot AI changed the title [WIP] Utilise code for developing spectrogram feature Add audio spectrogram processing utilities for YOLO audio classification Nov 8, 2025
Copilot AI requested a review from hackolite November 8, 2025 09:31
@hackolite hackolite force-pushed the main branch 2 times, most recently from d9f4029 to f5ce349 Compare November 9, 2025 10:29