Conversation

Copilot AI commented Nov 8, 2025

Implements the audio-to-spectrogram-to-video workflow from the provided Colab notebook for ESC-50 and custom audio classification tasks.

Core Module

node/InputNode/audio_processing.py (436 lines)

  • chunk_audio_wav_or_mp3() - Sliding window audio chunking (configurable duration/step)
  • fourier_transformation() - STFT with windowing using stride tricks
  • make_logscale() - Logarithmic frequency binning for better low-freq resolution
  • plot_spectrogram() - Audio → spectrogram image with configurable colormaps
  • process_chunks_to_spectrograms() - Batch processing for audio folders
  • create_video_from_spectrograms() - Spectrogram sequence → MP4 with temporal alignment
  • create_video_with_audio_sync() - Optional ffmpeg audio track merging
  • annotate_image_with_classification() - Overlay top-N predictions with styled text
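The chunking step at the top of the list is essentially a sliding window over the sample array. The sketch below is a minimal NumPy illustration, not the module's actual implementation (`chunk_audio_wav_or_mp3()` additionally handles file I/O and MP3 decoding; the helper name here is illustrative):

```python
import numpy as np

def chunk_signal(samples, sr, chunk_duration=5.0, step_duration=0.25):
    """Slide a window of chunk_duration seconds over the signal,
    advancing by step_duration seconds each time."""
    chunk_len = int(chunk_duration * sr)
    step_len = int(step_duration * sr)
    return [samples[start:start + chunk_len]
            for start in range(0, len(samples) - chunk_len + 1, step_len)]

# 6 s of silence at 8 kHz -> 5 overlapping 5 s windows, 0.25 s apart
sr = 8000
chunks = chunk_signal(np.zeros(6 * sr), sr)
```

Each chunk later becomes one spectrogram image, which is why the step duration also sets the video frame rate.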

Testing

tests/test_audio_processing.py - 9 tests covering all functions + full workflow integration

Documentation

  • AUDIO_SPECTROGRAM_GUIDE.md - API reference, workflow examples (ESC-50, YOLO, custom)
  • examples/simple_audio_spectrogram_example.py - Self-contained working demo

Usage

from node.InputNode.audio_processing import *

# ESC-50 workflow
chunk_audio_wav_or_mp3("audio.wav", "chunks/", chunk_duration=5.0, step_duration=0.25)
process_chunks_to_spectrograms("chunks/", "spectrograms/")
create_video_from_spectrograms("spectrograms/", "output.mp4", fps=4)

# With YOLO classification annotations
predictions = [("Dog", 0.95), ("Cat", 0.03), ("Bird", 0.01)]
annotate_image_with_classification("spec.png", "annotated.png", predictions)

Technical Details

  • STFT: Hanning window, 50% overlap, stride-based frame extraction
  • Frequency scaling: Logarithmic binning (factor=20) for perceptual uniformity
  • Video timing: 0.25s per chunk display matches audio step duration
  • Colormaps: jet, inferno, viridis, magma, plasma (matplotlib)
  • Audio sync: ffmpeg subprocess with AAC encoding
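These numbers fit together as simple arithmetic (parameter values taken from the bullets above; `n_frames` is approximate, since the module pads the signal edges):

```python
frame_size = 1024                      # binsize = 2**10
overlap = 0.5                          # Hanning window, 50% overlap
hop = int(frame_size - overlap * frame_size)        # samples between frames

sr = 44100                             # ESC-50 sample rate
chunk_samples = int(5.0 * sr)          # one 5 s audio chunk
n_frames = 1 + (chunk_samples - frame_size) // hop  # STFT frames per chunk

fps = 1 / 0.25                         # one spectrogram per 0.25 s step
```

With these values, `hop` is 512 samples and `fps` comes out to 4, matching the `fps=4` in the usage example above.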

Dependencies

Already in requirements.txt: librosa, matplotlib, soundfile

Original prompt

use this code for developing the spectrogram: # -*- coding: utf-8 -*-
"""AudioTrain_LAMAAZ_1M.ipynb

Automatically generated by Colab.

Original file is located at
https://colab.research.google.com/drive/1AgWLLSACNAYYiBZyu414xvq2ri3q2IcW

DATA DOWNLOAD

"""

! wget https://github.com/karoldvl/ESC-50/archive/master.zip

"""https://mpolinowski.github.io/docs/IoT-and-Machine-Learning/ML/2023-09-23--yolo8-listen/2023-09-23/"""

! unzip master.zip

"""## IMPORT"""

import numpy as np
from matplotlib import pyplot as plt
from numpy.lib import stride_tricks
import os
import pandas as pd
import scipy.io.wavfile as wav

esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')
esc50_df.head()

esc50_df['category'].value_counts()

def fourier_transformation(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = np.ceil((len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(
        samples,
        shape=(int(cols), frameSize),
        strides=(samples.strides[0]*hopSize, samples.strides[0])
    ).copy()
    frames *= win

    return np.fft.rfft(frames)
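A quick way to sanity-check this STFT is a pure tone: framing a one-second 440 Hz sine the same way should put the spectral peak near bin 440·frameSize/sr. The self-contained sketch below redoes the framing with a plain loop instead of stride tricks and skips the edge zero-padding:

```python
import numpy as np

sr, frame_size, hop = 8000, 1024, 512
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)            # 1 s of a 440 Hz tone

# frame + window + rfft, as in fourier_transformation (no edge padding here)
n_frames = 1 + (len(sig) - frame_size) // hop
win = np.hanning(frame_size)
frames = np.stack([sig[i * hop:i * hop + frame_size] * win
                   for i in range(n_frames)])
spec = np.fft.rfft(frames)                   # (n_frames, frame_size // 2 + 1)

peak_bin = int(np.abs(spec[0]).argmax())
peak_hz = peak_bin * sr / frame_size         # lands within one bin of 440 Hz
```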

def make_logscale(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)

    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale))

    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):int(scale[i+1])], axis=1)

    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[int(scale[i]):])]
        else:
            freqs += [np.mean(allfreqs[int(scale[i]):int(scale[i+1])])]

    return newspec, freqs
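The bin-edge computation in make_logscale is easiest to see in isolation. With the default factor=20, the lowest output bins each cover a single linear FFT bin while the highest bins merge many of them, which is the "better low-freq resolution" mentioned earlier. This sketch reproduces just the edge computation:

```python
import numpy as np

freqbins, factor = 513, 20.0               # rfft bin count for frameSize=1024
scale = np.linspace(0, 1, freqbins) ** factor
scale *= (freqbins - 1) / scale.max()
scale = np.unique(np.round(scale))         # edges of the new frequency bins

low_widths = np.diff(scale[:5])            # widths of the lowest bins (narrow)
high_widths = np.diff(scale[-5:])          # widths of the highest bins (wide)
```

Note that plot_spectrogram below calls make_logscale with factor=1.0, which makes the edges evenly spaced (effectively linear) rather than log-like.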

import os
import pandas as pd
import scipy.io.wavfile as wav
import numpy as np
import matplotlib.pyplot as plt

Your plot_spectrogram function

def plot_spectrogram(location, plotpath=None, binsize=2**10, colormap="jet"):
    samplerate, samples = wav.read(location)
    s = fourier_transformation(samples, binsize)
    sshow, freq = make_logscale(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6)  # amplitude to decibel

    timebins, freqbins = np.shape(ims)
    #print("timebins:", timebins, "freqbins:", freqbins)

    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()
    plt.clf()

    return ims

Load the CSV

esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')

Create the folders

spectrogram_root = '/content/ESC-50-master/spectrogram'
os.makedirs(spectrogram_root, exist_ok=True)

for cat in esc50_df['category'].unique():
    os.makedirs(os.path.join(spectrogram_root, cat), exist_ok=True)

Generate all the spectrograms

for i, row in esc50_df.iterrows():
    filename = row['filename']
    category = row['category']
    audio_path = os.path.join('/content/ESC-50-master/audio', filename)
    save_path = os.path.join(spectrogram_root, category, filename.replace('.wav', '.jpg'))

    try:
        plot_spectrogram(audio_path, plotpath=save_path)
    except Exception as e:
        print(f"Error with (unknown): {e}")

plot = plot_spectrogram('/content/ESC-50-master/audio/' + esc50_df[esc50_df['category'] == 'crow']['filename'].iloc[0])

conversion = []

for i in range(len(esc50_df.index)):
    filename = esc50_df['filename'].iloc[i]
    location = '/content/ESC-50-master/audio/' + filename  # trailing slash was missing
    category = esc50_df['category'].iloc[i]
    catpath = '/content/ESC-50-master/spectrogram/' + category
    filepath = catpath + '/' + filename[:-4] + '.jpg'

    conversion.append((location, filepath))  # tuple, not a set, to keep the pair ordered

conversion[0]

!pip install split-folders
import splitfolders

input_folder = '/content/ESC-50-master/spectrogram'
output = 'data'
splitfolders.ratio(input_folder, output=output, seed=42, ratio=(.8, .2))

testing = [
    'data/test/helicopter.wav',
    'data/test/cat.wav'
]

! pip...



Delete 41 development artifact markdown files
Copilot AI and others added 3 commits November 8, 2025 09:23
- Add audio_processing.py module with complete workflow functions
- Implement audio chunking with sliding windows
- Add spectrogram generation and batch processing
- Implement video creation from spectrogram sequences
- Add image annotation with classification results
- Include audio-video synchronization support
- Add comprehensive test suite (9 tests, all passing)
- Create demo script with usage examples
- Add detailed documentation guide

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
- Update README.md with audio processing requirements and documentation links
- Add simple_audio_spectrogram_example.py demonstrating the workflow
- Create examples/ directory for code samples
- Example creates 3-second audio with A-C-E notes and processes it
- All functionality tested and working

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
- Create comprehensive implementation summary document
- Document all features, technical details, and testing results
- Include usage examples and performance characteristics
- Document known limitations and future integration plans
- Complete project documentation

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
Copilot AI changed the title [WIP] Utilise code for developing spectrogram feature Add audio spectrogram processing utilities for YOLO audio classification Nov 8, 2025
Copilot AI requested a review from hackolite November 8, 2025 09:31
@hackolite hackolite force-pushed the main branch 2 times, most recently from d9f4029 to f5ce349 Compare November 9, 2025 10:29