Copilot AI commented Nov 8, 2025

Implements standalone utilities for converting audio/video files to spectrogram images using the existing fourier_transformation and make_logscale functions from the Video Node, enabling batch dataset preparation for audio classification tasks.

Changes

Core Scripts

  • simple_video_to_spectrogram.py - Minimal batch processor following ESC-50 CSV pattern with fourier_transformation, make_logscale, and plot_spectrogram functions
  • video_to_spectrogram.py - Full CLI tool with single/batch modes, video support via ffmpeg audio extraction, configurable FFT bins and colormaps
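The description mentions video support via ffmpeg audio extraction but does not show the command. A minimal sketch of how such a step might look, assuming ffmpeg is available on PATH. The function names `build_ffmpeg_command` and `extract_audio`, and the choice of mono 16-bit PCM at 44.1 kHz, are illustrative assumptions and are not taken from video_to_spectrogram.py:

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path, wav_path, sample_rate=44100):
    """Build an ffmpeg command that writes the audio track as a mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", str(video_path),    # input video
        "-vn",                    # drop the video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM, readable by scipy.io.wavfile
        "-ar", str(sample_rate),  # resample to the target rate
        "-ac", "1",               # downmix to a single channel
        str(wav_path),
    ]


def extract_audio(video_path, wav_path, sample_rate=44100):
    """Run ffmpeg (hypothetical helper) and return the path of the extracted WAV."""
    cmd = build_ffmpeg_command(video_path, wav_path, sample_rate)
    subprocess.run(cmd, check=True, capture_output=True)
    return Path(wav_path)
```

Downmixing to mono keeps the signal one-dimensional, which is the shape the framing code in the original prompt appears to assume.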

Documentation & Examples

  • VIDEO_TO_SPECTROGRAM_README.md - Technical reference with parameters, output structure, performance tips
  • QUICKSTART_VIDEO_TO_SPECTROGRAM.md - Quick start guide with common use cases
  • examples/video_to_spectrogram_example.py - Working examples for single file, batch, and ESC-50 processing
  • Updated main README with feature section and usage examples

Testing

  • tests/test_video_to_spectrogram.py - Integration tests for STFT, log scaling, and end-to-end spectrogram generation
  • Updated requirements.txt with scipy and pandas

Usage

Batch processing with CSV metadata:

from simple_video_to_spectrogram import process_video_chunks_to_spectrograms

process_video_chunks_to_spectrograms(
    csv_path='metadata.csv',  # columns: filename, category
    audio_root='audio/',
    spectrogram_root='spectrograms/'
)
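The call above expects a CSV with filename and category columns. As a rough sketch of the path mapping this implies (one output folder per category, mirroring the ESC-50 loop in the original prompt), here is a hypothetical helper `plan_outputs`. It is not part of the PR, and the two metadata rows are made-up examples:

```python
import csv
import io
import os


def plan_outputs(csv_file, audio_root, spectrogram_root):
    """Yield (audio_path, image_path) pairs for each row of the metadata CSV."""
    for row in csv.DictReader(csv_file):
        stem, _ = os.path.splitext(row["filename"])
        audio_path = os.path.join(audio_root, row["filename"])
        image_path = os.path.join(spectrogram_root, row["category"], stem + ".jpg")
        yield audio_path, image_path


# Illustrative metadata with the two required columns.
metadata = io.StringIO("filename,category\nclip_001.wav,dog\nclip_002.wav,rain\n")
pairs = list(plan_outputs(metadata, "audio", "spectrograms"))
```

Using `os.path.splitext` instead of a string replace avoids surprises when ".wav" also appears elsewhere in a filename.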

CLI for single files:

python video_to_spectrogram.py --mode single --input video.mp4 --output spec.jpg --binsize 2048
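For orientation, the flags in the command above could be parsed along these lines. This is a hypothetical argparse sketch: only --mode, --input, --output and --binsize appear in the PR description, and the defaults shown here are assumptions rather than the script's actual values.

```python
import argparse


def build_parser():
    """Build a parser for the four flags shown in the PR description."""
    parser = argparse.ArgumentParser(
        description="Convert an audio or video file to a spectrogram image"
    )
    parser.add_argument("--mode", choices=["single", "batch"], default="single")
    parser.add_argument("--input", required=True, help="audio/video file or folder")
    parser.add_argument("--output", required=True, help="image file or folder")
    parser.add_argument("--binsize", type=int, default=1024, help="STFT frame size")
    return parser


# Parse the same arguments as the command line shown above.
args = build_parser().parse_args(
    ["--mode", "single", "--input", "video.mp4", "--output", "spec.jpg", "--binsize", "2048"]
)
```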

By default, spectrograms use the same STFT parameters as the Video Node (1024-sample frames, 50% overlap, Hann window), with amplitude-to-decibel conversion and configurable colormaps.
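To make these parameters concrete, the sketch below works through the framing arithmetic used by fourier_transformation in the original prompt (hop size, frame count, frequency bins) and applies it to a one-second test tone. The helper names `stft_shape` and `stft`, the index-array framing (used here in place of as_strided), and the 44.1 kHz test signal are illustrative assumptions:

```python
import numpy as np


def stft_shape(num_samples, frame_size=1024, overlap=0.5):
    """Return (hop_size, num_frames, num_freq_bins) for the framing scheme."""
    hop_size = int(frame_size - np.floor(overlap * frame_size))
    padded = num_samples + frame_size // 2  # half a frame of zeros is prepended
    num_frames = int(np.ceil((padded - frame_size) / float(hop_size)) + 1)
    num_freq_bins = frame_size // 2 + 1  # length of np.fft.rfft output per frame
    return hop_size, num_frames, num_freq_bins


def stft(sig, frame_size=1024, overlap=0.5):
    """Windowed STFT built with index arrays instead of as_strided (same framing)."""
    hop_size, num_frames, _ = stft_shape(len(sig), frame_size, overlap)
    samples = np.concatenate([np.zeros(frame_size // 2), sig, np.zeros(frame_size)])
    starts = hop_size * np.arange(num_frames)[:, None]
    frames = samples[starts + np.arange(frame_size)[None, :]]
    return np.fft.rfft(frames * np.hanning(frame_size))


sample_rate = 44100
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)  # one second of a 1 kHz sine
spec = stft(tone)
# Frequency of the strongest bin in a frame from the middle of the signal.
peak_hz = np.argmax(np.abs(spec[40])) * sample_rate / 1024
```

With 1024-sample frames at 44.1 kHz, each frequency bin spans about 43 Hz and the hop is 512 samples (roughly 11.6 ms), so a larger --binsize trades time resolution for frequency resolution.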

Original prompt

Use this:

import numpy as np
from numpy.lib import stride_tricks  # provides as_strided

def fourier_transformation(sig, frameSize, overlapFac=0.5, window=np.hanning):
    win = window(frameSize)
    hopSize = int(frameSize - np.floor(overlapFac * frameSize))

    # zeros at beginning (thus center of 1st window should be for sample nr. 0)
    samples = np.append(np.zeros(int(np.floor(frameSize/2.0))), sig)
    # cols for windowing
    cols = np.ceil((len(samples) - frameSize) / float(hopSize)) + 1
    # zeros at end (thus samples can be fully covered by frames)
    samples = np.append(samples, np.zeros(frameSize))

    frames = stride_tricks.as_strided(samples, shape=(int(cols), frameSize), strides=(samples.strides[0]*hopSize, samples.strides[0])).copy()
    frames *= win

    return np.fft.rfft(frames)

def make_logscale(spec, sr=44100, factor=20.):
    timebins, freqbins = np.shape(spec)

    scale = np.linspace(0, 1, freqbins) ** factor
    scale *= (freqbins-1)/max(scale)
    scale = np.unique(np.round(scale))

    # create spectrogram with new freq bins
    newspec = np.complex128(np.zeros([timebins, len(scale)]))
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):], axis=1)
        else:
            newspec[:,i] = np.sum(spec[:,int(scale[i]):int(scale[i+1])], axis=1)

    # list center freq of bins
    allfreqs = np.abs(np.fft.fftfreq(freqbins*2, 1./sr)[:freqbins+1])
    freqs = []
    for i in range(0, len(scale)):
        if i == len(scale)-1:
            freqs += [np.mean(allfreqs[int(scale[i]):])]
        else:
            freqs += [np.mean(allfreqs[int(scale[i]):int(scale[i+1])])]

    return newspec, freqs

import os

import pandas as pd
import scipy.io.wavfile as wav
import numpy as np
import matplotlib.pyplot as plt

# Your plot_spectrogram function

def plot_spectrogram(location, plotpath=None, binsize=2**10, colormap="jet"):
    samplerate, samples = wav.read(location)
    s = fourier_transformation(samples, binsize)
    sshow, freq = make_logscale(s, factor=1.0, sr=samplerate)
    ims = 20.*np.log10(np.abs(sshow)/10e-6) # amplitude to decibel

    timebins, freqbins = np.shape(ims)
    #print("timebins:", timebins, "freqbins:", freqbins)

    plt.figure(figsize=(15, 7.5))
    plt.imshow(np.transpose(ims), origin="lower", aspect="auto", cmap=colormap, interpolation="none")
    xlocs = np.float32(np.linspace(0, timebins-1, 5))
    plt.xticks(xlocs, ["%.02f" % l for l in ((xlocs*len(samples)/timebins)+(0.5*binsize))/samplerate])
    ylocs = np.int16(np.round(np.linspace(0, freqbins-1, 10)))
    plt.yticks(ylocs, ["%.02f" % freq[i] for i in ylocs])

    if plotpath:
        plt.savefig(plotpath, bbox_inches="tight")
    else:
        plt.show()
    plt.clf()

    return ims

# Load the CSV

esc50_df = pd.read_csv('/content/ESC-50-master/meta/esc50.csv')

# Create the folders

spectrogram_root = '/content/ESC-50-master/spectrogram'
os.makedirs(spectrogram_root, exist_ok=True)

for cat in esc50_df['category'].unique():
    os.makedirs(os.path.join(spectrogram_root, cat), exist_ok=True)

# Generate all the spectrograms

for i, row in esc50_df.iterrows():
    filename = row['filename']
    category = row['category']
    audio_path = os.path.join('/content/ESC-50-master/audio', filename)
    save_path = os.path.join(spectrogram_root, category, filename.replace('.wav', '.jpg'))

    try:
        plot_spectrogram(audio_path, plotpath=save_path)
    except Exception as e:
        print(f"Error with {filename}: {e}")

for converting video chunks into images.
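One detail of plot_spectrogram in the prompt above: the literal 10e-6 in the decibel conversion equals 1e-5, not 1e-6. The prompt does not say which reference was intended, so the sketch below only shows the size of the difference. The function name `amplitude_to_db` is illustrative and not part of the PR:

```python
import math


def amplitude_to_db(amplitude, reference=10e-6):
    """Convert a linear amplitude to decibels relative to `reference`."""
    return 20.0 * math.log10(amplitude / reference)


# Difference between a 1e-6 reference and the 10e-6 (= 1e-5) used in the prompt.
offset_db = amplitude_to_db(1.0, reference=1e-6) - amplitude_to_db(1.0, reference=10e-6)
```

Because the difference is a constant offset of 20 dB across the whole spectrogram, it shifts the absolute dB values but not the relative contrast between bins.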

Delete 41 development artifact markdown files
Copilot AI self-assigned this Nov 8, 2025
Copilot AI and others added 3 commits November 8, 2025 04:55
…tion and make_logscale

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
…ram utilities

Co-authored-by: hackolite <826027+hackolite@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add Fourier transformation and log scale functions" to "Add video-to-spectrogram batch conversion utilities using existing STFT functions" on Nov 8, 2025
Copilot AI requested a review from hackolite November 8, 2025 05:02
@hackolite hackolite force-pushed the main branch 2 times, most recently from d9f4029 to f5ce349 Compare November 9, 2025 10:29