A PyTorch reproduction of the Temporal FiLM (Birnbaum, S., et al., 2019 [NeurIPS]) super-resolution method.
Clone the project from the desired parent directory on the local device:
git clone https://github.com/krokode/Audio_SR.git
cd Audio_SR
Set up the software environment:
. setup.sh
. activate.sh
or, in PowerShell:
setup.ps1
activate.ps1
Clone the project onto the data partition:
cd $VSC_DATA
git clone https://github.com/krokode/Audio_SR.git
cd Audio_SR
Set up the software environment:
. vsc_setup_wice.sh
. vsc_activate_wise.sh
cd data/vctk
python arc_load_unpack.py
Upload data with WinSCP to the $VSC_DATA/Audio_SR/data/vctk directory.
Unpack the data archive:
cd $VSC_DATA/Audio_SR/data/vctk
tar -xvf VCTK-Corpus.tar.gz
Prepare the raw data for the planned experiment:
python prepare_dataset.py --sampling_rate 16000 --scale 4 --window_size 8192 --window_stride 4096 --batch_size 128 --interpolate --low_pass --out_dir 'datasets'
Train the model on the h5 files for 150 epochs (update NUM_EPOCHS inside run.py):
cd ../../src
python run.py
The visualization script creates a 4 times lower-resolution example from the input, passes it through the model, and writes the predicted WAV file.
Pass any high-resolution WAV file, for example p270_002.wav:
python visualize.py --model best_model_V1_6.pth --wav p270_002.wav --out output
Audio results and spectrograms will be available in /visualizations/output/.
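To make the preparation flags and the "4 times lower resolution" step more concrete, here is a minimal pure-Python sketch. It is not taken from prepare_dataset.py or visualize.py. The names `window_starts` and `make_low_res` are hypothetical, and the block averaging and linear interpolation are simple stand-ins for whatever low-pass filter and interpolation the repository actually uses.

```python
def window_starts(num_samples, window_size=8192, window_stride=4096):
    """Start indices of every full window that fits in a signal.

    With window_stride equal to half of window_size, as in the command
    above, consecutive windows overlap by 50%.
    """
    if num_samples < window_size:
        return []
    return list(range(0, num_samples - window_size + 1, window_stride))


def make_low_res(signal, scale=4):
    """Crude low-resolution copy with the same length as the input.

    ASSUMPTION: averaging each group of `scale` samples stands in for a
    proper low-pass filter plus decimation, and linear interpolation
    stands in for the interpolation back to the original length.
    """
    n = len(signal)
    usable = n - n % scale  # drop a trailing partial group
    coarse = [sum(signal[i:i + scale]) / scale for i in range(0, usable, scale)]
    if not coarse:
        return []
    last = len(coarse) - 1
    out = []
    for i in range(n):
        pos = i / scale                 # position on the coarse grid
        left = min(int(pos), last)      # clamp at the final coarse sample
        right = min(left + 1, last)
        frac = pos - left
        out.append(coarse[left] * (1.0 - frac) + coarse[right] * frac)
    return out


# One second of audio at 16 kHz holds two full 8192-sample windows.
print(window_starts(16000))
# A short ramp loses its fine detail but keeps its length.
ramp = [float(i) for i in range(16)]
print(len(make_low_res(ramp, scale=4)))
```

The low-resolution copy keeps the original number of samples, so the same window layout applies to both the input and the target signals.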