Audacity is an open source audio editor that can be used to visualize audio files generated by VERSE. VERSE datasets contain multiple folders, each dedicated to a single audio scene (a single use case). Each folder includes three files: the audio rendering in Matroska (.mkv) format, the YAML descriptor of the audio file, and the scene definition (also YAML).
To start from a real example, make sure you have rendered the "simple_example" dataset by running:
./render_dataset.py -i ../resources/ds_recipes/simple_example/info/simple_example.yaml
Results will be available under
[VERSE]/datasets/simple_example/train
Looking inside one of the rendered scenes, we have:
cd 001301_dynamic_multivoice_0_1_1/
tree
.
├── 001301_dynamic_multivoice_0_1_1.yaml
├── dynamic_multivoice.mkv
└── dynamic_multivoice_mkv.yaml
The dynamic_multivoice.mkv file is the container for all the audio artifacts of this audio scene.
VERSE's audio files have a companion descriptor that uses YAML syntax to describe the content of the file. In this case it is "dynamic_multivoice_mkv.yaml", which contains the following:
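The layout of a scene folder can be sketched in Python as follows. This is only an illustration: `scene_files` is a hypothetical helper, not part of VERSE, and the naming rules it relies on (one .mkv per folder, a descriptor named after the .mkv with a `_mkv.yaml` suffix, a scene definition named after the folder) are assumptions inferred from the single example listed above.

```python
from pathlib import Path


def scene_files(scene_dir):
    """Return the three files of one scene folder, keyed by role.

    Hypothetical helper: the naming rules below are inferred from the
    single example folder shown in this guide, not from VERSE itself.
    """
    scene_dir = Path(scene_dir)
    # Assumption: each scene folder holds exactly one .mkv container.
    audio = next(scene_dir.glob("*.mkv"))
    # Assumption: the audio descriptor is "<mkv stem>_mkv.yaml".
    descriptor = scene_dir / f"{audio.stem}_mkv.yaml"
    # Assumption: the scene definition is "<folder name>.yaml".
    scene = scene_dir / f"{scene_dir.name}.yaml"
    return {"audio": audio, "descriptor": descriptor, "scene": scene}
```

Calling `scene_files("001301_dynamic_multivoice_0_1_1")` from inside the train folder would then return the paths of the .mkv container, its descriptor and the scene definition.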
syntax:
  name: verse_audio_mkv
  description: none
name: verse rendered audio scene
file: [VERSE]/datasets/simple_example/train/001301_dynamic_multivoice_0_1_1/dynamic_multivoice.mkv
sources_count: 3
sources:
  0:
    channels: 1
    file: 000056_gentlemenpreferblondes.wav
    track_id: 0
  1:
    channels: 1
    file: 000027_blackbuccaneer.wav
    track_id: 1
  2:
    channels: 1
    file: 000071_gianburrasca.wav
    track_id: 2
receivers_count: 8
receivers:
  0:
    channels: 2
    file: dynamic_multivoice_binaural_000.wav
    track_id: 3
  1:
    channels: 2
    file: dynamic_multivoice_array_six_front_001.wav
    track_id: 4
  2:
    channels: 2
    file: dynamic_multivoice_array_six_middle_002.wav
    track_id: 5
  3:
    channels: 2
    file: dynamic_multivoice_array_six_rear_003.wav
    track_id: 6
where [VERSE] is your local copy of the VERSE repo.
The descriptor shows that the rendered audio contains 3 "sources", meaning human voices. These are the original (mono) voices that were used to render the audio scene. There are also 8 receivers (one listener with eight receivers, meaning 4 pairs of microphones).
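The relation between the four receiver entries and the value of receivers_count can be checked with a few lines of Python. The values are copied by hand from the descriptor above. Reading receivers_count as the number of individual microphones is an interpretation based on this one file.

```python
# Receiver entries copied from the descriptor above (file names omitted).
receivers = {
    0: {"channels": 2, "track_id": 3},
    1: {"channels": 2, "track_id": 4},
    2: {"channels": 2, "track_id": 5},
    3: {"channels": 2, "track_id": 6},
}
receivers_count = 8  # value stated in the descriptor

# Each entry is one stereo track, i.e. one pair of microphones.
mic_pairs = len(receivers)
# Summing the channels gives the number of individual microphones.
total_channels = sum(entry["channels"] for entry in receivers.values())

print(mic_pairs, total_channels)  # 4 8
```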
The details of how the receivers are placed and how the sources move in space are described by the scene definition file, in this case the file: "001301_dynamic_multivoice_0_1_1.yaml".
For each receiver the descriptor indicates the number of channels (2) and the track number, which will be useful for Audacity visualization.
Note that this track number is the same one you get by using the command line tool "play_scene.py" with the "-l" option (listing tracks).
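To see how the track numbers line up, the following sketch builds a track table from the descriptor content. It does not reproduce the output of "play_scene.py". The descriptor is written out as a Python dict, trimmed to the fields used here. With PyYAML installed, loading the file with `yaml.safe_load` should give a similar structure, but that is an assumption. `list_tracks` is a hypothetical helper, not a VERSE function.

```python
# Descriptor content copied from the example above, trimmed to sources and
# receivers. Assumption: loading the .yaml file with PyYAML would yield a
# dict of this shape.
descriptor = {
    "sources": {
        0: {"channels": 1, "file": "000056_gentlemenpreferblondes.wav", "track_id": 0},
        1: {"channels": 1, "file": "000027_blackbuccaneer.wav", "track_id": 1},
        2: {"channels": 1, "file": "000071_gianburrasca.wav", "track_id": 2},
    },
    "receivers": {
        0: {"channels": 2, "file": "dynamic_multivoice_binaural_000.wav", "track_id": 3},
        1: {"channels": 2, "file": "dynamic_multivoice_array_six_front_001.wav", "track_id": 4},
        2: {"channels": 2, "file": "dynamic_multivoice_array_six_middle_002.wav", "track_id": 5},
        3: {"channels": 2, "file": "dynamic_multivoice_array_six_rear_003.wav", "track_id": 6},
    },
}


def list_tracks(descriptor):
    """Return (track_id, kind, channels, file) tuples sorted by track_id."""
    rows = []
    for section, kind in (("sources", "source"), ("receivers", "receiver")):
        for entry in descriptor[section].values():
            rows.append((entry["track_id"], kind, entry["channels"], entry["file"]))
    return sorted(rows)


for track_id, kind, channels, name in list_tracks(descriptor):
    layout = "mono" if channels == 1 else "stereo"
    print(f"{track_id}: {kind:<8} {layout:<6} {name}")
```

The printed order (tracks 0 to 2 for the mono sources, 3 to 6 for the stereo receiver pairs) is the order in which the tracks are expected to appear in Audacity.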
The syntax for "scene" is explained in scene_syntax_howto.
For this scene the listener "head" has one pair of receivers placed in the ears (binaural) and a six-mic array placed around the head, hence the front/middle/rear indication for the head mic array.
Open Audacity and load the .mkv file. You will be presented with a list of audio tracks to import; select all of them (use SHIFT and MOUSE-CLICK).
Next you will see the full list of audio tracks, in the same track numbering order as in the .yaml descriptor.
The first three tracks are the audio sources used in this scene (they are mono audio files of different lengths). The last tracks are stereo (mic pairs), referring to binaural or array-six[front|middle|rear].
You can use the MUTE/SOLO buttons and all the other features of Audacity to compare, filter and play the audio tracks.

