[wip] Decode WAV with C++ backend #1221
    self._desired_sample_rate = metadata["sampleRate"]
    self._decoder = None  # type: ignore[assignment]
    self.metadata = AudioStreamMetadata.from_json(metadata)
    return
The AudioDecoder exposes the input audio's metadata to the user. Because the C++ WAV backend does not go through FFmpeg, the information needed to create an AudioStreamMetadata has to be passed over from the C++ side. This implementation uses JSON, but it is also possible to pass each field individually and construct the AudioStreamMetadata object here.
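To make the two options in this comment concrete, here is a minimal sketch. `WavStreamMetadata` is a hypothetical stand-in, not the real `AudioStreamMetadata`, and it carries only the three fields visible in the diff (`pts_seconds`, `duration_seconds`, `sample_rate`); the example values are made up.

```python
import json
from dataclasses import dataclass


@dataclass
class WavStreamMetadata:
    """Hypothetical stand-in for AudioStreamMetadata (not the torchcodec class)."""

    pts_seconds: float
    duration_seconds: float
    sample_rate: int

    @classmethod
    def from_json(cls, payload: dict) -> "WavStreamMetadata":
        # Map the camelCase keys seen in the diff onto snake_case fields.
        return cls(
            pts_seconds=0.0,  # the diff hard-codes 0.0 for the WAV path
            duration_seconds=float(payload["durationSeconds"]),
            sample_rate=int(payload["sampleRate"]),
        )


# Option 1: the backend hands over one JSON string, decoded on the Python side.
raw = json.dumps({"durationSeconds": 2.5, "sampleRate": 44100})
via_json = WavStreamMetadata.from_json(json.loads(raw))

# Option 2: the backend hands over each field separately.
via_fields = WavStreamMetadata(
    pts_seconds=0.0, duration_seconds=2.5, sample_rate=44100
)
```

Both routes end in the same object; JSON keeps the C++/Python boundary to a single value, while individual fields avoid a serialization step but widen that boundary with every field added.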
        pts_seconds=0.0,
        duration_seconds=metadata["durationSeconds"],
        sample_rate=metadata["sampleRate"],
    )
self._wav_source is only populated if WAV decoding was successful.
    if (!checkFourCC(data + 8, "WAVE")) {
      throw std::runtime_error("Missing WAVE format identifier");
    }
Once we find the RIFF and WAVE signatures, we can look for the fmt chunk, which contains the metadata, and the data chunk, which contains the actual samples. We locate the data chunk at this point so that we can store dataSize and dataOffset, which are needed later for decoding.
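The header walk described in this comment can be sketched in a few lines of Python. This is an illustration of the general RIFF/WAVE layout, not a port of the PR's C++ `parseHeader`; the function name `find_wav_chunks` and the dictionary keys are hypothetical, and only the basic 16-byte PCM `fmt` layout is handled.

```python
import struct


def find_wav_chunks(data: bytes) -> dict:
    """Locate the fmt and data chunks of an in-memory RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF":
        raise ValueError("Missing RIFF header")
    if data[8:12] != b"WAVE":
        raise ValueError("Missing WAVE format identifier")

    info = {}
    offset = 12  # first chunk starts right after "RIFF" <size> "WAVE"
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (chunk_size,) = struct.unpack_from("<I", data, offset + 4)
        body = offset + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                raise ValueError("Truncated fmt chunk")
            # format tag, channels, sample rate, byte rate, block align, bits
            fmt = struct.unpack_from("<HHIIHH", data, body)
            info["audio_format"] = fmt[0]
            info["num_channels"] = fmt[1]
            info["sample_rate"] = fmt[2]
            info["bits_per_sample"] = fmt[5]
        elif chunk_id == b"data":
            # Only record where the samples live; decoding happens later.
            info["data_offset"] = body
            info["data_size"] = chunk_size
            break
        # Chunks are word-aligned: an odd-sized chunk is followed by a pad byte.
        offset = body + chunk_size + (chunk_size & 1)

    if "sample_rate" not in info or "data_offset" not in info:
        raise ValueError("Missing fmt or data chunk")
    return info


# Build a tiny mono, 16-bit, 8 kHz WAV in memory and parse it.
fmt_body = struct.pack("<HHIIHH", 1, 1, 8000, 16000, 2, 16)
samples = struct.pack("<4h", 0, 1000, -1000, 0)
chunks = b"fmt " + struct.pack("<I", len(fmt_body)) + fmt_body
chunks += b"data" + struct.pack("<I", len(samples)) + samples
wav = b"RIFF" + struct.pack("<I", 4 + len(chunks)) + b"WAVE" + chunks
info = find_wav_chunks(wav)
```

With the offset and size of the data chunk stored up front, a later decode call can seek straight to the samples without re-reading the header.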
To review:
- AudioDecoder class. _audio_decoder.py Python file. To completely contain changes in C++ requires more complex changes, e.g. implementing a Decoder class that SingleStreamDecoder and WavDecoder implement.
- _is_uncompressed_wav, and read a function it dispatches to, like get_wav_metadata_from_file.
- WavFileReader and WavTensorReader
- WavDecoder::WavDecoder init implementation, follow it to parseHeader.
- convertSamplesToFloat
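convertSamplesToFloat is only named in the list above; its body is not shown here. As a rough illustration of what a PCM-to-float step commonly looks like, the sketch below assumes little-endian signed 16-bit samples and scales by 32768, which is one common convention and may not match what the PR does. The name `pcm16_to_float` is hypothetical.

```python
import struct


def pcm16_to_float(raw: bytes) -> list:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    count = len(raw) // 2  # a trailing odd byte, if any, is ignored
    ints = struct.unpack_from(f"<{count}h", raw)
    # Assumed convention: divide by 2**15 so that -32768 maps to exactly -1.0.
    return [sample / 32768.0 for sample in ints]


converted = pcm16_to_float(struct.pack("<3h", 0, 16384, -32768))
```

Other bit depths need their own handling (8-bit WAV samples are unsigned, for instance), so a real implementation would branch on the bits-per-sample value read from the fmt chunk.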