[wip] Decode WAV with Python backend #1222
Dan-Flores wants to merge 8 commits into meta-pytorch:main
| # Try fast WAV path | ||
| self._wav_decoder = WavDecoder.validate_and_init( | ||
| source, sample_rate, num_channels, stream_index | ||
| ) |
self._wav_decoder is only populated if WAV decoding was successful.
Do we have a sense of how expensive this check is? It'll be run for every single input file so it's important that it's quick.
| ) | ||
| if self._wav_decoder is not None: | ||
| self.stream_index = self._wav_decoder.stream_index | ||
| self.metadata = self._wav_decoder.metadata |
Because WavDecoder is a Python class, we can set the AudioStreamMetadata class easily.
src/torchcodec/decoders/_fast_wav.py
Outdated
| elif isinstance(source, (str, Path)): | ||
| path = Path(source) | ||
| if path.suffix.lower() == ".wav": | ||
| try: | ||
| with open(path, "rb") as f: | ||
| source_bytes = f.read() | ||
| except OSError: | ||
| return None | ||
| elif isinstance(source, (io.RawIOBase, io.BufferedReader)) or ( | ||
| hasattr(source, "read") and hasattr(source, "seek") | ||
| ): | ||
| source_bytes = source.read() | ||
| # Will reset seek position below if we can't use fast path |
So this works but it also reads / loads / downloads the entire content of a file-like object in memory. We should try to see how FFmpeg behaves when decoding a very long wav file for example. Does it need to load the entire file at once? If not, we might be losing that functionality here, which is something to consider.
Same for the C++ alternative, which I haven't checked yet.
FFmpeg does not need to load the entire file at once. I've updated both implementations to read only the header at first. Thanks for the suggestion!
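A minimal sketch of the header-only check discussed above, assuming a seekable file-like source. The name `looks_like_wav` is hypothetical and not taken from the PR. It reads just the 12-byte RIFF/WAVE preamble and restores the seek position so another backend can still consume the stream from the same point.

```python
import io

RIFF_HEADER_SIZE = 12  # b"RIFF" + 4-byte little-endian size + b"WAVE"


def looks_like_wav(stream) -> bool:
    """Check the RIFF/WAVE signature without consuming the stream.

    Hypothetical helper: only 12 bytes are read, however large the source is.
    """
    start = stream.tell()
    try:
        header = stream.read(RIFF_HEADER_SIZE)
    finally:
        # Restore the position so a fallback backend sees the same bytes.
        stream.seek(start)
    return (
        len(header) == RIFF_HEADER_SIZE
        and header[:4] == b"RIFF"
        and header[8:12] == b"WAVE"
    )


source = io.BytesIO(b"RIFF" + (36).to_bytes(4, "little") + b"WAVE" + b"\x00" * 1000)
print(looks_like_wav(source), source.tell())
```

Because the cost is a single small read plus a seek, a check of this shape stays cheap even when it runs on every input file.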
| stream_index: int | None = None, | ||
| ): | ||
| """ | ||
| Create a WavDecoder for the given source. |
This function handles all input source types, and calls _parse_wav_chunks with a function that can read the detected input type.
- If successful, this function initializes a WavDecoder object.
- If it fails, it will raise an error. This init is designed to be called by try_create below, which catches the error so we can fall back to the FFmpeg backend.
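The raise-then-catch pattern described above could look roughly like the following. This is a sketch under assumptions, not the PR's code: `WavDecoderSketch`, `WavParseError`, and `pick_backend` are hypothetical names, and the validation is reduced to a signature check.

```python
class WavParseError(ValueError):
    """Hypothetical error for inputs the fast WAV path cannot handle."""


class WavDecoderSketch:
    def __init__(self, data: bytes):
        # The constructor raises on anything it does not recognize.
        if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
            raise WavParseError("not a RIFF/WAVE file")
        self.data = data

    @classmethod
    def try_create(cls, data: bytes):
        """Return a decoder, or None so the caller can fall back."""
        try:
            return cls(data)
        except WavParseError:
            return None


def pick_backend(data: bytes) -> str:
    # The caller only needs a None check to decide which backend to use.
    decoder = WavDecoderSketch.try_create(data)
    return "wav" if decoder is not None else "ffmpeg"


print(pick_backend(b"RIFF\x00\x00\x00\x00WAVE"))
print(pick_backend(b"ID3\x03\x00\x00\x00\x00\x00\x00"))
```

Keeping the strict constructor separate from the forgiving `try_create` means tests can still assert on the specific failure, while the caller's fallback logic stays a one-line check.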
| bytes_per_sample = metadata.bits_per_sample // 8 | ||
| num_samples = len(audio_bytes) // bytes_per_sample // metadata.num_channels | ||
| # Convert to tensor based on format |
Here various WAV formats are normalized to [-1, 1].
Some formats use the full range of a dtype, so torch.iinfo is used to get the maximum value of that type.
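To illustrate the normalization idea, here is a standard-library sketch rather than the PR's torch code. The function name `pcm_to_floats` and the choice of divisor are assumptions. In torch, the full-scale value could be derived from `torch.iinfo(dtype).max` (32767 for `torch.int16`); this sketch divides by `max + 1`, which may differ from what the PR does.

```python
import struct


def pcm_to_floats(audio_bytes: bytes, bits_per_sample: int) -> list[float]:
    """Normalize little-endian integer PCM samples to the range [-1, 1)."""
    if bits_per_sample == 8:
        # 8-bit WAV samples are unsigned, centered on 128.
        return [(b - 128) / 128.0 for b in audio_bytes]
    if bits_per_sample == 16:
        code, full_scale = "h", 2**15
    elif bits_per_sample == 32:
        code, full_scale = "i", 2**31
    else:
        raise ValueError(f"unsupported bit depth: {bits_per_sample}")

    bytes_per_sample = bits_per_sample // 8
    count = len(audio_bytes) // bytes_per_sample
    # "<" forces little-endian with standard sizes (2 bytes for h, 4 for i).
    samples = struct.unpack(
        f"<{count}{code}", audio_bytes[: count * bytes_per_sample]
    )
    return [s / full_scale for s in samples]


raw = struct.pack("<3h", 0, 16384, -32768)
print(pcm_to_floats(raw, 16))
```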
| audio_format = 0 | ||
| num_channels = 0 | ||
| sample_rate = 0 | ||
| bits_per_sample = 0 |
Once we find the RIFF and WAVE signatures, we can look for the fmt chunk which contains metadata, and the data chunk which contains the actual samples. We find the data chunk now to store bytes_per_sample and num_samples, which will be needed later for decoding.
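The chunk walk described above can be sketched as follows. This is not the PR's `_parse_wav_chunks`: `parse_wav_header` and the returned dictionary keys are hypothetical, and the sketch assumes a seekable stream and a plain 16-byte PCM-style `fmt ` layout. It stops at the `data` chunk without reading the samples.

```python
import io
import struct


def parse_wav_header(stream) -> dict:
    """Locate the fmt and data chunks; leave the stream at the first sample."""
    preamble = stream.read(12)
    if len(preamble) < 12 or preamble[:4] != b"RIFF" or preamble[8:12] != b"WAVE":
        raise ValueError("missing RIFF/WAVE signature")

    info = None
    while True:
        chunk_header = stream.read(8)
        if len(chunk_header) < 8:
            raise ValueError("data chunk not found")
        chunk_id, chunk_size = struct.unpack("<4sI", chunk_header)
        padding = chunk_size % 2  # chunks are padded to an even size

        if chunk_id == b"fmt ":
            body = stream.read(chunk_size)
            if len(body) < 16:
                raise ValueError("fmt chunk too short")
            fmt_tag, channels, rate, _byte_rate, _align, bits = struct.unpack(
                "<HHIIHH", body[:16]
            )
            if channels == 0 or bits < 8:
                raise ValueError("invalid fmt chunk")
            info = {
                "audio_format": fmt_tag,
                "num_channels": channels,
                "sample_rate": rate,
                "bits_per_sample": bits,
            }
            stream.seek(padding, io.SEEK_CUR)
        elif chunk_id == b"data":
            if info is None:
                raise ValueError("fmt chunk must precede data chunk")
            bytes_per_sample = info["bits_per_sample"] // 8
            info["data_offset"] = stream.tell()
            info["num_samples"] = (
                chunk_size // bytes_per_sample // info["num_channels"]
            )
            return info
        else:
            # Skip chunks we do not need (e.g. LIST) without reading them.
            stream.seek(chunk_size + padding, io.SEEK_CUR)
```

Here `num_samples` counts samples per channel, matching the `len(audio_bytes) // bytes_per_sample // num_channels` expression in the diff.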
To review:
- _audio_decoder.py.
- class WavDecoder, and the associated try_create.
- _parse_wav_chunks, it looks complicated but just handles the WAV file header.
- _samples_from_bytes. This conversion is considerably easier to read than the C++ implementation.