[wip] Decode WAV with C++ backend #1221

Draft
Dan-Flores wants to merge 7 commits into meta-pytorch:main from Dan-Flores:wav-cpp

@Dan-Flores
Contributor

@Dan-Flores Dan-Flores commented Feb 4, 2026

To review:

  • First, see the changes to the public AudioDecoder class.
    • This implementation changes the _audio_decoder.py Python file. Containing the changes entirely in C++ would require a more invasive refactor, e.g. introducing a Decoder base class that both SingleStreamDecoder and WavDecoder implement.
  • Read _is_uncompressed_wav, and read a function it dispatches to, like get_wav_metadata_from_file.
    • Skim through the input type handler classes, WavFileReader and WavTensorReader
  • Read the WavDecoder::WavDecoder constructor, then follow it to parseHeader.
  • Wonder at convertSamplesToFloat
    • Using tensor operations here reduces the performance gains. This implementation does the conversion in a single pass.
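
For readers unfamiliar with the conversion step, here is a minimal Python sketch of a single-pass integer-to-float conversion. It is only an illustration: the PR's convertSamplesToFloat is written in C++ and its exact scaling is not shown here. The function name pcm16_to_float, the 16-bit little-endian input, and the divide-by-32768 normalization are all assumptions.

```python
import struct


def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert little-endian signed 16-bit PCM bytes to floats in [-1.0, 1.0).

    Hypothetical helper: each sample is read and scaled once, without
    building an intermediate integer array first.
    """
    if len(raw) % 2 != 0:
        raise ValueError("16-bit PCM data must have an even number of bytes")
    scale = 1.0 / 32768.0  # assumed normalization: divide by 2**15
    return [sample * scale for (sample,) in struct.iter_unpack("<h", raw)]


# Usage: three samples packed as little-endian int16.
floats = pcm16_to_float(struct.pack("<3h", 0, 16384, -32768))
```

The same loop shape carries over to other bit depths by changing the unpack format and the scale factor.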

@pytorch-bot

pytorch-bot bot commented Feb 4, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1221

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

❌ 1 New Failure

As of commit c01d370 with merge base 377c638 (image):

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Feb 4, 2026
self._desired_sample_rate = metadata["sampleRate"]
self._decoder = None # type: ignore[assignment]
self.metadata = AudioStreamMetadata.from_json(metadata)
return
Contributor Author

@Dan-Flores Dan-Flores Feb 5, 2026

The AudioDecoder exposes the input audio's metadata to the user. Because the C++ WAV backend does not go through FFmpeg, the C++ side passes back the information needed to build AudioStreamMetadata. This implementation uses JSON, but it's also possible to pass each field individually and construct the AudioStreamMetadata object here.
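
As a rough sketch of the JSON option, the snippet below parses a metadata payload into a small dataclass. WavStreamInfo is a hypothetical stand-in, not torchcodec's AudioStreamMetadata. The sampleRate and durationSeconds keys match the diff above, while numChannels and the example values are assumptions.

```python
import json
from dataclasses import dataclass


@dataclass
class WavStreamInfo:
    """Hypothetical stand-in for the metadata object built on the Python side."""

    sample_rate: int
    duration_seconds: float
    num_channels: int

    @classmethod
    def from_json(cls, metadata: dict) -> "WavStreamInfo":
        # Keys use camelCase, as in the diff; "numChannels" is assumed.
        return cls(
            sample_rate=int(metadata["sampleRate"]),
            duration_seconds=float(metadata["durationSeconds"]),
            num_channels=int(metadata["numChannels"]),
        )


# Example payload of the kind the C++ side might return (values invented).
raw = '{"sampleRate": 44100, "durationSeconds": 1.5, "numChannels": 2}'
info = WavStreamInfo.from_json(json.loads(raw))
```

Passing fields individually would avoid the JSON round trip, at the cost of a wider function signature between C++ and Python.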

pts_seconds=0.0,
duration_seconds=metadata["durationSeconds"],
sample_rate=metadata["sampleRate"],
)
Contributor Author

@Dan-Flores Dan-Flores Feb 5, 2026

self._wav_source is only populated if WAV decoding was successful.

if (!checkFourCC(data + 8, "WAVE")) {
throw std::runtime_error("Missing WAVE format identifier");
}

Contributor Author

Once we find the RIFF and WAVE signatures, we can look for the fmt chunk, which contains the metadata, and the data chunk, which contains the actual samples. We locate the data chunk at this point so we can store dataSize and dataOffset, which are needed later for decoding.
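
The chunk walk described above can be sketched in Python as follows. This is an illustration of the standard RIFF layout, not the PR's parseHeader: the function name find_wav_chunks, the returned dictionary keys, and the error messages other than the WAVE one are assumptions.

```python
import struct


def find_wav_chunks(data: bytes) -> dict:
    """Locate the fmt and data chunks of an in-memory RIFF/WAVE file."""
    if len(data) < 12 or data[0:4] != b"RIFF":
        raise ValueError("Missing RIFF header")
    if data[8:12] != b"WAVE":
        raise ValueError("Missing WAVE format identifier")

    found = {}
    offset = 12  # first chunk follows "RIFF", the 4-byte size, and "WAVE"
    while offset + 8 <= len(data):
        chunk_id = data[offset:offset + 4]
        (chunk_size,) = struct.unpack_from("<I", data, offset + 4)
        body = offset + 8
        if chunk_id == b"fmt ":
            if chunk_size < 16 or body + 16 > len(data):
                raise ValueError("fmt chunk is too short")
            # format tag, channels, sample rate, byte rate, block align, bits
            fmt = struct.unpack_from("<HHIIHH", data, body)
            found["audio_format"] = fmt[0]
            found["num_channels"] = fmt[1]
            found["sample_rate"] = fmt[2]
            found["bits_per_sample"] = fmt[5]
        elif chunk_id == b"data":
            found["data_offset"] = body
            # Clamp in case the header claims more bytes than are present.
            found["data_size"] = min(chunk_size, len(data) - body)
        # Chunk bodies are padded to an even number of bytes.
        offset = body + chunk_size + (chunk_size & 1)

    if "sample_rate" not in found or "data_offset" not in found:
        raise ValueError("Missing fmt or data chunk")
    return found
```

Recording the offset and size of the data chunk during the header pass means decoding can later seek straight to the samples without rescanning the file.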
