Apex and torchaudio micro benchmarking#21

Open
amd-sriram wants to merge 22 commits into master from apex_micro_benchmarking

Conversation


@amd-sriram amd-sriram commented Oct 13, 2025

Motivation

Create microbenchmarking code for apex and torchaudio.
It should load example models and calculate time per batch and throughput for these models.

Technical Details

The diagram below explains the main steps in the microbenchmark code:

[Figure: microbenchmark flow diagram]

The microbenchmark

  1. loads different models with no pretrained weights
  2. calculates time per batch (the average time for forward and/or backward passes over a batch of size batchsize, repeated iterations times) and throughput (images or samples processed per second):
    # warm up: run forward and backward passes first
    start_time = time.perf_counter()
    for _ in range(iterations):
        ...  # perform forward and/or backward pass
    end_time = time.perf_counter()
    time_per_batch = (end_time - start_time) / iterations
    throughput = batchsize / time_per_batch
  3. warms up (two forward and/or backward calls are made) before profiling using deepspeed.profiling.flops_profiler.FlopsProfiler
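The timing loop described above can be sketched as a self-contained harness. This is an illustrative version runnable without torch; `benchmark` and the dummy workload are stand-ins for the PR's actual code, not names taken from it:

```python
import time

def benchmark(run_batch, batch_size, iterations=20, warmup=2):
    """Time a per-batch workload; return (time per batch, throughput)."""
    # Warm up (e.g. two forward/backward calls) so one-time setup
    # costs do not pollute the measurement.
    for _ in range(warmup):
        run_batch()
    start = time.perf_counter()
    for _ in range(iterations):
        run_batch()
    end = time.perf_counter()
    time_per_batch = (end - start) / iterations
    throughput = batch_size / time_per_batch  # items processed per second
    return time_per_batch, throughput

# Dummy workload standing in for a model's forward/backward pass.
tpb, tput = benchmark(lambda: sum(i * i for i in range(10_000)), batch_size=64)
print(f"Time per mini-batch : {tpb:.6f}")
print(f"Throughput [img/sec] : {tput:.2f}")
```

In the real benchmark the workload would be the model's forward and/or backward pass on a fixed-size input batch.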

Apex microbenchmarking

Based on this example https://github.com/ROCm/apex/tree/master/examples/imagenet, micro_benchmarking_pytorch was adapted to apex:
https://github.com/ROCm/pytorch-micro-benchmarking/blob/apex_micro_benchmarking/micro_benchmarking_apex.py

| PyTorch code | Apex code |
| --- | --- |
| network = fp16util.network_to_half(network) | network = apex.fp16_utils.FP16Model(network) |
| torch.nn.parallel.DistributedDataParallel | apex.parallel.DistributedDataParallel |
| PyTorch group norm | model = apex.parallel.convert_syncbn_model(model) |
| amp.initialize() | apex.amp.initialize(..., keep_batchnorm_fp32, loss_scale) |
| torch.optim.SGD(...) | apex.optimizers.FusedSGD(...) |

Add the following arguments:

    parser.add_argument('--sync_bn', action='store_true', help='enabling apex sync BN.')
    parser.add_argument('--keep-batchnorm-fp32', type=str, default=None)
    parser.add_argument('--loss-scale', type=str, default=None)
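These flags can be exercised with a minimal stand-alone parser (the rest of the benchmark's arguments are omitted here). Note that argparse turns dashes in flag names into underscores on the namespace:

```python
import argparse

parser = argparse.ArgumentParser(description="Apex micro-benchmark flags")
parser.add_argument('--sync_bn', action='store_true', help='enabling apex sync BN.')
parser.add_argument('--keep-batchnorm-fp32', type=str, default=None)
parser.add_argument('--loss-scale', type=str, default=None)

# Simulate: python3 micro_benchmarking_apex.py --sync_bn --loss-scale 128.0
args = parser.parse_args(['--sync_bn', '--loss-scale', '128.0'])
print(args.sync_bn, args.keep_batchnorm_fp32, args.loss_scale)
```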

Use the same torchvision models used in the pytorch microbenchmarking:

  • classification (vgg, densenet, alexnet, resnet, etc.),
  • detection (mobilenet),
  • segmentation (mobilenet, fcn resnet, deeplab resnet),
  • vision transformers (vit, swin, newer efficientnet models).

Torchaudio

For torchaudio, we first survey the available models and their inputs, outputs, and losses; these need to be defined for each model type.

Taking the models from https://docs.pytorch.org/audio/stable/models.html and classifying them into audio tasks that use different inputs and outputs:

| Task | Models | Input | Loss function |
| --- | --- | --- | --- |
| ASR (automatic speech recognition) | conformer | acoustic features, lengths | CTC loss |
| | deepspeech | acoustic features | CTC loss |
| | emformer | acoustic features, lengths | CTC loss |
| | wav2letter | acoustic features | CTC loss |
| Source separation | conv tasnet base | waveform with 1 channel | Si-SDR loss |
| | hdemucs ... | waveform with 2 channels | L1 loss |
| Acoustic models | wav2vec2 ... | waveform | CTC loss |
| | hubert ... | waveform | CTC loss |
| | wavlm ... | waveform | CTC loss |
| Speech quality | squim objective base | 1 waveform | L1 output[0] + 2 * output[2] + 0.5 * output[3] + 2 * L1 signal estimation |
| | squim subjective base | 2 waveforms | L1 loss |
| Text to speech | wavernn | waveform, spectrogram | cross entropy |
| | tacotron2 | tokens, token lengths, mel spectrogram, spectrogram lengths | MSE loss |
| Speech representation | hubert_pretrained ... | waveform, labels | HuBERT loss |
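One way to encode this table is a small registry keyed by model name, which the benchmark can consult to build inputs and pick a loss. The structure and names below are illustrative (a subset of the table), not the PR's actual data structure:

```python
# Illustrative registry: model name -> (task, input description, loss name).
MODEL_SPECS = {
    "conformer":             ("ASR", "acoustic features, lengths", "ctc"),
    "deepspeech":            ("ASR", "acoustic features", "ctc"),
    "conv_tasnet_base":      ("source separation", "waveform, 1 channel", "si_sdr"),
    "wav2vec2_base":         ("acoustic model", "waveform", "ctc"),
    "squim_subjective_base": ("speech quality", "2 waveforms", "l1"),
    "wavernn":               ("TTS", "waveform, spectrogram", "cross_entropy"),
    "tacotron2":             ("TTS", "tokens, lengths, mel spectrogram", "mse"),
}

def loss_for(model_name):
    """Look up which loss the benchmark should attach to a model."""
    _task, _inputs, loss = MODEL_SPECS[model_name]
    return loss

print(loss_for("wav2vec2_base"))  # -> ctc
```

Keeping the per-model input and loss definitions in one place makes it easy to add a model: one new entry rather than a new branch in the benchmark loop.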

Test Plan

docker - registry-sc-harbor.amd.com/framework/compute-rocm-dkms-no-npi-hipclang:16972_ubuntu24.04_py3.12_pytorch_release-2.9_7e1940d4

Running the different models for apex and torchaudio microbenchmarks.

python3 micro_benchmarking_apex.py --network resnet50

python3 micro_benchmarking_audio.py --network wav2vec2_base

Test Result

Apex microbenchmark output

--------------------SUMMARY--------------------------
Microbenchmark for network : resnet50
Num devices: 1
Dtype: FP32
Mini batch size [img] : 64
Time per mini-batch : 0.042455744743347165
Throughput [img/sec] : 1507.4520630103616

Torchaudio microbenchmark output

--------------------SUMMARY--------------------------
Microbenchmark for network : wav2vec2_base
Num devices: 1
Dtype: FP32
Mini batch size [ waveform ] : 64
Time per mini-batch : 0.020818877220153808
Throughput [ waveform /sec] : 3074.133120783503
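As a sanity check, the reported throughput in both summaries is exactly batch size divided by time per mini-batch:

```python
# Throughput = batch_size / time_per_batch, using the summary values above.
apex_tput = 64 / 0.042455744743347165   # resnet50 run
audio_tput = 64 / 0.020818877220153808  # wav2vec2_base run
print(round(apex_tput, 2), round(audio_tput, 2))
```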

Submission Checklist

@amd-sriram amd-sriram self-assigned this Oct 13, 2025
@amd-sriram amd-sriram marked this pull request as ready for review February 9, 2026 14:33