
Add benchmark for WavDecoder #1474

Merged: NicolasHug merged 2 commits into meta-pytorch:main from NicolasHug:wav_benchmark on Jun 1, 2026


@NicolasHug
Contributor

Adds a benchmark comparing WavDecoder against AudioDecoder, soundfile (float), and soundfile (native). "Native" means the dtype closest to the wav source, which usually requires no conversion, though not always: a u8 wav, for example, still has to be converted to int16. Both WavDecoder and AudioDecoder always convert to float32.
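
For anyone wanting to reproduce a comparison of this shape, here is a minimal timing-harness sketch. It is not the benchmark script added in this PR: `time_ms`, the warmup and repeat counts, and the stand-in decoder built on the standard-library `wave` module are all assumptions for illustration. In a real run, the callable passed to `time_ms` would wrap the decoder under test.

```python
import io
import statistics
import time
import wave


def make_wav_bytes(seconds=1, sample_rate=8000, channels=1):
    """Build a small silent 16-bit PCM WAV in memory (hypothetical test input)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 2 bytes per sample, i.e. s16
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * sample_rate * seconds * channels)
    return buf.getvalue()


def decode_with_wave(data):
    """Stand-in decoder (assumption): returns the raw PCM bytes of the file."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.readframes(r.getnframes())


def time_ms(fn, repeats=20, warmup=3):
    """Return the median wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


wav_bytes = make_wav_bytes()
elapsed = time_ms(lambda: decode_with_wave(wav_bytes))
print(f"stand-in decode: {elapsed:.3f} ms")
```

Using the median over several repeats after a warmup is one way to reduce noise from cold caches; the actual script may aggregate differently.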

Results on my machine are below. Unsurprisingly, WavDecoder is much faster than AudioDecoder. We're faster than soundfile (float) overall, but I would take the results against soundfile with a grain of salt: I observed vastly different perf depending on which libsoundfile.so gets resolved, and most long-file benchmarks show only a modest improvement anyway. Also unsurprisingly, soundfile tends to be faster in the 'native' scenario since it does less: unlike us, it doesn't convert to float32. The one bit that I still cannot explain despite some investigation is the wav float32 decoding time:

float32    5min              18.95          37.02              5.71                 5.71            1.95x               0.30x               0.30x

How can we take 18 ms when soundfile takes 5? That makes no sense to me: we literally just copy the entire data into the output tensor in one shot. The cost of a memcpy for that size is in the range of 18 ms, so there's no logical explanation for libsoundfile's perf other than some sort of caching. Is the data already in cache? Is it read via memmap? Is some smart numpy mechanism involved? I have no idea, I couldn't figure it out.
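
To put the two float32 timings in perspective, the sketch below converts them into an implied copy throughput. The PR does not state the sample rate or channel count of the test files, so the 44.1 kHz stereo figures here are assumptions chosen only to make the arithmetic concrete; the helper names are hypothetical too.

```python
def pcm_bytes(seconds, sample_rate, channels, bytes_per_sample):
    """Size of the raw PCM payload in bytes."""
    return seconds * sample_rate * channels * bytes_per_sample


def implied_gb_per_s(num_bytes, elapsed_ms):
    """Throughput implied by moving num_bytes in elapsed_ms (GB = 1e9 bytes)."""
    return num_bytes / (elapsed_ms / 1e3) / 1e9


# ASSUMPTION: 5 min, 44.1 kHz, stereo, float32 (4 bytes per sample).
size = pcm_bytes(300, 44_100, 2, 4)

# Timings taken from the float32 5min row of the summary table.
wavdec_rate = implied_gb_per_s(size, 18.95)
sndfile_rate = implied_gb_per_s(size, 5.71)

print(f"payload: {size / 1e6:.1f} MB")
print(f"WavDecoder: {wavdec_rate:.1f} GB/s, soundfile: {sndfile_rate:.1f} GB/s")
```

Under these assumed parameters the soundfile number implies roughly three times the copy throughput of the WavDecoder number, which is the gap the paragraph above is puzzling over.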


===========================================================================================================================================================
SUMMARY
===========================================================================================================================================================
format     duration    WavDec (ms)  AudioDec (ms)  sndfile f32 (ms)  sndfile native (ms)  AudioDec/WavDec  sndfile f32/WavDec  sndfile nat/WavDec
-----------------------------------------------------------------------------------------------------------------------------------------------------------
u8         10s                0.09           0.86              0.60                 0.49            9.11x               6.36x               5.23x
u8         5min              16.67          35.90             21.06                13.53            2.15x               1.26x               0.81x
s16        10s                0.10           8.12              0.77                 0.09           81.97x               7.79x               0.89x
s16        5min              16.51          40.65             26.26                 1.33            2.46x               1.59x               0.08x
s24        10s                0.39           1.01              1.57                 1.36            2.57x               4.01x               3.46x
s24        5min              26.54          41.75             49.99                43.44            1.57x               1.88x               1.64x
s32        10s                0.11           0.80              0.76                 0.13            7.11x               6.74x               1.15x
s32        5min              17.91          35.64             26.21                 5.75            1.99x               1.46x               0.32x
float32    10s                0.10           0.83              0.13                 0.12            8.14x               1.23x               1.22x
float32    5min              18.95          37.02              5.71                 5.71            1.95x               0.30x               0.30x
float64    10s                0.21           1.07              0.80                 0.21            5.11x               3.85x               1.00x
float64    5min              21.86          45.27             29.77                12.03            2.07x               1.36x               0.55x
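
The ratio columns divide each competitor's time by the WavDecoder time, so values above 1.00x mean WavDecoder is faster. A small sketch that recomputes them for the float32 5min row is below; the dictionary keys and the `ratios` helper are hypothetical names. Note that recomputing from the rounded millisecond values does not always reproduce the printed ratio (for u8 10s, 0.86 / 0.09 gives about 9.56 versus the printed 9.11x), presumably because the script divides unrounded timings; that explanation is an assumption.

```python
def ratios(row, baseline="wavdec"):
    """Divide every other timing in the row by the baseline timing."""
    base = row[baseline]
    return {name: ms / base for name, ms in row.items() if name != baseline}


# Timings in ms, copied from the float32 5min row of the summary table.
float32_5min = {
    "wavdec": 18.95,
    "audiodec": 37.02,
    "sndfile_f32": 5.71,
    "sndfile_native": 5.71,
}

result = ratios(float32_5min)
for name, value in result.items():
    print(f"{name}/wavdec: {value:.2f}x")
```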

Also benchmarked different input types just for WavDecoder:


====================================================================================================
FILE vs FILE-LIKE vs BYTES
====================================================================================================
format     duration      file (ms)   file-like (ms)   bytes (ms)   flike/file   bytes/file
----------------------------------------------------------------------------------------------------
u8         10s                0.13             0.18         0.09        1.40x        0.70x
u8         5min              18.27            19.69        16.59        1.08x        0.91x
s16        10s                0.15             0.25         0.09        1.63x        0.60x
s16        5min              19.50            22.68        16.74        1.16x        0.86x
s24        10s                0.49             0.62         0.39        1.27x        0.80x
s24        5min              30.48            34.64        26.30        1.14x        0.86x
s32        10s                0.22             0.41         0.11        1.84x        0.50x
s32        5min              23.16            29.40        17.74        1.27x        0.77x
float32    10s                0.15             0.21         0.10        1.46x        0.66x
float32    5min              18.42            37.25        18.93        2.02x        1.03x
float64    10s                0.41             0.82         0.20        2.01x        0.48x
float64    5min              32.69            45.01        22.57        1.38x        0.69x

Unsurprisingly, there is some overhead with the file-like object, and reading from bytes is generally faster since it bypasses the I/O part (float32 5min is the one exception, at 1.03x). This is consistent with expectations.
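
The three input kinds can be illustrated with a stand-in decoder. The sketch below uses the standard-library `wave` module rather than WavDecoder, whose constructor is not shown in this conversation, so everything here (the helper names, the tiny silent test file, the best-of-N timing) is an assumption meant only to show how the path, file-like, and bytes cases differ in where the I/O happens.

```python
import io
import os
import tempfile
import time
import wave


def read_frames(source):
    """Stand-in decode: wave.open accepts a path or a binary file-like object."""
    with wave.open(source, "rb") as r:
        return r.readframes(r.getnframes())


def best_ms(fn, repeats=10):
    """Best-of-N wall-clock time of fn() in milliseconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, (time.perf_counter() - start) * 1e3)
    return best


# Write a small silent s16 mono WAV to a temporary file (hypothetical input).
tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
tmp.close()
with wave.open(tmp.name, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(8000)
    w.writeframes(b"\x00\x00" * 8000)

with open(tmp.name, "rb") as f:
    raw = f.read()  # bytes already in memory: no file I/O while timing


def from_path():
    return read_frames(tmp.name)


def from_file_like():
    with open(tmp.name, "rb") as f:
        return read_frames(f)


def from_bytes():
    return read_frames(io.BytesIO(raw))


cases = (from_path, from_file_like, from_bytes)
results = {fn.__name__: fn() for fn in cases}
timings = {fn.__name__: best_ms(fn) for fn in cases}
os.remove(tmp.name)

for name, ms in timings.items():
    print(f"{name}: {ms:.3f} ms")
```

In the bytes case the file has been read before the timer starts, which is why that path avoids the I/O cost that the other two pay on every call.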

@pytorch-bot

pytorch-bot Bot commented Jun 1, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/meta-pytorch/torchcodec/1474

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed This label is managed by the Meta Open Source bot. label Jun 1, 2026
@NicolasHug NicolasHug merged commit 19f1202 into meta-pytorch:main Jun 1, 2026
26 checks passed
@NicolasHug NicolasHug deleted the wav_benchmark branch June 1, 2026 16:42
