Add benchmark for WavDecoder #1474
Merged
Adds a benchmark comparing WavDecoder against AudioDecoder, soundfile (float), and soundfile (native). "Native" means the dtype that is closest to the wav source, which usually needs no conversion, but not always: e.g. a u8 wav still has to be converted to int16. Both WavDecoder and AudioDecoder always convert to float32.
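To make the "native" vs. float distinction concrete, here is a minimal stdlib-only sketch. The format labels (`u8`, `s16`, ...) and the `NATIVE_DTYPE` table are illustrative assumptions, not code from this PR; the only grounded fact is that soundfile can return int16, int32, float32 or float64, so a u8 source has no same-width target. The scaling shown is one common convention and may not match what a given library does bit for bit.

```python
import array

# Assumed mapping from a WAV sample format to the closest dtype that
# soundfile.read() can return ("int16", "int32", "float32", "float64").
# The keys are illustrative labels, not names taken from the PR.
NATIVE_DTYPE = {
    "u8": "int16",    # no 8-bit dtype available, so a conversion is needed
    "s16": "int16",
    "s24": "int32",   # assumption: widened, since there is no 24-bit dtype
    "s32": "int32",
    "f32": "float32",
    "f64": "float64",
}


def u8_to_int16(samples: bytes) -> array.array:
    """Convert unsigned 8-bit PCM (midpoint 128) to signed 16-bit PCM."""
    return array.array("h", ((s - 128) * 256 for s in samples))


def int16_to_float32(samples: array.array) -> array.array:
    """Scale signed 16-bit PCM into [-1.0, 1.0), the extra step a float path pays."""
    return array.array("f", (s / 32768.0 for s in samples))
```

In the "native" scenario only the first kind of step (if any) is needed, while the float path always pays for the second one as well.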
Results on my machine are the following. Unsurprisingly, WavDecoder is much faster than AudioDecoder. We're overall faster than soundfile (float), but I would take the results against soundfile with a grain of salt: I observed vastly different perf depending on which libsoundfile.so gets resolved, and most long-file benchmarks show only a modest improvement anyway. Unsurprisingly, soundfile tends to be faster in the 'native' scenario since it does less: unlike us, it doesn't convert to float32. The one bit that I still cannot explain despite some investigation is the wav float32 decoding time:
How can we take 18ms and soundfile take 5? That makes no sense to me: we literally just copy the entire data into the output tensor in one shot. The cost of a `memcpy` for that size is in the range of 18ms, so there's no logical explanation for libsoundfile perf other than some sort of caching. In cache? Via memmap? Via some smart numpy mechanism? I have no idea, I couldn't figure it out.

Also benchmarked different input types just for WavDecoder:
There is unsurprisingly some overhead with the file-like object, and reading from bytes is faster as it bypasses the `io` part. This is consistent with expectations.
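For readers who want to reproduce the shape of the input-type comparison, here is a rough sketch of a timing harness. It is not the benchmark from this PR: the stdlib `wave` module is used as a stand-in decoder because WavDecoder's signature is not shown here, and `make_wav_bytes`, `decode`, `best_time` and `run_benchmark` are hypothetical helper names. Note that `wave` cannot take raw bytes directly, so the bytes case is wrapped in `io.BytesIO` and only avoids the file system, not the `io` layer.

```python
import io
import os
import tempfile
import time
import wave


def make_wav_bytes(num_frames: int = 48_000, sample_rate: int = 48_000) -> bytes:
    """Build a mono 16-bit PCM WAV containing silence, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(bytes(2 * num_frames))
    return buf.getvalue()


def decode(source) -> bytes:
    """Stand-in decoder: read all frames from a path or a file-like object."""
    with wave.open(source, "rb") as w:
        return w.readframes(w.getnframes())


def best_time(fn, repeats: int = 5) -> float:
    """Return the fastest wall-clock time of fn() over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def run_benchmark() -> dict:
    """Time the same WAV decoded from a path, a file-like object, and bytes."""
    data = make_wav_bytes()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.wav")
        with open(path, "wb") as f:
            f.write(data)

        def from_path():
            return decode(path)

        def from_file_like():
            with open(path, "rb") as f:
                return decode(f)

        def from_bytes():
            # Stand-in limitation: wave needs a file-like wrapper around bytes.
            return decode(io.BytesIO(data))

        cases = {"path": from_path, "file_like": from_file_like, "bytes": from_bytes}
        return {name: best_time(fn) for name, fn in cases.items()}


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name:>10}: {seconds * 1e3:.3f} ms")
```

Taking the best of several runs reduces noise from the OS page cache warming up, which matters when comparing a path against in-memory bytes.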