Merge pull request #358 from nSircombe/feature/r25.08_updates

nSircombe · web-flow · commit 101d456745d2 · 2025-08-22T22:19:55.000+01:00
Minor corrections to CHANGELOG and examples README
diff --git a/ML-Frameworks/pytorch-aarch64/CHANGELOG.md b/ML-Frameworks/pytorch-aarch64/CHANGELOG.md
@@ -15,16 +15,18 @@ where `YY` is the year, and `MM` the month of the increment.
 
 ### Fixed
 
-## [r25.08] 2025-08-26
+## [r25.08] 2025-08-27
 https://github.com/ARM-software/Tool-Solutions/tree/r25.08
 
 ### Added
 - Adds https://github.com/pytorch/pytorch/pull/159859, a WIP LUT implementation of bf16 GELU
   this gives an ~8x speedup on GELU and an ~1.8x speedup for attention for llama3.2 11B Vision (both on 16 threads).
 - Adds https://github.com/pytorch/pytorch/pull/158250, to integrate INT4->BF16 via KleidiAI, with fallback.
-- Adds https://github.com/pytorch/pytorch/pull/160080, a VLA PoC for PyTorch.
-  This includes an optimised SVE implementation of exp().
+- Adds https://github.com/pytorch/pytorch/pull/160080, a VLA PoC for PyTorch, and
+  https://github.com/pytorch/pytorch/pull/161049, an optimised SVE exp_u20 implementation.
   Note: there may be some regressions on Neoverse-V1 with this WIP patch.
+- Adds a new example script llama_vision_instruct.py to run and benchmark
+  Llama-3.2-11B-Vision-Instruct using text + image input and text output.
 
 ### Changed
 - Updates hashes for:
diff --git a/ML-Frameworks/pytorch-aarch64/examples/README.md b/ML-Frameworks/pytorch-aarch64/examples/README.md
@@ -195,7 +195,7 @@ huggingface-cli login --token @hf_token
 
 ### Vision
 
-The script [llama_vision_instruct.py](llama_vision_instruct.py) uses Llama-3.2-11B-Vision-Instruct to describe a [sample image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg).
+The script [llama_vision_instruct.py](llama_vision_instruct.py) runs and benchmarks Llama-3.2-11B-Vision-Instruct using text + image input and text output.
 
 ```
 LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4    OMP_NUM_THREADS=16 python llama_vision_instruct.py --benchmark --dtype bfloat16 --quantize