Commit 101d456

Merge pull request #358 from nSircombe/feature/r25.08_updates
Minor corrections to CHANGELOG and examples README
2 parents 2da9e14 + a67a429 commit 101d456

2 files changed

Lines changed: 6 additions & 4 deletions

ML-Frameworks/pytorch-aarch64/CHANGELOG.md

Lines changed: 5 additions & 3 deletions
@@ -15,16 +15,18 @@ where `YY` is the year, and `MM` the month of the increment.
 
 ### Fixed
 
-## [r25.08] 2025-08-26
+## [r25.08] 2025-08-27
 https://github.com/ARM-software/Tool-Solutions/tree/r25.08
 
 ### Added
 - Adds https://github.com/pytorch/pytorch/pull/159859, a WIP LUT implmentation of bf16 GELU
 this gives an ~8x speedup on GELU and an ~1.8x speedup for attention for llama3.2 11B Vision (both on 16 threads).
 - Adds https://github.com/pytorch/pytorch/pull/158250, to integrate INT4->BF16 via KleidiAI, with fallback.
-- Adds https://github.com/pytorch/pytorch/pull/160080, a VLA PoC for PyTorch.
-This includes an optimised SVE implementation of exp().
+- Adds https://github.com/pytorch/pytorch/pull/160080, a VLA PoC for PyTorch, and
+https://github.com/pytorch/pytorch/pull/161049, an optimised SVE exp_u20 implementation,
 Note: there may be some regressions on Neoverse-V1 with this WIP patch.
+- Adds a new example script llama_vision_instruct.py to run and benchmark
+Llama-3.2-11B-Vision-Instruct using text + image input and text output.
 
 ### Changed
 - Updates hashes for:
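
Reviewer note on the "LUT implementation of bf16 GELU" entry above: a bfloat16 value has only 65,536 possible bit patterns, so GELU can in principle be precomputed once for every input and then evaluated with a single table lookup. The sketch below illustrates that general idea in pure Python. It is not taken from the linked PyTorch PR, whose actual design may differ. All function names here are hypothetical, and the exact erf-based GELU formula and round-to-nearest-even rounding are assumptions chosen for the illustration.

```python
import math
import struct


def bf16_bits_to_float(bits: int) -> float:
    """Interpret a 16-bit pattern as bfloat16, i.e. the top half of a float32."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]


def float_to_bf16_bits(x: float) -> int:
    """Round a float to bfloat16 bits (round-to-nearest-even, an assumption)."""
    if math.isnan(x):
        return 0x7FC0  # canonical quiet NaN
    u = struct.unpack("<I", struct.pack("<f", x))[0]
    u += 0x7FFF + ((u >> 16) & 1)  # rounding bias, ties go to even
    return (u >> 16) & 0xFFFF


def build_gelu_lut() -> list:
    """Precompute GELU for every one of the 2**16 bfloat16 bit patterns."""
    table = []
    for bits in range(1 << 16):
        x = bf16_bits_to_float(bits)
        # Exact (erf-based) GELU: 0.5 * x * (1 + erf(x / sqrt(2)))
        y = 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))
        table.append(float_to_bf16_bits(y))
    return table


GELU_LUT = build_gelu_lut()


def gelu_bf16(bits: int) -> int:
    """Evaluate GELU on a bfloat16 bit pattern with one table lookup."""
    return GELU_LUT[bits & 0xFFFF]


if __name__ == "__main__":
    one = float_to_bf16_bits(1.0)
    print(bf16_bits_to_float(gelu_bf16(one)))
```

The table costs 128 KiB if stored as 16-bit entries, and each evaluation replaces an erf computation with an indexed load, which is the kind of trade-off a LUT approach makes.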

ML-Frameworks/pytorch-aarch64/examples/README.md

Lines changed: 1 addition & 1 deletion
@@ -195,7 +195,7 @@ huggingface-cli login --token @hf_token
 
 ### Vision
 
-The script [llama_vision_instruct.py](llama_vision_instruct.py) uses Llama-3.2-11B-Vision-Instruct to decribe a [sample image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg).
+The script [llama_vision_instruct.py](llama_vision_instruct.py) runs and benchmarks Llama-3.2-11B-Vision-Instruct using text + image input and text output.
 
 ```
 LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 OMP_NUM_THREADS=16 python llama_vision_instruct.py --benchmark --dtype bfloat16 --quantize
