
Commit d50156d

Merge pull request #355 from nSircombe/feature/r25.08_updates: r25.08 updates
2 parents c54aebe + 4cb676a, commit d50156d

8 files changed: 275 additions & 26 deletions


ML-Frameworks/pytorch-aarch64/CHANGELOG.md

Lines changed: 32 additions & 11 deletions
@@ -8,26 +8,47 @@ where `YY` is the year, and `MM` the month of the increment.
 ## [unreleased]
 
 ### Added
-- Adds https://github.com/pytorch/pytorch/pull/159859, a WIP LUT implementation of bf16 GELU
-  ~8x speedup over existing oneDNN implementation
+
+### Changed
+
+### Removed
+
+### Fixed
+
+## [r25.08] 2025-08-26
+https://github.com/ARM-software/Tool-Solutions/tree/r25.08
+
+### Added
+- Adds https://github.com/pytorch/pytorch/pull/159859, a WIP LUT implementation of bf16 GELU;
+  this gives an ~8x speedup on GELU and an ~1.8x speedup for attention for Llama 3.2 11B Vision (both on 16 threads).
+- Adds https://github.com/pytorch/pytorch/pull/158250, to integrate INT4->BF16 via KleidiAI, with fallback.
+- Adds https://github.com/pytorch/pytorch/pull/160080, a VLA PoC for PyTorch.
+  This includes an optimised SVE implementation of exp().
+  Note: there may be some regressions on Neoverse-V1 with this WIP patch.
 
 ### Changed
 - Updates hashes for:
-  - PYTORCH_HASH to 6662a76f5975bae56ce9171b0afad32b53f89c25, 2.9.0.dev20250731 from viable/strict, August 1st
-  - IDEEP_HASH to 3527b0bf2127aa2de93810feb6906d173c24037f, from ideep_pytorch, August 1st
-  - ONEDNN_HASH to 7e85b94b5f6be27b83c5435603ab67888b99da32, from main, August 1st
-  - ACL_HASH to 3c32d706d0245dcb55181c8ced526eab05e2ff8d, from main, August 1st
-  - TORCH_AO_HASH to ebfe1736c4442970835b6eda833c0bc5a1ce2dda, from main
+  - PYTORCH_HASH to 4e2ddb5db67617f9f5309c8bba0c17adc84cadbc, 2.9.0.dev20250808 from viable/strict, August 8th.
+  - IDEEP_HASH to 3527b0bf2127aa2de93810feb6906d173c24037f, from ideep_pytorch, August 1st.
+  - ONEDNN_HASH to 7e85b94b5f6be27b83c5435603ab67888b99da32, from main, August 1st.
+  - ACL_HASH to 3c32d706d0245dcb55181c8ced526eab05e2ff8d, from main, August 1st.
+  - TORCH_AO_HASH to 8d4a5d83d7be4d7807feabe38d37704c92d40900, from main, August 1st.
+  - KLEIDIAI_HASH to 8ca226712975f24f13f71d04cda039a0ee9f9e2f, v1.12 from main.
 - Update the examples/transformers_llm_text_gen.py to use the new quantizer API Int8DynamicActivationIntxWeightConfig.
-- Deleted torchchat_llm_text_gen.py
-- Removed Dockerfile lines cloning TorchChat repo and setting safe.directory
+- Deleted torchchat_llm_text_gen.py.
+- Removed Dockerfile lines cloning TorchChat repo and setting safe.directory.
+- Updates huggingface_hub to 0.34.0.
 
 ### Removed
-- Temporarily removes https://github.com/pytorch/pytorch/pull/150833 (pins all root requirements to major versions)
-  pending a rebase for current PyTorch hash.
 - https://github.com/pytorch/pytorch/pull/151547, to update OpenBLAS commit as this has been merged upstream.
 
 ### Fixed
+- Updates various Python packages to address known vulnerabilities with a high CVSS score:
+  - Updates Transformers to 4.55.2; this also provides a mitigation for CVE-2025-2099.
+  - Updates Wheel to 0.38.0 as a mitigation for CVE-2022-40898.
+  - Updates setuptools to 78.1.1 as a mitigation for CVE-2025-47273 and CVE-2024-6345.
+  - Updates Torchvision to 0.23 to avoid the need to use `--extra-index-url`;
+    this is the recommended mitigation against CVE-2018-20225, affecting all versions of pip.
 
 ## [r25.07] 2025-07-11
 https://github.com/ARM-software/Tool-Solutions/tree/r25.07
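The LUT GELU entry above exploits the fact that bfloat16 has only 2^16 distinct bit patterns, so the whole function can be precomputed into a table and evaluated with a single lookup. Below is an illustrative pure-Python sketch of that idea, not the actual kernel from pytorch#159859; all helper names here are invented for the example.

```python
import math
import struct

def gelu(x: float) -> float:
    # Exact (erf-based) GELU definition
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def f32_to_bf16_bits(x: float) -> int:
    # Truncate an IEEE-754 float32 to its upper 16 bits (bfloat16, round-toward-zero)
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16

def bf16_bits_to_f32(b: int) -> float:
    # Widen a bfloat16 bit pattern back to float32 by zero-padding the mantissa
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# bf16 has only 2**16 bit patterns, so GELU can be tabulated once up front
LUT = [gelu(bf16_bits_to_f32(b)) for b in range(1 << 16)]

def gelu_bf16(x: float) -> float:
    # On the hot path, one table lookup replaces the transcendental evaluation
    return LUT[f32_to_bf16_bits(x)]
```

The real patch vectorises the lookup with SIMD gathers; the table-based structure, rather than per-element erf evaluation, is where the ~8x speedup over the erf-based path comes from.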

ML-Frameworks/pytorch-aarch64/Dockerfile

Lines changed: 1 addition & 2 deletions
@@ -90,9 +90,8 @@ COPY $TORCH_AO_WHEEL /home/$DOCKER_USER/
 RUN pip install \
     torchaudio~=2.6.0 \
     torchdata~=0.11.0 \
-    torchvision~=0.22.0.dev20250403 \
+    torchvision~=0.23.0 \
     torchtune~=0.5.0 \
-    --extra-index-url https://download.pytorch.org/whl/nightly/cpu \
     --no-deps
 
 # We need --no-deps because the torch version won't match the versions on torch*
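The torchvision pin uses pip's compatible-release operator: `~=0.23.0` accepts any 0.23.x release at or above 0.23.0, which is why the nightly `--extra-index-url` line can be dropped once a 0.23.0 release is available from the default index. A rough sketch of the operator's semantics (an approximation for plain dotted versions; real resolution should use the `packaging` library):

```python
def compatible_release(installed: str, spec: str) -> bool:
    # Approximates pip's "~=" operator: ~=X.Y.Z means >= X.Y.Z and == X.Y.*
    inst = tuple(int(p) for p in installed.split("."))
    base = tuple(int(p) for p in spec.split("."))
    return inst >= base and inst[: len(base) - 1] == base[:-1]
```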

ML-Frameworks/pytorch-aarch64/examples/README.md

Lines changed: 29 additions & 1 deletion
@@ -193,9 +193,37 @@ To access the protected models run
 huggingface-cli login --token @hf_token
 ```
 
+### Vision
+
+The script [llama_vision_instruct.py](llama_vision_instruct.py) uses Llama-3.2-11B-Vision-Instruct to describe a [sample image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg).
+
+```
+LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 OMP_NUM_THREADS=16 python llama_vision_instruct.py --benchmark --dtype bfloat16 --quantize
+```
+
+#### Command line options
+
+`--num-new-tokens`
+The model will always generate this number of new tokens.
+
+`--prompt`
+Input prompt.
+
+`--image-url`
+URL to the image.
+
+`--benchmark`
+Run a benchmark, with warmup and multiple iterations.
+
+`--dtype {bfloat16,float32}`
+Precision to run the model in (or the non-linear layers for a quantized model).
+
+`--quantize`
+Quantize weights to int4 symmetric channelwise.
+
 ### Text Generation
 
-### Transformers
 The script [transformers_llm_text_gen.py](transformers_llm_text_gen.py) demonstrates how to generate text using the Llama2 7B model via Transformers. It leverages the 4-bit dynamic quantization speedups and supports a vast number of text models.
 
 Run inference using default (groupwise, layout-aware INT4) using the transformers call:
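In the vision example's `--benchmark` mode, prefill (1 new token) and end-to-end generation are timed separately, and decode throughput is derived as the number of new tokens divided by the difference. The arithmetic, extracted for clarity (the helper name is made up for this note):

```python
def decode_throughput(num_new_tokens: int, mean_e2e_s: float, mean_prefill_s: float) -> float:
    # Decode time is approximated as end-to-end time minus prefill time,
    # so throughput is tokens per second spent purely decoding
    return num_new_tokens / (mean_e2e_s - mean_prefill_s)
```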
ML-Frameworks/pytorch-aarch64/examples/llama_vision_instruct.py (new file)

Lines changed: 157 additions & 0 deletions
@@ -0,0 +1,157 @@

# *******************************************************************************
# Copyright 2025 Arm Limited and affiliates.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# *******************************************************************************

import argparse
import os
import time

import numpy as np
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor, GenerationConfig, TextStreamer
from torchao.quantization.quant_api import (
    Int8DynamicActivationIntxWeightConfig,
    quantize_,
)
from torchao.dtypes.uintx.packed_linear_int8_dynamic_activation_intx_weight_layout import (
    PackedLinearInt8DynamicActivationIntxWeightLayout,
    Target,
)
from torchao.quantization.granularity import PerGroup, PerAxis
from torchao.quantization.quant_primitives import MappingType


def main(args):
    model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16 if args.dtype == "bfloat16" else torch.float32,
    )

    if args.quantize:
        layout = PackedLinearInt8DynamicActivationIntxWeightLayout(target=Target.ATEN)
        quantize_(
            model,
            Int8DynamicActivationIntxWeightConfig(
                weight_scale_dtype=torch.float32,
                weight_granularity=PerAxis(0),  # PerGroup is also supported
                weight_mapping_type=MappingType.SYMMETRIC_NO_CLIPPING_ERR,  # MappingType.SYMMETRIC can also be used but increases error
                layout=layout,
                weight_dtype=torch.int4,
            ),
        )

    processor = AutoProcessor.from_pretrained(model_id)
    image = Image.open(requests.get(args.image_url, stream=True).raw)

    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": args.prompt + os.linesep}
        ]}
    ]

    input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(
        image,
        input_text,
        add_special_tokens=False,
        return_tensors="pt"
    ).to(model.device)

    prefill_generation_config = GenerationConfig(do_sample=False, max_new_tokens=1, min_new_tokens=1, temperature=None, top_p=None)
    e2e_generation_config = GenerationConfig(do_sample=False, max_new_tokens=args.num_new_tokens, min_new_tokens=args.num_new_tokens, temperature=None, top_p=None)

    print("=" * 100)
    if args.benchmark:
        WARMUP_ITERS = 1
        BENCHMARK_ITERS = 3

        # prefill
        for _ in range(WARMUP_ITERS):
            model.generate(**inputs, generation_config=prefill_generation_config)

        prefill_times = []
        for _ in range(BENCHMARK_ITERS):
            start_time = time.time()
            model.generate(**inputs, generation_config=prefill_generation_config)
            prefill_times.append(time.time() - start_time)

        mean_prefill_times = np.mean(prefill_times)
        print("Prefill Time: ", mean_prefill_times)

        # end to end generation
        for _ in range(WARMUP_ITERS):
            model.generate(**inputs, generation_config=e2e_generation_config)

        e2e_times = []
        for _ in range(BENCHMARK_ITERS):
            start_time = time.time()
            model.generate(**inputs, generation_config=e2e_generation_config)
            e2e_times.append(time.time() - start_time)

        mean_e2e_times = np.mean(e2e_times)
        print("End to End Time: ", mean_e2e_times)
        print("Decode Throughput: ", args.num_new_tokens / (mean_e2e_times - mean_prefill_times))

    print("Model output:")
    streamer = TextStreamer(processor, skip_special_tokens=True)
    model.generate(**inputs, streamer=streamer, generation_config=e2e_generation_config)
    print("=" * 100)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Quantize and Run Benchmark LLM")
    parser.add_argument(
        "--num-new-tokens",
        type=int,
        default=32,
        help="The model will always generate this number of new tokens",
    )
    parser.add_argument(
        "--prompt",
        type=str,
        default="Describe this image",
        help="Input prompt.",
    )
    parser.add_argument(
        "--image-url",
        type=str,
        default="https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg",
        help="URL to image",
    )
    parser.add_argument(
        "--benchmark",
        action="store_true",
        help="Run a benchmark, with warmup and multiple iterations",
    )
    parser.add_argument(
        "--dtype",
        type=str,
        default="bfloat16",
        choices=["bfloat16", "float32"],
        help="Precision to run the model in (or the non-linear layers for quantized model)",
    )
    parser.add_argument(
        "--quantize",
        action="store_true",
        help="Quantize weights to int4 symmetric channelwise",
    )

    args = parser.parse_args()
    main(args)
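The `--quantize` path above maps each weight channel to int4 with a symmetric scale; `SYMMETRIC_NO_CLIPPING_ERR` chooses the scale so that no value clips. A minimal per-channel sketch of that mapping, illustrative only (torchao's real kernels pack the weights and dispatch to KleidiAI; the helper names below are invented):

```python
def quantize_symmetric_int4(row):
    # Per-channel symmetric int4: scale = max|w| / 7 keeps every quantized
    # value in the symmetric range [-7, 7], so nothing clips
    scale = max(abs(w) for w in row) / 7.0
    if scale == 0.0:
        scale = 1.0  # all-zero channel: any scale works
    q = [round(w / scale) for w in row]
    return q, scale

def dequantize(q, scale):
    # Reconstruction used at (or before) matmul time
    return [v * scale for v in q]
```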

ML-Frameworks/pytorch-aarch64/get-source.sh

Lines changed: 40 additions & 4 deletions
@@ -20,14 +20,16 @@
 source ../utils/git-utils.sh
 
 set -eux -o pipefail
-PYTORCH_HASH=6662a76f5975bae56ce9171b0afad32b53f89c25 # 2.9.0.dev20250731 from viable/strict, August 1st
+PYTORCH_HASH=4e2ddb5db67617f9f5309c8bba0c17adc84cadbc # 2.9.0.dev20250808 from viable/strict, August 8th
 IDEEP_HASH=3527b0bf2127aa2de93810feb6906d173c24037f # From ideep_pytorch, August 1st
 ONEDNN_HASH=7e85b94b5f6be27b83c5435603ab67888b99da32 # From main, August 1st
 ACL_HASH=3c32d706d0245dcb55181c8ced526eab05e2ff8d # From main, August 1st
-TORCH_AO_HASH=ebfe1736c4442970835b6eda833c0bc5a1ce2dda # From main
+TORCH_AO_HASH=8d4a5d83d7be4d7807feabe38d37704c92d40900 # From main, August 1st
+KLEIDIAI_HASH=8ca226712975f24f13f71d04cda039a0ee9f9e2f # v1.12 from main
 
 git-shallow-clone https://github.com/pytorch/pytorch.git $PYTORCH_HASH
 (
+    # Apply patches to PyTorch build
     cd pytorch
 
     # https://github.com/pytorch/pytorch/pull/152361 - Build libgomp (gcc-11) from source
@@ -37,11 +39,40 @@ git-shallow-clone https://github.com/pytorch/pytorch.git $PYTORCH_HASH
     apply-github-patch pytorch/pytorch c4c280eb27859221159108356b7c91376202cdd8
 
     # https://github.com/pytorch/pytorch/pull/160184 - Draft: separate reqs for manywheel build and pin
-    apply-github-patch pytorch/pytorch 9a8b0df99eac62e7ec6199dd0223a80d26e2dee0
+    # Note: as part of this patch, setuptools is pinned to ~= 78.1.1, which is not affected by
+    # CVE-2025-47273 and CVE-2024-6345
+    apply-github-patch pytorch/pytorch 6d61f487b6ca98b3d80f9e7ecc0a49a1ab528535
+
+    # https://github.com/pytorch/pytorch/pull/158250 - Integrate INT4→BF16 via KleidiAI, with fallback
+    apply-github-patch pytorch/pytorch 7c55f2af0adf9ce62c2226e739a3c84902fe0048
+    apply-github-patch pytorch/pytorch 8c27947566c85d44bc7dcd7189db5da608453bbb
+    apply-github-patch pytorch/pytorch 15d78c833b032d3c76b70b12a5f2762fa87d2640
+    apply-github-patch pytorch/pytorch 186cbcf641f99a301cb26013e8d74d444ad1dcb9
+    apply-github-patch pytorch/pytorch a6128ce3a0d2080d80e6fa59061d6c085865376c
+    apply-github-patch pytorch/pytorch 52ee4ddc9a5a9cec8793b1ffeb0d74113e3da417
+    apply-github-patch pytorch/pytorch ab2a6760e4a4891accbacb9187cf3782cb4b55c3
+    apply-github-patch pytorch/pytorch 93384233d166dccab5724f9d2e50b6eb3f47cbe6
+    apply-github-patch pytorch/pytorch 9f6d435629dd251620a1e17b8baa6bc18997f8ab
+    apply-github-patch pytorch/pytorch b68b7867a72fe2ef4c38f9a3cdd93693700a182e
+
+    # https://github.com/pytorch/pytorch/pull/161049 - optimised SVE exp_u20 implementation,
+    # based on Arm Optimized Routines - https://github.com/ARM-software/optimized-routines
+    apply-github-patch pytorch/pytorch 3de5651bafcdabbc52d5205c0de3976188eba7fb
+
+    # https://github.com/pytorch/pytorch/pull/160080 - VLA Vectorized PoC
+    apply-github-patch pytorch/pytorch d5c1aedd5cb85b760abe76099efe64aa535bf1ea
+    apply-github-patch pytorch/pytorch b1496344c65638f25547b841bb2c470127b7e420
+    apply-github-patch pytorch/pytorch fd5f544e87e8c3d6890815ae28f1dc807331643a
+    apply-github-patch pytorch/pytorch 01d97374f5492ca2e1f1eb487e74667a78a00b71
+    apply-github-patch pytorch/pytorch ea3fca1a47f3673eaf778505142cde765b3ab725
+    apply-github-patch pytorch/pytorch f5f5e4f802824344ce90c1f37df124990dea934c
+    apply-github-patch pytorch/pytorch a57478fa655ceff0a910fc936df89b7647ce0e39
 
     # https://github.com/pytorch/pytorch/pull/159859 - PoC LUT optimisation for GELU bf16 operators
-    apply-github-patch pytorch/pytorch 51626269d3730df1a6b465fa0191074fc31f7c29
+    apply-github-patch pytorch/pytorch ebcc874e317f9563ab770fc5c27df969e0438a5e
 
+    # Update submodules
     git submodule sync
     git submodule update --init --checkout --force --recursive --jobs=$(nproc)
     (
@@ -56,6 +87,11 @@ git-shallow-clone https://github.com/pytorch/pytorch.git $PYTORCH_HASH
         apply-github-patch uxlfoundation/oneDNN 466ee88db85db46c8e9cc0535e526efca6308329
         )
     )
+    (
+        cd third_party/kleidiai
+        git fetch origin $KLEIDIAI_HASH && git clean -f && git checkout -f FETCH_HEAD
+    )
+
 )
 
 git-shallow-clone https://review.mlplatform.org/ml/ComputeLibrary $ACL_HASH
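Every dependency in get-source.sh is pinned to a full 40-character commit SHA rather than a moving branch name, which keeps builds reproducible. A small illustrative check of that convention (the dict below merely restates the pins from this diff; `is_full_sha` is a made-up helper):

```python
import re

# Pins restated from get-source.sh above
PINS = {
    "PYTORCH_HASH": "4e2ddb5db67617f9f5309c8bba0c17adc84cadbc",
    "IDEEP_HASH": "3527b0bf2127aa2de93810feb6906d173c24037f",
    "ONEDNN_HASH": "7e85b94b5f6be27b83c5435603ab67888b99da32",
    "ACL_HASH": "3c32d706d0245dcb55181c8ced526eab05e2ff8d",
    "TORCH_AO_HASH": "8d4a5d83d7be4d7807feabe38d37704c92d40900",
    "KLEIDIAI_HASH": "8ca226712975f24f13f71d04cda039a0ee9f9e2f",
}

def is_full_sha(s: str) -> bool:
    # A full git object name is exactly 40 lowercase hex characters
    return re.fullmatch(r"[0-9a-f]{40}", s) is not None
```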

ML-Frameworks/pytorch-aarch64/requirements.txt

Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@ datasets~=3.4.1
 expecttest==0.3.0 # From unit tests
 filelock~=3.16.1
 fsspec==2024.9.0
-huggingface_hub==0.27.0
+huggingface_hub==0.34.0
 idna~=3.10
 Jinja2~=3.1.4
 MarkupSafe~=3.0.1
@@ -19,6 +19,7 @@ opencv-python-headless~=4.10.0.84
 packaging~=24.1
 pandas~=2.2.3
 pillow~=11.0.0
+protobuf==5.29.5 # GenAI models dependency
 psutil~=7.0.0
 pyaml~=24.9.0
 python-dateutil~=2.9.0.post0
@@ -32,10 +33,9 @@ sympy~=1.13.1
 tiktoken~=0.9.0
 tokenizers~=0.21.0
 tqdm~=4.66.5
-transformers~=4.48.2
+transformers~=4.55.2 # >= 4.50.0 due to CVE-2025-2099
 typing_extensions~=4.12.2
 tzdata==2024.2
 urllib3~=2.2.3
-sentencepiece==0.2.0 # Torchchat
-tomli==2.2.1 # Torchchat
-protobuf==5.29.5 # GenAI models dependency
+wheel~=0.38.0 # mitigation for CVE-2022-40898
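The comments on the pins above encode minimum safe versions for the CVEs called out in this release (transformers >= 4.50.0 for CVE-2025-2099, wheel 0.38.x for CVE-2022-40898). A small sketch of checking a version against such a floor, using naive dotted-integer comparison (illustrative; real tooling should use `packaging.version`, which also handles pre-releases):

```python
def parse_version(v: str):
    # Naive dotted-integer parsing (no pre-release or epoch handling)
    return tuple(int(p) for p in v.split("."))

def satisfies_floor(installed: str, floor: str) -> bool:
    # Tuple comparison gives the usual lexicographic version ordering
    return parse_version(installed) >= parse_version(floor)

# Floors implied by the CVE notes in this release
FLOORS = {
    "transformers": "4.50.0",  # CVE-2025-2099
    "wheel": "0.38.0",         # CVE-2022-40898
}
```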

ML-Frameworks/tensorflow-aarch64/CHANGELOG.md

Lines changed: 9 additions & 1 deletion
@@ -10,12 +10,20 @@ where `YY` is the year, and `MM` the month of the increment.
 ### Added
 
 ### Changed
-- Updates TensorFlow hash to ab8aab720f1648f6a470b159b0d1aea3a5b0df81 # 2.20.0-dev0 from master, 25th July 2025
 
 ### Removed
 
 ### Fixed
 
+## [r25.08] 2025-08-26
+https://github.com/ARM-software/Tool-Solutions/tree/r25.08
+
+### Changed
+- Updates TensorFlow hash to ab8aab720f1648f6a470b159b0d1aea3a5b0df81 # 2.20.0-dev0 from master, 25th July 2025
+
+### Fixed
+- Updates Transformers to 4.50 as a mitigation for CVE-2025-2099.
+
 ## [r25.07] 2025-07-11
 https://github.com/ARM-software/Tool-Solutions/tree/r25.07
 