# *******************************************************************************
# Copyright 2025 Arm Limited and affiliates.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# *******************************************************************************

import argparse
import requests
import torch
from PIL import Image
from transformers import MllamaForConditionalGeneration, AutoProcessor, GenerationConfig, TextStreamer
import time
from torchao.quantization.quant_api import (
    Int8DynamicActivationIntxWeightConfig,
    quantize_,
)
from torchao.dtypes.uintx.packed_linear_int8_dynamic_activation_intx_weight_layout import (
    PackedLinearInt8DynamicActivationIntxWeightLayout,
    Target,
)
from torchao.quantization.granularity import PerGroup, PerAxis
from torchao.quantization.quant_primitives import MappingType
import numpy as np
import os

def main(args):

    model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
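    # The Llama 3.2 Vision checkpoints are gated on Hugging Face, so the license must be
    # accepted and an authenticated token available (e.g. via `huggingface-cli login`).
    # With no device_map given, the model loads on CPU in the dtype selected by --dtype.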
    model = MllamaForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16 if args.dtype == "bfloat16" else torch.float32,
    )

    if args.quantize:
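        # Int8 dynamic activation + int4 symmetric per-channel weight quantization of the
        # linear layers. The packed layout with Target.ATEN lowers the quantized matmuls to
        # ATen's dynamic 4-bit kernels, which are KleidiAI-accelerated on Arm CPUs.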
        layout = PackedLinearInt8DynamicActivationIntxWeightLayout(target=Target.ATEN)
        quantize_(
            model,
            Int8DynamicActivationIntxWeightConfig(
                weight_scale_dtype=torch.float32,
                weight_granularity=PerAxis(0),  # PerGroup is also supported
                weight_mapping_type=MappingType.SYMMETRIC_NO_CLIPPING_ERR,  # MappingType.SYMMETRIC can also be used but increases error
                layout=layout,
                weight_dtype=torch.int4,
            ),
        )

    processor = AutoProcessor.from_pretrained(model_id)
    image = Image.open(requests.get(args.image_url, stream=True).raw)

    messages = [
        {"role": "user", "content": [
            {"type": "image"},
            {"type": "text", "text": args.prompt + os.linesep}
        ]}
    ]
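    # The {"type": "image"} entry is a placeholder: apply_chat_template expands it into the
    # model's image token so the processor knows where the image belongs in the prompt.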

    input_text = processor.apply_chat_template(messages, add_generation_prompt=True)
    inputs = processor(
        image,
        input_text,
        add_special_tokens=False,
        return_tensors="pt"
    ).to(model.device)
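    # `inputs` bundles the tokenized prompt with the preprocessed image tensors
    # (pixel values plus the masks the cross-attention layers need), on the model's device.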

    prefill_generation_config = GenerationConfig(do_sample=False, max_new_tokens=1, min_new_tokens=1, temperature=None, top_p=None)
    e2e_generation_config = GenerationConfig(do_sample=False, max_new_tokens=args.num_new_tokens, min_new_tokens=args.num_new_tokens, temperature=None, top_p=None)
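    # Greedy decoding with fixed min/max new tokens keeps every run deterministic and the
    # same length: the prefill config emits a single token (roughly time-to-first-token),
    # the e2e config always emits args.num_new_tokens tokens.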

    print("=" * 100)
    if args.benchmark:
        WARMUP_ITERS = 1
        BENCHMARK_ITERS = 3
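        # Untimed warm-up iterations absorb one-off costs (lazy initialization, memory
        # allocation, cache warm-up); reported numbers are means over BENCHMARK_ITERS runs.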

        # prefill
        for _ in range(WARMUP_ITERS):
            model.generate(**inputs, generation_config=prefill_generation_config)

        prefill_times = []
        for _ in range(BENCHMARK_ITERS):
            start_time = time.time()
            model.generate(**inputs, generation_config=prefill_generation_config)
            prefill_times.append(time.time() - start_time)

        mean_prefill_times = np.mean(prefill_times)
        print("Prefill Time (s): ", mean_prefill_times)

        # end to end generation
        for _ in range(WARMUP_ITERS):
            model.generate(**inputs, generation_config=e2e_generation_config)

        e2e_times = []
        for _ in range(BENCHMARK_ITERS):
            start_time = time.time()
            model.generate(**inputs, generation_config=e2e_generation_config)
            e2e_times.append(time.time() - start_time)

        mean_e2e_times = np.mean(e2e_times)
        print("End to End Time (s): ", mean_e2e_times)
        print("Decode Throughput (tokens/s): ", args.num_new_tokens / (mean_e2e_times - mean_prefill_times))

    print("Model output:")
    streamer = TextStreamer(processor, skip_special_tokens=True)
    model.generate(**inputs, streamer=streamer, generation_config=e2e_generation_config)
    print("=" * 100)


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Quantize and benchmark the Llama 3.2 Vision Instruct model")
    parser.add_argument(
        "--num-new-tokens",
        type=int,
        default=32,
        help="Number of new tokens the model will always generate",
    )
    parser.add_argument(
        "--prompt",
        type=str,
        default="Describe this image",
        help="Input prompt",
    )
    parser.add_argument(
        "--image-url",
        type=str,
        default="https://huggingface.co/datasets/huggingface/documentation-images/resolve/0052a70beed5bf71b92610a43a52df6d286cd5f3/diffusers/rabbit.jpg",
        help="URL of the input image",
    )
    parser.add_argument(
        "--benchmark",
        action="store_true",
        help="Run a benchmark with warm-up and multiple timed iterations",
    )
    parser.add_argument(
        "--dtype",
        type=str,
        default="bfloat16",
        choices=["bfloat16", "float32"],
        help="Precision to run the model in (for a quantized model, the precision of the non-quantized layers)",
    )
    parser.add_argument(
        "--quantize",
        action="store_true",
        help="Quantize linear-layer weights to symmetric channelwise int4",
    )

    args = parser.parse_args()
    main(args)
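
# Example invocation (the script filename is illustrative):
#   python llama32_vision_quantize_benchmark.py --quantize --benchmark --num-new-tokens 64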