The script [torchchat_llm_text_gen.py](torchchat_llm_text_gen.py) demonstrates how to run LLM inference using the Llama2 7B model via torchchat. It leverages 4-bit dynamic quantization speedups and supports multiple vision and text models.
### Transformers
The script [transformers_llm_text_gen.py](transformers_llm_text_gen.py) demonstrates how to generate text using the Llama2 7B model via Transformers. It leverages 4-bit dynamic quantization speedups and supports a vast number of text models.
To run inference using torchchat, call:
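The exact command is not shown in this README; the sketch below is a plausible invocation, combining the script name linked above with the flags documented later in this section. The prompt text is purely illustrative.

```shell
# Hypothetical invocation of the torchchat-based script; flags mirror those
# documented below, and the prompt is just an example.
python torchchat_llm_text_gen.py \
  --model llama2 \
  --prompt "Explain dynamic quantization in one paragraph." \
  --max-new-tokens 128
```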
To run inference using Transformers with the default quantization scheme (groupwise, layout-aware INT4), call:
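Again, the README does not show the command itself; this is a hedged sketch that assumes the script falls back to the default groupwise, layout-aware INT4 scheme when no quantization config is supplied. Flags come from the argument list below; the prompt is illustrative.

```shell
# Hypothetical invocation; with no quantization config given, the script is
# assumed to use the default groupwise, layout-aware INT4 scheme.
python transformers_llm_text_gen.py \
  --model llama2 \
  --prompt "What does 4-bit quantization change?" \
  --max-new-tokens 128
```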
The script accepts the following arguments:

Description: Path to the model quantization config.

`--max-new-tokens`
Description: Max new tokens to generate.

`--compile`
Description: Whether to compile the model (default: `False`).
To run with symmetric_channelwise quantization, call:
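The command is not shown here either. The argument list above documents a "path to the model quantization config" but does not name its flag, so `--quantize` and the config path below are placeholders, not the script's actual interface.

```shell
# Hypothetical: `--quantize` stands in for the script's quantization-config
# argument (its real flag name is not shown in this README), and the config
# path is illustrative.
python transformers_llm_text_gen.py \
  --model llama2 \
  --prompt "Summarize channelwise quantization." \
  --quantize symmetric_channelwise.json
```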
`--model`
Description: Model alias. (Default: `"llama2"`)

`--prompt`
Description: Input prompt for model generation.
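Taken together, the flags above suggest an `argparse`-based CLI. The parser below is a hypothetical sketch of how the scripts might wire these flags up; only the flag names and the defaults stated above come from this README, everything else is an assumption.

```python
import argparse

def build_parser():
    # Sketch of the CLI implied by the documented flags; the parser itself is
    # hypothetical, and defaults follow the README where stated.
    p = argparse.ArgumentParser(description="LLM text generation")
    p.add_argument("--model", default="llama2",
                   help="Model alias.")
    p.add_argument("--prompt", required=True,
                   help="Input prompt for model generation.")
    p.add_argument("--max-new-tokens", type=int, default=200,
                   help="Max new tokens to generate.")
    p.add_argument("--compile", action="store_true", default=False,
                   help="Whether to compile the model (default: False).")
    return p

args = build_parser().parse_args(
    ["--prompt", "Hello", "--max-new-tokens", "64"]
)
print(args.model, args.max_new_tokens, args.compile)  # → llama2 64 False
```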