
Commit 7522b1a (2 parents: 44599eb + 0a971cb)

Merge pull request #348 from gausah-arm/new_TS_work

[Update] Update new quantizer API usage, bump TorchAO version, and remove torchchat.

6 files changed: 150 additions & 206 deletions

File tree

ML-Frameworks/pytorch-aarch64/CHANGELOG.md

Lines changed: 5 additions & 0 deletions
```diff
@@ -10,6 +10,11 @@ where `YY` is the year, and `MM` the month of the increment.
 ### Added
 
 ### Changed
+- Updates hash for:
+  - TORCH_AO_HASH to ebfe1736c4442970835b6eda833c0bc5a1ce2dda, from main
+- Update the examples/transformers_llm_text_gen.py to use the new quantizer API Int8DynamicActivationIntxWeightConfig.
+- Deleted torchchat_llm_text_gen.py
+- Removed Dockerfile lines cloning TorchChat repo and setting safe.directory
 
 ### Removed
```

ML-Frameworks/pytorch-aarch64/Dockerfile

Lines changed: 0 additions & 8 deletions
```diff
@@ -106,14 +106,6 @@ RUN pip install "$(basename "$TORCH_AO_WHEEL")" --no-deps \
 # Setup Examples
 COPY examples/ /home/$DOCKER_USER/
 
-# Llm examples depends on torchchat
-RUN sudo mkdir -p /home/ubuntu/gen_ai_utils/ && \
-    cd /home/ubuntu/gen_ai_utils/ && \
-    sudo git clone https://github.com/pytorch/torchchat.git -b main && \
-    cd torchchat && \
-    sudo git config --global --add safe.directory /home/ubuntu/gen_ai_utils/torchchat && \
-    sudo git checkout 90749d280bbc116fcc121a1eda1b60f1dba5b675
-
 # Move build into final image as a single layer.
 FROM ${DOCKER_IMAGE_MIRROR}ubuntu:22.04
```

ML-Frameworks/pytorch-aarch64/examples/README.md

Lines changed: 16 additions & 26 deletions
````diff
@@ -195,45 +195,35 @@ huggingface-cli login --token @hf_token
 
 ### Text Generation
 
-### Torchchat
-The script [torchchat_llm_text_gen.py](torchchat_llm_text_gen.py) demonstrates how to run llm inference using the Llama2 7B model via torchchat. It leverages the 4 bit dynamic quantization speedups and can supports multiple vision and text models.
+### Transformers
+The script [transformers_llm_text_gen.py](transformers_llm_text_gen.py) demonstrates how to generate text using the Llama2 7B model via Transformers. It leverages 4-bit dynamic quantization speedups and supports a vast number of text models.
 
-To run infernece using torchchat call:
+Run inference with the default quantization (groupwise, layout-aware INT4):
 
 ```
-LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python torchchat_llm_text_gen.py --compile
+LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python transformers_llm_text_gen.py --compile
 ```
 
-#### Command-Line Options
-
-`--quant-config`
-Description: Path to the model quantization config.
-
-`--max-new-tokens`
-Description: Max new tokens to generate.
-
-`--compile`
-Description: Whether to compile the model (default: `False`).
+Run with symmetric_channelwise quantization:
 
-`--model`
-Description: Model alias. (Default: `"llama2"` )
-
-`--prompt`
-Description: Input prompt for model generation.
-
-### Transformers
-The script [transformers_llm_text_gen.py](transformers_llm_text_gen.py) demonstrates how to generate text using Llama2 7B model via Transformers. It leverages the 4 bit dynamic quantization speedups and can supports vast number of text models.
+```
+LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python transformers_llm_text_gen.py --quant-scheme symmetric_channelwise --compile
+```
 
-To run infernece using torchchat call:
+Run with a custom group size (e.g. 64):
 
 ```
-LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python transformers_llm_text_gen.py --compile
+LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libtcmalloc.so.4 TORCHINDUCTOR_CPP_WRAPPER=1 TORCHINDUCTOR_FREEZING=1 OMP_NUM_THREADS=16 python transformers_llm_text_gen.py --quant-scheme symmetric_groupwise --groupsize 64 --compile
+```
 
+
 #### Command-Line Options
 
-`--quant-config`
-Description: Path to the model quantization config.
+`--quant-scheme`
+Description: Quantization scheme to apply: symmetric_channelwise or symmetric_groupwise.
+
+`--groupsize`
+Description: Group size (used only with symmetric_groupwise).
 
 `--max-new-tokens`
 Description: Max new tokens to generate.
````
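The command-line options in the README diff above can be sketched as an `argparse` surface. This is a hypothetical reconstruction of the CLI described in the docs, not the script's actual parser; the default values here are assumptions:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch of transformers_llm_text_gen.py's options.
    parser = argparse.ArgumentParser(description="LLM text generation example")
    parser.add_argument(
        "--quant-scheme",
        choices=["symmetric_channelwise", "symmetric_groupwise"],
        default=None,
        help="Quantization scheme to apply (default: groupwise, layout-aware INT4)",
    )
    parser.add_argument(
        "--groupsize",
        type=int,
        default=32,  # assumed default; used only with symmetric_groupwise
        help="Group size for symmetric_groupwise",
    )
    parser.add_argument("--max-new-tokens", type=int, default=128,
                        help="Max new tokens to generate")
    parser.add_argument("--compile", action="store_true",
                        help="Compile the model with torch.compile")
    return parser

# Mirrors the "custom group size" invocation from the README.
args = build_parser().parse_args(
    ["--quant-scheme", "symmetric_groupwise", "--groupsize", "64", "--compile"]
)
```

Note that `argparse` maps `--quant-scheme` and `--max-new-tokens` to the attributes `quant_scheme` and `max_new_tokens`.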

ML-Frameworks/pytorch-aarch64/examples/torchchat_llm_text_gen.py

Lines changed: 0 additions & 62 deletions
This file was deleted.
