@@ -35,41 +35,13 @@ python scripts/phi3_ptq_qat_benchmark.py \
   --max-new-tokens 4 \
   --run-qat \
   --qat-steps 5 \
-  --train-split 'train[:20]' \
-  --json-output benchmarks/gpt2_ptq_qat.json
+  --train-split 'train[:20]'
 ```

 Expected output:
 - Console summary with size, compression ratio, perplexity, and tok/s for baseline/PTQ/QAT.
-- `benchmarks/gpt2_ptq_qat.json` with stage metrics and run configuration.

-### 3) Phi-3 PTQ/QAT baseline (PPL-focused)
-
-```bash
-python scripts/phi3_ptq_qat_benchmark.py \
-  --model-id microsoft/Phi-3-mini-4k-instruct \
-  --device cpu \
-  --dtype float32 \
-  --max-eval-tokens 1024 \
-  --eval-texts 32 \
-  --max-new-tokens 64 \
-  --skip-latency \
-  --json-output benchmarks/phi3_ptq_qat.json
-```
-
-Expected output:
-- Console summary with baseline/PTQ perplexity and size.
-- `benchmarks/phi3_ptq_qat.json` with stage metrics and run configuration.
-
-Observed output (size-only run; PTQ PPL skipped):
-- baseline size: 14.23 GiB
-- PTQ size: 1.16 GiB
-- compression ratio: 12.3x
-- baseline PPL: 6.76
-- PTQ PPL: pending (CPU run skipped)
-- QAT: pending
-
-### 4) ViT CIFAR-10 PTQ/QAT baseline (quick)
+### 3) ViT CIFAR-10 PTQ/QAT baseline (quick)

 ```bash
 python scripts/vit_ptq_qat_benchmark.py \
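The PPL figures quoted above (e.g. baseline PPL 6.76) are standard perplexities: the exponential of the mean per-token negative log-likelihood over the evaluation tokens. A minimal sketch of that conversion — the `perplexity` helper is illustrative, not a function from the repository's scripts:

```python
import math

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token).

    Illustrative helper only: a mean NLL of ~1.911 nats corresponds
    to the baseline PPL of ~6.76 reported by the Phi-3 benchmark.
    """
    return math.exp(sum(nll_per_token) / len(nll_per_token))
```

Because the mapping is exponential, small cross-entropy regressions after PTQ show up clearly in this metric.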
@@ -89,12 +61,7 @@ Expected output:
 - Console summary with size, accuracy/loss, and images/s for baseline/PTQ/QAT.
 - `benchmarks/vit_cifar10_baseline.json` with stage metrics and model metadata.

-Observed output (size-only smoke run, eval/throughput skipped):
-- model: facebook/deit-tiny-patch16-224
-- baseline size: 0.0206 GiB
-- PTQ size: 0.00173 GiB
-
-### 5) GGUF export + load check
+### 4) GGUF export + load check

 ```bash
 t81 convert microsoft/Phi-3-mini-4k-instruct phi3-t81 --threshold 0.45 --force-cpu-device-map
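The `--threshold 0.45` flag controls how aggressively weights are snapped to zero during ternarization. As a rough illustration only — this assumes a TWN-style rule where the cutoff is a fraction of the mean absolute weight, and the exact rule used by `t81 convert` is not shown in this diff:

```python
import numpy as np

def ternarize(w, threshold=0.45):
    """Map float weights to {-1, 0, +1}.

    Assumed TWN-style rule (not necessarily what `t81 convert` does):
    values within `threshold * mean(|w|)` of zero are pruned to 0.
    """
    delta = threshold * np.abs(w).mean()
    q = np.zeros_like(w, dtype=np.int8)
    q[w > delta] = 1
    q[w < -delta] = -1
    return q
```

A higher threshold zeroes more weights, trading accuracy for sparsity and compression.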
@@ -121,7 +88,7 @@ Observed output:
 - prompt: 54.35 ms/token (18.4 tok/s)
 - eval: 56.22 ms/token (17.79 tok/s)

-### 6) GEMM throughput (CPU)
+### 5) GEMM throughput (CPU)

 ```bash
 python - <<'PY'
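The heredoc body is elided in this diff view, but the check it performs amounts to multiplying activations by a {-1, 0, +1} weight matrix and confirming the output shape. A NumPy stand-in, not the repository's actual kernel:

```python
import numpy as np

def gemm_ternary(a, w_ternary):
    """GEMM with ternary weights via plain matmul after widening to float.

    Stand-in only: a real ternary kernel would skip multiplications
    entirely (adds/subtracts gated by the weight sign, zeros skipped).
    """
    assert set(np.unique(w_ternary)).issubset({-1, 0, 1})
    return a @ w_ternary.astype(a.dtype)

rng = np.random.default_rng(0)
a = rng.standard_normal((1024, 1024), dtype=np.float32)
w = rng.integers(-1, 2, size=(1024, 1024), dtype=np.int8)
out = gemm_ternary(a, w)
print(f"gemm_ternary OK {out.shape}")
```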
@@ -141,14 +108,6 @@
 Expected output:
 - Console prints `gemm_ternary OK (1024, 1024)`.

-## JSON artifact schema
-
-`scripts/phi3_ptq_qat_benchmark.py` and `scripts/vit_ptq_qat_benchmark.py` can emit JSON summaries with:
-
-- `model_id`, `dataset`, `device`, `dtype`, `threshold`
-- `baseline`, `ptq`, `qat` objects with stage metrics (`size_gib` plus either `ppl`/`tok_s` for Phi-3 or `accuracy`/`loss`/`images_per_s` for ViT)
-- run configuration (`max_eval_tokens`, `eval_texts`, `max_new_tokens`, `qat_steps`, `train_split`, etc.)
-
 ## Script overview

 1. The benchmark builds a `TinyClassifier` (a single `nn.Linear` head on flattened Fashion-MNIST images) and trains it in FP32 for a few epochs.