Commit f1617cb ("shrink")
1 parent: c6c5db9

1 file changed: BENCHMARKS.md (4 additions, 45 deletions)
````diff
@@ -35,41 +35,13 @@ python scripts/phi3_ptq_qat_benchmark.py \
   --max-new-tokens 4 \
   --run-qat \
   --qat-steps 5 \
-  --train-split 'train[:20]' \
-  --json-output benchmarks/gpt2_ptq_qat.json
+  --train-split 'train[:20]'
 ```
 
 Expected output:
 - Console summary with size, compression ratio, perplexity, and tok/s for baseline/PTQ/QAT.
-- `benchmarks/gpt2_ptq_qat.json` with stage metrics and run configuration.
 
-### 3) Phi-3 PTQ/QAT baseline (PPL-focused)
-
-```bash
-python scripts/phi3_ptq_qat_benchmark.py \
-  --model-id microsoft/Phi-3-mini-4k-instruct \
-  --device cpu \
-  --dtype float32 \
-  --max-eval-tokens 1024 \
-  --eval-texts 32 \
-  --max-new-tokens 64 \
-  --skip-latency \
-  --json-output benchmarks/phi3_ptq_qat.json
-```
-
-Expected output:
-- Console summary with baseline/PTQ perplexity and size.
-- `benchmarks/phi3_ptq_qat.json` with stage metrics and run configuration.
-
-Observed output (size-only PTQ PPL skipped):
-- baseline size: 14.23 GiB
-- PTQ size: 1.16 GiB
-- compression ratio: 12.3x
-- baseline PPL: 6.76
-- PTQ PPL: pending (CPU run skipped)
-- QAT: pending
-
-### 4) ViT CIFAR-10 PTQ/QAT baseline (quick)
+### 3) ViT CIFAR-10 PTQ/QAT baseline (quick)
 
 ```bash
 python scripts/vit_ptq_qat_benchmark.py \
@@ -89,12 +61,7 @@ Expected output:
 - Console summary with size, accuracy/loss, and images/s for baseline/PTQ/QAT.
 - `benchmarks/vit_cifar10_baseline.json` with stage metrics and model metadata.
 
-Observed output (size-only smoke run, eval/throughput skipped):
-- model: facebook/deit-tiny-patch16-224
-- baseline size: 0.0206 GiB
-- PTQ size: 0.00173 GiB
-
-### 5) GGUF export + load check
+### 4) GGUF export + load check
 
 ```bash
 t81 convert microsoft/Phi-3-mini-4k-instruct phi3-t81 --threshold 0.45 --force-cpu-device-map
@@ -121,7 +88,7 @@ Observed output:
 - prompt: 54.35 ms/token (18.4 tok/s)
 - eval: 56.22 ms/token (17.79 tok/s)
 
-### 6) GEMM throughput (CPU)
+### 5) GEMM throughput (CPU)
 
 ```bash
 python - <<'PY'
@@ -141,14 +108,6 @@ PY
 Expected output:
 - Console prints `gemm_ternary OK (1024, 1024)`.
 
-## JSON artifact schema
-
-`scripts/phi3_ptq_qat_benchmark.py` and `scripts/vit_ptq_qat_benchmark.py` can emit JSON summaries with:
-
-- `model_id`, `dataset`, `device`, `dtype`, `threshold`
-- `baseline`, `ptq`, `qat` objects with stage metrics (`size_gib` plus either `ppl`/`tok_s` for Phi-3 or `accuracy`/`loss`/`images_per_s` for ViT)
-- run configuration (`max_eval_tokens`, `eval_texts`, `max_new_tokens`, `qat_steps`, `train_split`, etc.)
-
 ## Script overview
 
 1. The benchmark builds a `TinyClassifier` (a single `nn.Linear` head on flattened Fashion-MNIST images) and trains it in FP32 for a few epochs.
````
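The `TinyClassifier` in the script overview can be sketched in PyTorch as below. This is a minimal illustration of "a single `nn.Linear` head on flattened Fashion-MNIST images"; the attribute name, optimizer, and training-loop details are assumptions, not the repository's actual code.

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Single linear head over flattened 28x28 Fashion-MNIST images (sketch)."""

    def __init__(self, in_features: int = 28 * 28, num_classes: int = 10):
        super().__init__()
        self.head = nn.Linear(in_features, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, 1, 28, 28) -> (batch, 784) -> (batch, 10)
        return self.head(x.flatten(start_dim=1))

# One FP32 training step on random stand-in data, for illustration only.
model = TinyClassifier()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
images = torch.randn(8, 1, 28, 28)      # stand-in for a Fashion-MNIST batch
labels = torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(images), labels)
loss.backward()
opt.step()
print(model(images).shape)  # torch.Size([8, 10])
```

The FP32 weights of this head are what the later PTQ/QAT stages quantize and compare against.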
