$ python tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py \
--model /home/nfs/model/TinyLLama-v0 \
--output_dir /home/nfs/model/TinyLLama-v0/tico-custom3 \
--save circle_per_layer --cache_dir /tmp/tico \
--max_seq_len 256 --embedding_weight_bits 4 --lm_head_weight_bits 4 \
--decode_calibration_steps 1 --spin_rotation_weight_bits 8
=== Config ===
Model : /home/nfs/model/TinyLLama-v0
Device : cuda
DType : float32
Seed : 42
GPTQ enabled : True
GPTQ lm_head enabled : False
PTQ enabled : True
SpinQuant enabled : True
CLE enabled : False
Linear weight bits : 4
Embedding weight bits : 4
LM head weight bits : 4
Spin rotation bits : 8
Calibration samples : 128
Calibration seq length : 2048
Max seq length : 256
Profile : npu_export
Saving token_embedding to /home/nfs/model/TinyLLama-v0/tico-custom3/token_embedding.q.circle
Traceback (most recent call last):
File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 1430, in <module>
main()
File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 1426, in main
save_requested_artifacts(q_m, tokenizer, calib_inputs, args)
File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 1394, in save_requested_artifacts
save_layers_to(
File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 808, in save_layers_to
save_token_embedding_to(
File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 726, in save_token_embedding_to
save_export_module_to(
File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 687, in save_export_module_to
cm = tico.convert(
File "/home/nfs/git/TICO/tico/utils/convert.py", line 367, in convert
circle_binary = convert_exported_module_to_circle(
File "/home/nfs/git/TICO/tico/utils/convert.py", line 228, in convert_exported_module_to_circle
decompose_quantize_op.run(exported_program)
File "/home/nfs/git/TICO/tico/utils/passes.py", line 65, in run
result = _pass.call(exported_program)
File "/home/nfs/git/TICO/tico/utils/trace_decorators.py", line 37, in wrapped
ret = fn(*args)
File "/home/nfs/git/TICO/tico/utils/trace_decorators.py", line 63, in wrapped
ret = fn(*args)
File "/home/nfs/git/TICO/tico/passes/decompose_fake_quantize_tensor_qparams.py", line 274, in call
**{"dtype": get_quant_type(quant_min, quant_max)},
File "/home/nfs/git/TICO/tico/passes/decompose_fake_quantize_tensor_qparams.py", line 52, in get_quant_type
raise RuntimeError(f"Not supported min/max values: {min}/{max}")
RuntimeError: Not supported min/max values: -128/127
I failed to export the Maykeye/TinyLlama-v0 model to Circle. The command and the full error message are shown above.
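
For triage, the rejected min/max pair in the RuntimeError can be decoded into a bit width and signedness. The snippet below is a minimal sketch that does not depend on TICO: `describe_range` is a hypothetical helper written for this report, not a library function, and it only checks the standard two's-complement and unsigned integer ranges.

```python
from typing import Optional


def describe_range(quant_min: int, quant_max: int) -> Optional[str]:
    """Return a name such as 'int8' or 'uint4' for a standard integer range.

    Hypothetical helper for reading the error message; not part of TICO.
    Returns None when the pair is not a full signed or unsigned range.
    """
    for bits in range(1, 33):
        signed = (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
        unsigned = (0, (1 << bits) - 1)
        if (quant_min, quant_max) == signed:
            return f"int{bits}"
        if (quant_min, quant_max) == unsigned:
            return f"uint{bits}"
    return None


if __name__ == "__main__":
    # Pair reported by "Not supported min/max values: -128/127"
    print(describe_range(-128, 127))  # int8
```

The pair -128/127 decodes to a signed 8-bit range. The only 8-bit setting in the printed config is "Spin rotation bits : 8", so `--spin_rotation_weight_bits 8` is a candidate source, but the log alone does not confirm which option produced this range.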