
TinyLlama-v0: Failed to export model with spin_rotation_weight_bits 8 option #792

Description

@hseok-oh

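I failed to export the Maykeye/TinyLlama-v0 model to Circle with the --spin_rotation_weight_bits 8 option. The export fails while saving token_embedding.q.circle: get_quant_type in tico/passes/decompose_fake_quantize_tensor_qparams.py raises because it does not support the min/max pair -128/127.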

Command

$ python tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py \
  --model /home/nfs/model/TinyLLama-v0 \
  --output_dir /home/nfs/model/TinyLLama-v0/tico-custom3 \
  --save circle_per_layer --cache_dir /tmp/tico \
  --max_seq_len 256 --embedding_weight_bits 4 --lm_head_weight_bits 4 \
  --decode_calibration_steps 1 --spin_rotation_weight_bits 8

=== Config ===
Model                  : /home/nfs/model/TinyLLama-v0
Device                 : cuda
DType                  : float32
Seed                   : 42
GPTQ enabled           : True
GPTQ lm_head enabled   : False
PTQ enabled            : True
SpinQuant enabled      : True
CLE enabled            : False
Linear weight bits     : 4
Embedding weight bits  : 4
LM head weight bits    : 4
Spin rotation bits     : 8
Calibration samples    : 128
Calibration seq length : 2048
Max seq length         : 256
Profile                : npu_export
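For reference, the rejected pair in the error below, -128/127, is exactly the range of a signed 8-bit integer. The plain-Python sketch below (not part of TICO) computes the integer ranges for the bit widths in this config, assuming the usual conventions: a signed n-bit range of [-2^(n-1), 2^(n-1) - 1] and an unsigned range of [0, 2^n - 1]. Which convention each quantizer in the script actually uses is an assumption here.

```python
from typing import Tuple


def int_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return (quant_min, quant_max) for an integer grid of the given width."""
    if bits < 1:
        raise ValueError("bits must be positive")
    if signed:
        # Two's-complement range, e.g. 8 bits -> (-128, 127)
        return -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    # Unsigned range, e.g. 4 bits -> (0, 15)
    return 0, (1 << bits) - 1


# Bit widths copied from the "=== Config ===" block above.
config_bits = {
    "linear_weight": 4,
    "embedding_weight": 4,
    "lm_head_weight": 4,
    "spin_rotation": 8,
}

for name, bits in config_bits.items():
    print(f"{name:17s} {bits}-bit  signed={int_range(bits, True)}  "
          f"unsigned={int_range(bits, False)}")
```

Among these settings, only the 8-bit signed case produces -128/127, which suggests the rejected qparams come from the 8-bit spin rotation setting. This is an inference from the numbers; the traceback itself does not say which quantizer produced them.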

Error message

Saving token_embedding to /home/nfs/model/TinyLLama-v0/tico-custom3/token_embedding.q.circle
Traceback (most recent call last):
  File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 1430, in <module>
    main()
  File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 1426, in main
    save_requested_artifacts(q_m, tokenizer, calib_inputs, args)
  File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 1394, in save_requested_artifacts
    save_layers_to(
  File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 808, in save_layers_to
    save_token_embedding_to(
  File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 726, in save_token_embedding_to
    save_export_module_to(
  File "/home/nfs/git/TICO/tico/quantization/wrapq/examples/quantize_full_qmodel_with_gptq.py", line 687, in save_export_module_to
    cm = tico.convert(
  File "/home/nfs/git/TICO/tico/utils/convert.py", line 367, in convert
    circle_binary = convert_exported_module_to_circle(
  File "/home/nfs/git/TICO/tico/utils/convert.py", line 228, in convert_exported_module_to_circle
    decompose_quantize_op.run(exported_program)
  File "/home/nfs/git/TICO/tico/utils/passes.py", line 65, in run
    result = _pass.call(exported_program)
  File "/home/nfs/git/TICO/tico/utils/trace_decorators.py", line 37, in wrapped
    ret = fn(*args)
  File "/home/nfs/git/TICO/tico/utils/trace_decorators.py", line 63, in wrapped
    ret = fn(*args)
  File "/home/nfs/git/TICO/tico/passes/decompose_fake_quantize_tensor_qparams.py", line 274, in call
    **{"dtype": get_quant_type(quant_min, quant_max)},
  File "/home/nfs/git/TICO/tico/passes/decompose_fake_quantize_tensor_qparams.py", line 52, in get_quant_type
    raise RuntimeError(f"Not supported min/max values: {min}/{max}")
RuntimeError: Not supported min/max values: -128/127
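The last frame shows get_quant_type(quant_min, quant_max) rejecting -128/127. The sketch below is a hypothetical, standalone stand-in for that lookup, not TICO's actual code: it maps a (quant_min, quant_max) pair to a dtype name and raises the same error text for unknown pairs. The entries in the table (uint8, int16) are placeholders chosen for illustration; the only fact taken from the traceback is that -128/127 is rejected. The last lines show what accepting signed int8 might look like, assuming the rest of the Circle export path could handle an int8 dtype for this op.

```python
# Hypothetical stand-in for the (quant_min, quant_max) -> dtype lookup in
# tico/passes/decompose_fake_quantize_tensor_qparams.py. The table contents
# are assumptions; only the error message mirrors the reported traceback.
SUPPORTED_RANGES = {
    (0, 255): "uint8",
    (-32768, 32767): "int16",
}


def get_quant_type_sketch(qmin: int, qmax: int, ranges: dict = SUPPORTED_RANGES) -> str:
    """Return a dtype name for a quantization range, or raise for unknown ranges."""
    try:
        return ranges[(qmin, qmax)]
    except KeyError:
        raise RuntimeError(f"Not supported min/max values: {qmin}/{qmax}") from None


# Reproduces the shape of the reported failure.
try:
    get_quant_type_sketch(-128, 127)
except RuntimeError as e:
    print(e)  # Not supported min/max values: -128/127

# One possible direction (assumption): also accept the signed 8-bit range.
WITH_INT8 = {**SUPPORTED_RANGES, (-128, 127): "int8"}
print(get_quant_type_sketch(-128, 127, WITH_INT8))  # int8
```

Whether the right fix is to extend the lookup like this or to make the 8-bit spin rotation quantizer emit a range the pass already supports is a design question for the maintainers. The sketch only illustrates where the mismatch surfaces.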
