I used the exact script from the ONNX Examples >> Mixed Precision page (https://quark.docs.amd.com/latest/tutorials/onnx/accuracy_improvement/mixed_precision/onnx_mixed_precision_tutorial.html) to quantize the densenet121.ra_in1k.onnx model.
However, when I view the quantized ONNX model in Netron, it shows pairs of QDQ nodes inserted between operators, leaving the operators themselves in FP32. I have attached the quantized model. A simple example of the output around Gemm:
Quantize node >> Dequantize node >> Gemm >> Quantize node >> Dequantize node >> ..
Instead, I expected the graph to look along the lines of:
Quantize node >> Gemm >> Dequantize node >> ..
No changes were made to the script from the provided example. Are there additional configuration options or installation steps that I should have added to prevent this?
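For anyone who wants to confirm the pattern without opening Netron, here is a minimal sketch of how it can be checked. It only relies on each node having `name`, `op_type`, `input`, and `output` fields, as ONNX `NodeProto` objects do. The helper names (`op_histogram`, `qdq_wrapped_ops`) and the toy graph are my own illustrations, and matching on the `DequantizeLinear` suffix is an assumption meant to also catch custom-domain variants of that op.

```python
from collections import Counter
from types import SimpleNamespace


def op_histogram(nodes):
    """Count how many nodes of each op type the graph contains."""
    return Counter(n.op_type for n in nodes)


def qdq_wrapped_ops(nodes, target_op="Gemm"):
    """Return names of target_op nodes fed by a dequantize node.

    Assumption: any op type ending in "DequantizeLinear" counts as a
    dequantize node. Such a consumer runs in float, which is the
    Q >> DQ >> op >> Q >> DQ layout described above.
    """
    nodes = list(nodes)
    # Map each tensor name to the node that produces it.
    producer = {out: n for n in nodes for out in n.output}
    wrapped = []
    for n in nodes:
        if n.op_type != target_op:
            continue
        for tensor in n.input:
            src = producer.get(tensor)
            if src is not None and src.op_type.endswith("DequantizeLinear"):
                wrapped.append(n.name)
                break
    return wrapped


# Hypothetical toy graph mirroring the Gemm example from this report.
toy = [
    SimpleNamespace(name="q0", op_type="QuantizeLinear",
                    input=["x", "s0", "z0"], output=["x_q"]),
    SimpleNamespace(name="dq0", op_type="DequantizeLinear",
                    input=["x_q", "s0", "z0"], output=["x_dq"]),
    SimpleNamespace(name="gemm0", op_type="Gemm",
                    input=["x_dq", "w", "b"], output=["y"]),
    SimpleNamespace(name="q1", op_type="QuantizeLinear",
                    input=["y", "s1", "z1"], output=["y_q"]),
    SimpleNamespace(name="dq1", op_type="DequantizeLinear",
                    input=["y_q", "s1", "z1"], output=["y_dq"]),
]

print(op_histogram(toy))
print(qdq_wrapped_ops(toy))
```

On the real model, the same helpers should work on `onnx.load(path).graph.node`, where `path` points to the unzipped attachment (the exact file name after unzipping is an assumption on my side).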
densenet121.ra_in1k_quantized.onnx.zip
Environment:
Python 3.12.13
cuda 12.8
amd-quark 0.11.2
onnx 1.19.0
onnx-ir 0.2.1
onnxruntime-gpu 1.26.0
onnxscript 0.7.0
onnxslim 0.1.94
torch 2.11.0+cu128
torchaudio 2.11.0+cu128
Machine:
x86_64
Tesla T4