
For ONNX Quantization, operators separated by inserted QDQ nodes are not quantized #33

@hmfyaparm

I used the exact script from the ONNX Examples > Mixed Precision tutorial (https://quark.docs.amd.com/latest/tutorials/onnx/accuracy_improvement/mixed_precision/onnx_mixed_precision_tutorial.html) to quantize the densenet121.ra_in1k.onnx model.

However, when I open the quantized ONNX model in Netron, pairs of QDQ (QuantizeLinear/DequantizeLinear) nodes have been inserted between the operators, and the operators themselves appear to stay in FP32. The quantized model is attached. For example, the output around a Gemm node looks like this:

Quantize node >> Dequantize node >> Gemm >> Quantize node >> Dequantize node >> ..

Instead, I expected the graph to look something like:

Quantize node >> Gemm >> Dequantize node >> ..

I made no changes to the script from the provided example. Are there additional configuration options or installations required to prevent this?

densenet121.ra_in1k_quantized.onnx.zip
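To list which operators in the attached model follow the DequantizeLinear >> op >> QuantizeLinear pattern described above, rather than checking them one by one in Netron, a minimal sketch could walk the graph's producer/consumer links. The helper names sandwiched_ops and nodes_from_onnx are hypothetical, not part of Quark or ONNX; only onnx.load and the node fields op_type, input and output come from the onnx package. The sketch inspects top-level graph nodes only (no subgraphs) and says nothing about whether a runtime will later execute a matched operator in integer arithmetic.

```python
from collections import Counter


def sandwiched_ops(nodes):
    """Count op types that read a DequantizeLinear output and feed a QuantizeLinear.

    `nodes` is a list of (op_type, inputs, outputs) tuples (hypothetical format),
    e.g. as returned by nodes_from_onnx() below.
    """
    producer = {}   # tensor name -> op_type of the node that produces it
    consumers = {}  # tensor name -> op_types of the nodes that read it
    for op_type, inputs, outputs in nodes:
        for name in outputs:
            producer[name] = op_type
        for name in inputs:
            if name:  # an empty string marks an omitted optional input
                consumers.setdefault(name, []).append(op_type)

    counts = Counter()
    for op_type, inputs, outputs in nodes:
        if op_type in ("QuantizeLinear", "DequantizeLinear"):
            continue
        fed_by_dq = any(producer.get(n) == "DequantizeLinear" for n in inputs)
        feeds_q = any("QuantizeLinear" in consumers.get(n, []) for n in outputs)
        if fed_by_dq and feeds_q:
            counts[op_type] += 1
    return counts


def nodes_from_onnx(path):
    """Load an ONNX file and flatten its top-level graph into tuples."""
    import onnx  # imported lazily so sandwiched_ops() works without onnx installed

    model = onnx.load(path)
    return [(n.op_type, list(n.input), list(n.output)) for n in model.graph.node]


if __name__ == "__main__":
    # Toy graph mirroring the Gemm example above:
    # QuantizeLinear -> DequantizeLinear -> Gemm -> QuantizeLinear -> DequantizeLinear
    toy = [
        ("QuantizeLinear", ["x", "s", "zp"], ["x_q"]),
        ("DequantizeLinear", ["x_q", "s", "zp"], ["x_dq"]),
        ("DequantizeLinear", ["w_q", "ws", "wzp"], ["w_dq"]),
        ("Gemm", ["x_dq", "w_dq", "b"], ["y"]),
        ("QuantizeLinear", ["y", "s2", "zp2"], ["y_q"]),
        ("DequantizeLinear", ["y_q", "s2", "zp2"], ["y_dq"]),
    ]
    print(dict(sandwiched_ops(toy)))  # {'Gemm': 1}
```

With onnx installed, print(sandwiched_ops(nodes_from_onnx("densenet121.ra_in1k_quantized.onnx"))) would report per-op-type counts for the attached model, assuming the archive unpacks to that file name.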

Environment:
Python 3.12.13
CUDA 12.8
amd-quark 0.11.2
onnx 1.19.0
onnx-ir 0.2.1
onnxruntime-gpu 1.26.0
onnxscript 0.7.0
onnxslim 0.1.94
torch 2.11.0+cu128
torchaudio 2.11.0+cu128

Machine:
x86_64
Tesla T4
