For ONNX Quantization, operators separated by inserted QDQ nodes are not quantized #33

I ran the script from the **Onnx Examples >> Mixed Precision** page (https://quark.docs.amd.com/latest/tutorials/onnx/accuracy_improvement/mixed_precision/onnx_mixed_precision_tutorial.html), exactly as given, to quantize the **densenet121.ra_in1k.onnx** model.

However, when I open the quantized ONNX model in Netron, it shows pairs of QDQ nodes inserted between operators, while the operators themselves remain in FP32. The quantized model is attached below. A simple example for Gemm:

**Quantize node >> Dequantize node >> Gemm >> Quantize node >> Dequantize node >> ..**

Instead, I expected the graph to look more like:

**Quantize node >> Gemm >> Dequantize node >> ..**

I made no changes to the script from the provided example. Are there additional configs or installations I should have added to prevent this?

[densenet121.ra_in1k_quantized.onnx.zip](https://github.com/user-attachments/files/28387539/densenet121.ra_in1k_quantized.onnx.zip)

Environment:
- Python 3.12.13
- CUDA 12.8
- amd-quark 0.11.2
- onnx 1.19.0
- onnx-ir 0.2.1
- onnxruntime-gpu 1.26.0
- onnxscript 0.7.0
- onnxslim 0.1.94
- torch 2.11.0+cu128
- torchaudio 2.11.0+cu128

Machine:
- x86_64
- Tesla T4
