In your benchmark, are you running the ONNX Runtime model without the KV cache or IO binding?