I am trying to save a quantized ternary model to a .tflite file. I expected larq to store the weights in a reduced-precision datatype and thus shrink the file, but it doesn't seem to do so.
After converting and writing to disk, the file is about the same size as the float32 parameter size reported by larq.models.summary.
Even when I do the same thing with a single QuantDense layer, the weights are saved in float32.
I am using this kind of code:
    import tensorflow as tf
    from tensorflow import keras
    import larq
    import larq_compute_engine as lce

    # Single binarized dense layer: 500 inputs, 1000 outputs, no bias
    quantDense = larq.layers.QuantDense(1000, kernel_quantizer="ste_sign", use_bias=False)
    quantDense(tf.ones((1, 500)))  # call once so the kernel gets built

    with larq.context.quantized_scope(True):
        inp_quant = keras.Input((1, 500))
        out_quant = quantDense(inp_quant)
        quantModelTest = keras.Model(inputs=inp_quant, outputs=out_quant)

        print("Keras test model")
        larq.models.summary(quantModelTest)

        print("converting keras test model to tflite")
        converted = lce.convert_keras_model(quantModelTest)

        with open("test.tflite", "wb") as f:
            print("writing tflite model to disk")
            f.write(converted)
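For reference, this is roughly the size comparison I have in mind. It is only a back-of-the-envelope sketch: it assumes the 500x1000 kernel of the test layer above and ideal bit packing with no file overhead, and the helper expected_kernel_bytes is a made-up name for illustration, not part of larq.

```python
import os


def expected_kernel_bytes(n_in, n_out, bits_per_weight):
    """Ideal storage size of an n_in x n_out kernel, ignoring any file overhead."""
    total_bits = n_in * n_out * bits_per_weight
    return (total_bits + 7) // 8  # round up to whole bytes


# Kernel shape taken from the QuantDense test layer (500 inputs, 1000 outputs).
float32_bytes = expected_kernel_bytes(500, 1000, 32)
binary_bytes = expected_kernel_bytes(500, 1000, 1)

print("float32 kernel:", float32_bytes, "bytes")
print("1-bit kernel:  ", binary_bytes, "bytes")

# Compare against the converted file, if it has already been written.
if os.path.exists("test.tflite"):
    print("test.tflite:   ", os.path.getsize("test.tflite"), "bytes")
```

If the weights were really packed, I would expect the file to be much closer to the second number than to the first.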
Am I doing something wrong?