Mar 8, 2019
Hi Muhammad,
Quantization means reducing the number of bits used to store the model's weights. For example, going from 4 bytes per weight (e.g. a 32-bit float) to 1 byte (e.g. an 8-bit integer) makes the model roughly four times smaller. There are other reductions you can apply as well, such as removing operations that are not needed for inference. None of this has anything to do with how the CNN convolution operates on the input image, so the input size stays as it is. Have a look at this article to understand how this happens: https://medium.com/@tomdeore/deep-learning-in-gradient-descent-style-part-2-e159e2cf8a99
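To make the size arithmetic concrete, here is a minimal sketch in plain Python. It assumes a simple symmetric scheme with one scale factor for the whole weight list; real toolchains may quantize differently, and the names `quantize_int8` and `dequantize` are made up for illustration, not taken from any library.

```python
from array import array

def quantize_int8(weights):
    """Map float weights to int8 using one shared scale (illustrative assumption)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the int8 range [-127, 127].
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return array("f", (v * scale for v in q))

# Hypothetical weights, stored as 32-bit floats (4 bytes each).
weights = array("f", [0.5, -1.27, 0.03, 1.0, -0.75, 0.2])
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

fp32_bytes = len(weights) * weights.itemsize      # 4 bytes per weight
int8_bytes = len(q_weights) * q_weights.itemsize  # 1 byte per weight

print("float32 storage:", fp32_bytes, "bytes")
print("int8 storage:   ", int8_bytes, "bytes")
print("size ratio:     ", fp32_bytes / int8_bytes)
```

The number of weights is the same before and after; only the bytes per weight change. That is why the network's structure, including the input size it expects, is unaffected.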
Hope this helps.