Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model
This paper discusses efficient 8-bit quantization of Transformer neural machine translation models. The goal is to make these models more efficient by reducing their memory consumption and latency.
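To make the idea of 8-bit quantization concrete, the following is a minimal sketch of symmetric per-tensor quantization in plain Python. It is an illustration under stated assumptions, not necessarily the scheme the paper uses: the function names `quantize_int8` and `dequantize_int8` are hypothetical, and the choice of a single scale derived from the largest absolute value is only one common option. Storing each value in 8 bits instead of a 32-bit float cuts the storage for those values by a factor of four, which is where the memory savings come from.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumption: symmetric, per-tensor quantization with the scale taken
    from the largest absolute value. This is illustrative only.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Guard against an all-zero (or empty) tensor to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only to demonstrate the round trip.
weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)

# The round-trip error of each element is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q, scale, max_error)
```

The rounding step is what introduces quantization error, so in practice the scale (and whether it is chosen per tensor or at a finer granularity) determines how much accuracy is traded for the smaller footprint.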
