quantization
scroll ↓ to Resources
Note
- Quantization is the process of decreasing the numerical precision at which weights and activations are stored, transferred, and operated on. Weights and activations are usually represented as 32-bit floating-point numbers; with quantization we can drop the precision to 8-bit or even 4-bit integers, effectively “scaling down” the granularity by mapping a larger original range of values to a smaller set of discrete levels.
- In the figure, the original float7 values (purple dots) are mapped to the closest float5 values (green dots) after quantization, resulting in lower granularity. This approximation reduces the number of unique values the model uses, compressing the model while retaining a reasonable level of accuracy.
- Where quantization does introduce a quality regression, that regression is often small compared to the performance gain, allowing an effective quality vs. latency/throughput trade-off.
- Quantization can either be applied as an inference-only operation (post-training quantization, PTQ) or be incorporated into training, referred to as quantization-aware training (QAT).
- QAT is generally considered the more resilient approach, as the model can recover some of the quantization-related quality loss during training.
- For the best cost-quality trade-offs, one can tune the quantization strategy by selecting different precisions for weights and activations, as well as the granularity at which quantization is applied to tensors, e.g. per-channel or per-group (see the sketches after this list).
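To make the mapping to discrete levels concrete, here is a minimal sketch of symmetric int8 post-training quantization and dequantization. It is illustrative only: `quantize_int8` and `dequantize` are names I am introducing, and the per-tensor vs. per-channel handling via `axis` is one common convention, not the only one.

```python
import numpy as np

def quantize_int8(x: np.ndarray, axis=None):
    """Symmetric int8 quantization: snap floats onto 255 discrete levels.

    axis=None -> one scale for the whole tensor (per-tensor granularity)
    axis=1    -> one scale per row, e.g. per output channel of a weight matrix
    """
    # The largest absolute value defines the range that must stay representable.
    max_abs = np.max(np.abs(x), axis=axis, keepdims=axis is not None)
    scale = np.maximum(max_abs / 127.0, 1e-12)   # step size between adjacent levels
    q = np.clip(np.round(x / scale), -127, 127)  # round to the nearest level
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale) -> np.ndarray:
    """Map the integer levels back to approximate float values."""
    return q.astype(np.float32) * scale

if __name__ == "__main__":
    w = np.random.randn(4, 8).astype(np.float32)  # pretend weight matrix
    q, s = quantize_int8(w, axis=1)               # per-channel (per-row) scales
    w_hat = dequantize(q, s)
    print("max abs quantization error:", np.max(np.abs(w - w_hat)))
```

The round-trip error printed at the end is exactly the approximation discussed above: the dequantized weights sit on a coarser grid than the originals, which is where the compression comes from.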
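For QAT, the usual trick is to simulate quantization in the forward pass while letting gradients flow as if no rounding happened (a straight-through estimator), so the model can adapt to the coarser grid during training. A hedged PyTorch sketch under that assumption; `fake_quantize` is my own name, not an API from the resources below:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Forward pass sees quantized values; backward pass sees the identity."""
    qmax = 2 ** (num_bits - 1) - 1                   # e.g. 127 for int8
    scale = x.detach().abs().max() / qmax            # per-tensor symmetric scale
    q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    return x + (q - x).detach()                      # straight-through estimator

# Usage inside a training step: quantize weights on the fly, train as usual.
w = torch.randn(16, 32, requires_grad=True)
x = torch.randn(4, 32)
out = x @ fake_quantize(w).t()
out.sum().backward()      # gradients still reach the underlying float weights
print(w.grad.shape)       # torch.Size([16, 32])
```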
Activation-aware Weights Quantization (AWQ)
Post-training quantization (PTQ)
Quantization-aware training (QAT)
Resources
- Webinar: “Quantization: unlocking scalability for LLMs” - YouTube
- Егор Швецов | Model compression - Introduction to Quantization - YouTube
- Quantization
Links to this File
table file.inlinks, file.outlinks from [[]] and !outgoing([[]]) AND -"Changelog"