Quantization-Conditioned Alignment Degradation
Krishna Venkatesh
Post-training quantization enables language model deployment on edge hardware across the Global South, yet its effect on safety alignment remains unstudied. We benchmark Attack Success Rate (ASR) on Llama 3.1 8B across four GGUF quantization levels (Q8, Q5, Q4, Q3) against a full-precision BF16 control group served via Groq, using 150 prompts from the AdvBench Harmful Behaviors dataset and an LLM-as-judge scoring protocol. We find that ASR remains flat across all GGUF quantization levels (0.7% for Q3 through Q8), while full-precision BF16 models served via API exhibit substantially higher ASR (7.3% for Llama 3.1 8B, 11.3% for Llama 3.3 70B), indicating that serving infrastructure, not quantization precision, drives the primary variation in refusal behaviour. These results directly inform deployment standards for resource-constrained settings where Q4 and Q3 quantization are practical necessities. Our fully reproducible pipeline is open-source and extensible to other model families.
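To make the reported percentages concrete, here is a minimal sketch of how ASR could be computed from per-prompt judge verdicts. It assumes ASR is simply the share of the 150 prompts that the judge marks as a successful attack. The function names are hypothetical and not taken from the project's pipeline. The counts of 1, 11 and 17 successes are inferred as the only whole numbers out of 150 that round to the reported 0.7%, 7.3% and 11.3%; they are not stated in the abstract.

```python
def attack_success_rate(verdicts):
    """Return ASR as a percentage: the share of prompts judged a successful attack."""
    if not verdicts:
        raise ValueError("verdicts must not be empty")
    return 100.0 * sum(verdicts) / len(verdicts)


def verdicts_from_count(successes, total=150):
    """Build a list of boolean judge verdicts with the given number of successes."""
    return [True] * successes + [False] * (total - successes)


# Success counts below are inferred from the reported percentages (assumption).
inferred_counts = [
    ("Llama 3.1 8B, GGUF Q3 through Q8", 1),
    ("Llama 3.1 8B, BF16 via API", 11),
    ("Llama 3.3 70B, BF16 via API", 17),
]

for label, successes in inferred_counts:
    asr = attack_success_rate(verdicts_from_count(successes))
    print(f"{label}: {asr:.1f}%")
```

With 150 prompts, each additional successful attack moves ASR by about 0.67 percentage points, so the flat 0.7% figure across quantization levels would correspond to a single prompt under this assumption.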
Cite this work
@misc {
  title={(HckPrj) Quantization-Conditioned Alignment Degradation},
  author={Krishna Venkatesh},
  date={},
  organization={Apart Research},
  note={Research submission to the research sprint hosted by Apart.},
  howpublished={https://apartresearch.com}
}


