Making LLMs Faster Without Retraining
A personal experiment log testing multiple compression techniques across two model scales, ending with a GPT-2 Large that runs 1.6× faster and outperforms its uncompressed baseline by 13.7%.
25 min read · ai-ml, inference