Apple: Embarrassingly Simple Self-Distillation Improves Code Generation
A simple self-distillation method yields substantial code-generation gains across LLMs, raising pass@1 on LiveCodeBench v6 from 42.4% to 55.3% and generalizing across Qwen and Llama models at the 4B, 8B, and 30B scales.
Apr 4, 2026
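The summary reports results as pass@1 but does not say how the metric was computed. The sketch below assumes the widely used unbiased pass@k estimator from the Codex evaluation work (Chen et al., 2021), which may differ from the paper's exact protocol. It also works out the size of the reported gain. The function names `pass_at_k` and `benchmark_pass_at_1` are illustrative, not taken from the paper. The sketch does not implement the self-distillation method itself, which the summary does not describe.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator (assumed, after Chen et al., 2021).

    Probability that at least one of k samples, drawn without replacement
    from n generations of which c pass the tests, is correct.
    """
    if n - c < k:
        # Too few failing samples to fill all k draws: a pass is guaranteed.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_1(results: list[tuple[int, int]]) -> float:
    """Hypothetical helper: mean pass@1 over (n, c) pairs, one per problem.

    With k = 1 the estimator reduces to c / n for each problem.
    """
    return sum(pass_at_k(n, c, 1) for n, c in results) / len(results)


# Size of the reported LiveCodeBench v6 improvement.
baseline, distilled = 42.4, 55.3
absolute_gain = distilled - baseline      # in percentage points
relative_gain = absolute_gain / baseline  # as a fraction of the baseline
print(f"{absolute_gain:.1f} points, {relative_gain:.1%} relative")
```

Under these assumptions the jump from 42.4% to 55.3% is 12.9 percentage points, roughly a 30% relative improvement over the baseline.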