The Sequence Knowledge #886: Demystifying Model Distillation
Understanding the key principles of distillation in simple terms.

TL;DR
- Knowledge distillation trains a small model to reproduce the behavior of a larger one.
- The 'teacher' is a large, capable, but expensive model.
- The 'student' is a smaller, faster, and cheaper model.
- The student learns from the teacher's outputs (for example, its predicted probabilities for each class), not just from the original labels.
- This often lets the student reach higher accuracy than the same model trained on the original data alone.
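
To make the idea of "learning from the teacher's outputs" concrete, here is a minimal sketch of one common way to write a distillation loss for a single classification example. It is an illustration under stated assumptions, not the only formulation: the temperature-softened softmax, the KL-divergence term, the `alpha` weighting, and all function names and logit values below are hypothetical choices made for this example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; a higher temperature flattens them."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a 'match the teacher' term with a 'match the label' term.

    alpha=1.0 uses only the teacher's soft targets;
    alpha=0.0 falls back to ordinary cross-entropy on the true label.
    """
    # Soft term: how far the student's softened distribution is from the teacher's.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    # Scaling by temperature**2 is a common convention (an assumption here)
    # that keeps the soft term comparable in size across temperatures.
    soft_loss = kl_divergence(teacher_soft, student_soft) * temperature ** 2

    # Hard term: standard cross-entropy against the ground-truth label.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[true_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    # Made-up logits for a 3-class problem; class 0 is the true label.
    teacher_logits = [4.0, 1.0, 0.5]
    student_logits = [2.0, 1.5, 0.1]
    print(distillation_loss(student_logits, teacher_logits, true_label=0))
```

The soft term is what carries the teacher's "interpretation": instead of only being told that class 0 is correct, the student also sees how much probability the teacher assigns to the other classes. In a real training loop this loss would be averaged over a batch and minimized with respect to the student's parameters only.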