The Sequence Knowledge #886: Demystifying Model Distillation

Understanding the key principles of distillation in simple terms.

TL;DR

  • Knowledge distillation is a method for training a small model to reproduce the behavior of a large one.
  • The 'teacher' is a large, capable, but expensive model.
  • The 'student' is a smaller, faster, and cheaper model.
  • The student learns from the teacher's outputs (for example, its predicted probabilities), not just from the original labeled data.
  • This often lets the student reach higher quality than the same small model would if trained on the original data alone.
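
To make the teacher/student idea concrete, here is a minimal sketch of one common way to write a distillation loss for a single classification example. It is an illustration, not a description of any particular system: the function names, the `temperature` and `alpha` parameters, and the example logits are all assumptions made for this sketch. The loss mixes two terms: how far the student's softened predictions are from the teacher's, and how well the student predicts the true label.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target term and a hard-label term."""
    # Soft targets: both models' predictions, softened with the same temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # KL divergence from the teacher's distribution to the student's.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)

    # Ordinary cross-entropy against the true label (temperature 1).
    ce = -math.log(softmax(student_logits)[true_label])

    # The temperature**2 factor is a common convention to keep the soft term's
    # gradient scale comparable across temperatures; alpha is an assumed mixing weight.
    return alpha * (temperature ** 2) * kl + (1 - alpha) * ce

# Made-up logits for a 3-class problem where class 0 is correct.
teacher = [4.0, 1.0, 0.5]
agreeing_student = [4.0, 1.0, 0.5]
disagreeing_student = [0.5, 1.0, 4.0]

loss_agree = distillation_loss(agreeing_student, teacher, true_label=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, true_label=0)
print(loss_agree < loss_disagree)
```

A student that matches the teacher pays no penalty on the soft-target term, so its loss comes only from the hard-label term. In a real training loop this loss would be averaged over a batch and minimized with gradient descent in a framework of your choice.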