Distilling from professors: Enhancing the knowledge distillation of teachers

Duhyeon Bang, Jongwuk Lee, Hyunjung Shim

Research output: Contribution to journal › Article › peer-review

4 Citations (Scopus)

Abstract

Knowledge distillation (KD) is a successful technique for transferring knowledge from one machine learning model to another. Specifically, the idea of KD has been widely used for various tasks such as model compression and knowledge transfer between different models. However, existing studies on KD have overlooked the possibility that the dark knowledge (i.e., soft targets) obtained from a complex and large model (a.k.a. a teacher model) may be incorrect or insufficient. Such knowledge can hinder the effective learning of a smaller model (a.k.a. a student model). In this paper, we propose the professor model, which refines the soft targets from the teacher model to improve KD. The professor model aims to achieve two goals: 1) improving the prediction accuracy and 2) capturing the inter-class correlations of the teacher's soft targets. We first design the professor model by reformulating a conditional adversarial autoencoder (CAAE). Then, we devise two KD strategies that use both the teacher and professor models. Our empirical study demonstrates that the professor model effectively improves KD on three benchmark datasets: CIFAR100, TinyImagenet, and ILSVRC2015. Moreover, our comprehensive analysis shows that the professor model is more effective than employing a stronger teacher model whose parameter count exceeds the combined parameters of the teacher and professor. Since the professor model is model-agnostic, it can be combined with any KD algorithm and consistently improves various KD techniques.
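For context, the sketch below shows the standard soft-target KD objective (temperature-scaled teacher probabilities, the "dark knowledge" the abstract refers to). It is the baseline distillation loss from Hinton et al., not the paper's professor-refined variant; the temperature T and mixing weight alpha are illustrative hyperparameters.

```python
# Minimal sketch of the baseline soft-target KD loss (Hinton et al.).
# NOTE: this is NOT the professor-refined method proposed in the paper;
# T and alpha below are assumed, illustrative values.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Mix a soft-target KL term (temperature-softened teacher
    probabilities) with the usual hard-label cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),  # student log-probs at temperature T
        F.softmax(teacher_logits / T, dim=1),      # teacher soft targets at temperature T
        reduction="batchmean",
    ) * (T * T)  # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: batch of 8 examples over 100 classes (e.g., CIFAR100).
if __name__ == "__main__":
    student = torch.randn(8, 100)
    teacher = torch.randn(8, 100)
    labels = torch.randint(0, 100, (8,))
    print(kd_loss(student, teacher, labels).item())
```

The paper's contribution can be read against this baseline: the professor model replaces the raw teacher soft targets fed into the KL term with refined ones.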

Original language: English
Pages (from-to): 743-755
Number of pages: 13
Journal: Information Sciences
Volume: 576
DOIs
Publication status: Published - 2021 Oct

Bibliographical note

Funding Information:
This research was supported by the Basic Science Research Program through the NRF Korea funded by the MSIP (NRF-2019R1A2C2006123, 2020R1A4A1016619), the IITP grant funded by the MSIT (2020-0-01361, Artificial Intelligence Graduate School Program (YONSEI UNIVERSITY)), and the Korea Medical Device Development Fund grant funded by the Korean government (Project Number: 202011D06).

Publisher Copyright:
© 2021 Elsevier Inc.

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Software
  • Control and Systems Engineering
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

