Adversarial Distillation for Learning with Privileged Provisions.

2019
Knowledge distillation aims to train a student (model) for accurate inference in a resource-constrained environment. Traditionally, the student is trained by a high-capacity teacher (model) whose training is resource-intensive. A student trained this way is suboptimal because it is difficult to learn the real data distribution from the teacher. To address this issue, we propose to train the student against a discriminator in a minimax game. However, training in such a minimax game can take an excessively long time to converge. We therefore propose adversarial distillation, which consists of a student, a teacher, and a discriminator. The discriminator is now a multi-class classifier that distinguishes among the real data, the student, and the teacher. The student and the teacher aim to fool the discriminator via adversarial losses, while they learn from each other via distillation losses. By optimizing the adversarial and the distillation losses simultaneously, the student and the teacher can learn the real data distribution. To accelerate the training, we propose to obtain low-variance gradient updates from the discriminator using a Gumbel-Softmax trick. We conduct extensive experiments to demonstrate the superiority of the proposed adversarial distillation in both accuracy and training speed.
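The abstract names a Gumbel-Softmax trick but does not give its formulation. As a rough illustration only, the following is a minimal sketch of the standard Gumbel-Softmax relaxation in plain Python, not the paper's implementation. The logits, the temperature values, and all function names are assumptions chosen for the example; how the paper applies the trick to the discriminator's gradient updates is not stated in this record.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw one Gumbel(0, 1) sample via inverse transform: -log(-log(U))."""
    # Clamp U away from 0 so both logarithms stay finite.
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def gumbel_softmax(logits, temperature, rng):
    """Relaxed one-hot sample: softmax((logits + Gumbel noise) / temperature)."""
    noisy = [(l + sample_gumbel(rng)) / temperature for l in logits]
    return softmax(noisy)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical logits that a student might assign to three labels.
    label_logits = [2.0, 0.5, -1.0]
    # Lower temperatures push the sample closer to a one-hot vector.
    for tau in (5.0, 1.0, 0.1):
        sample = gumbel_softmax(label_logits, tau, rng)
        print(tau, [round(p, 3) for p in sample])
```

Each sample is a probability vector whose largest entry falls on label k with probability softmax(logits)[k], so it behaves like a categorical draw while remaining differentiable with respect to the logits in an autodiff framework. That differentiability is what lets gradients pass through the sampling step instead of relying on a higher-variance score-function estimator.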
Reference Key
wang2019adversarialieee
Authors Wang, Xiaojie; Zhang, Rui; Sun, Yu; Qi, Jianzhong
Journal IEEE Transactions on Pattern Analysis and Machine Intelligence
Year 2019
DOI 10.1109/TPAMI.2019.2942592