AdaDS: Adaptive data selection for accelerating pre-trained language model knowledge distillation
Knowledge distillation (KD) is a widely used method for transferring knowledge from large teacher models to computationally efficient student models. Unfortunately, the computational cost of KD becomes unaffordable as pre-trained language models (PLMs) grow larger. Computing KD loss on only part of...
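The abstract refers to computing the KD loss on only a subset of the training data to cut cost. As a minimal sketch of that general idea (not the paper's AdaDS selection method, which is adaptive), the following PyTorch snippet computes a standard soft-label KD loss on a randomly sampled fraction of a batch; the function name, `keep_ratio` parameter, and the uniform-random selection are all illustrative placeholders.

```python
import torch
import torch.nn.functional as F

def kd_loss_on_subset(student_logits, teacher_logits, keep_ratio=0.5, temperature=2.0):
    """Soft-label KD loss (Hinton-style) computed on a subset of the batch.

    Illustrative only: AdaDS selects instances adaptively, whereas this
    sketch uses uniform random subsampling as a stand-in selection step.
    """
    batch_size = student_logits.size(0)
    n_keep = max(1, int(batch_size * keep_ratio))
    idx = torch.randperm(batch_size)[:n_keep]  # placeholder for adaptive selection

    # Temperature-scaled distributions for the selected examples only
    s = F.log_softmax(student_logits[idx] / temperature, dim=-1)
    t = F.softmax(teacher_logits[idx] / temperature, dim=-1)
    # KL divergence between teacher and student, rescaled by T^2
    return F.kl_div(s, t, reduction="batchmean") * temperature**2

# Example usage with random logits (vocabulary size is arbitrary here)
student_logits = torch.randn(32, 30522)
teacher_logits = torch.randn(32, 30522)
loss = kd_loss_on_subset(student_logits, teacher_logits, keep_ratio=0.25)
```

With `keep_ratio=0.25`, the KD loss is evaluated on a quarter of the batch, which is the source of the speedup the abstract alludes to; the open question AdaDS addresses is which examples to keep.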
Main Authors:
Format: Article
Language: English
Published: KeAi Communications Co. Ltd., 2023-01-01
Series: AI Open
Subjects:
Online Access: http://www.sciencedirect.com/science/article/pii/S2666651023000074