LAD: Layer-Wise Adaptive Distillation for BERT Model Compression

Recent advances in large-scale pre-trained language models (e.g., BERT) have brought significant potential to natural language processing. However, their large model size hinders their use in IoT and edge devices. Several studies have utilized task-specific knowledge distillation to compress the pre...
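For context, a minimal sketch of the general task-specific knowledge distillation setup the abstract refers to is shown below: a small student model is trained against temperature-scaled teacher logits plus the ordinary task loss. This illustrates the standard technique only, not the paper's layer-wise adaptive (LAD) method; the function name, `temperature`, and `alpha` weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,  # assumed softening factor
                      alpha: float = 0.5) -> torch.Tensor:  # assumed loss mix
    # Soft targets: match the student's softened output distribution to the
    # teacher's, scaling by T^2 to keep gradient magnitudes comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Hard targets: standard cross-entropy against the task labels.
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```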

Bibliographic Details
Main Authors: Ying-Jia Lin, Kuan-Yu Chen, Hung-Yu Kao
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Sensors
Online Access: https://www.mdpi.com/1424-8220/23/3/1483