LAD: Layer-Wise Adaptive Distillation for BERT Model Compression

Recent advances with large-scale pre-trained language models (e.g., BERT) have brought significant potential to natural language processing. However, the large model size hinders their use in IoT and edge devices. Several studies have utilized task-specific knowledge distillation to compress the pre...


Bibliographic Details
Main Authors:	Ying-Jia Lin, Kuan-Yu Chen, Hung-Yu Kao
Format:	Article
Language:	English
Published:	MDPI AG 2023-01-01
Series:	Sensors
Subjects:	model compression; knowledge distillation; BERT; text classification; natural language processing; deep learning
Online Access:	https://www.mdpi.com/1424-8220/23/3/1483

Similar Items