PET: Parameter-efficient Knowledge Distillation on Transformer.

Given a large Transformer model, how can we obtain a small and computationally efficient model which maintains the performance of the original model? Transformers have shown significant performance improvements for many NLP tasks in recent years. However, their large size, expensive computational cost...

Detailed Bibliography
Main Authors: Hyojin Jeon, Seungcheol Park, Jin-Gee Kim, U Kang
Material Type: Article
Language: English
Publication Info: Public Library of Science (PLoS), 2023-01-01
Series: PLoS ONE
Online Access: https://doi.org/10.1371/journal.pone.0288060