PET: Parameter-efficient Knowledge Distillation on Transformer.

Given a large Transformer model, how can we obtain a small and computationally efficient model that maintains the performance of the original model? Transformers have shown significant performance improvements on many NLP tasks in recent years. However, their large size, expensive computational cost, and long inference time make them challenging to deploy on resource-constrained devices. Existing Transformer compression methods focus mainly on reducing the size of the encoder, ignoring the fact that the decoder accounts for the major portion of the long inference time. In this paper, we propose PET (Parameter-Efficient knowledge distillation on Transformer), an efficient Transformer compression method that reduces the size of both the encoder and the decoder. In PET, we identify and exploit pairs of parameter groups for efficient weight sharing, and employ a warm-up process using a simplified task to increase the gain from knowledge distillation. Extensive experiments on five real-world datasets show that PET outperforms existing methods on machine translation tasks. Specifically, on the IWSLT'14 EN→DE task, PET reduces memory usage by 81.20% and accelerates inference by 45.15% compared to the uncompressed model, with only a minor (0.27-point) decrease in BLEU score.
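
To make the description above concrete, here is a minimal PyTorch sketch of the two ideas the abstract names: sharing weights between a paired encoder/decoder parameter group, and a standard knowledge-distillation loss of the kind applied after a warm-up phase. This is illustrative only, not the authors' implementation: the choice to pair the feed-forward weights of decoder layer i with encoder layer i, the TinySeq2Seq model, the distillation_loss helper, and all hyperparameters are assumptions made for demonstration.

import torch.nn as nn
import torch.nn.functional as F

class TinySeq2Seq(nn.Module):
    """Toy encoder-decoder Transformer with one hypothetical shared parameter-group pair."""
    def __init__(self, vocab=8000, d_model=256, nhead=4, ff=512, layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, ff, batch_first=True), layers)
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, ff, batch_first=True), layers)
        self.proj = nn.Linear(d_model, vocab)
        # Hypothetical pairing: tie the feed-forward weights of decoder layer i
        # to encoder layer i, so each paired group is stored and learned only once.
        for enc_l, dec_l in zip(self.encoder.layers, self.decoder.layers):
            dec_l.linear1.weight = enc_l.linear1.weight
            dec_l.linear2.weight = enc_l.linear2.weight

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.embed(src_ids))
        causal = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        out = self.decoder(self.embed(tgt_ids), memory, tgt_mask=causal)
        return self.proj(out)  # (batch, tgt_len, vocab) logits

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Standard KD term: KL divergence between temperature-softened distributions.
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

During distillation, one would typically combine this term with the ordinary cross-entropy loss on the reference translation, e.g. loss = ce + alpha * distillation_loss(student_logits, teacher_logits), with the teacher run under torch.no_grad().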

Bibliographic Details
Main Authors: Hyojin Jeon, Seungcheol Park, Jin-Gee Kim, U Kang
Format: Article
Language: English
Published: Public Library of Science (PLoS), 2023-01-01
Series: PLoS ONE, Vol. 18, No. 7, Article e0288060
ISSN: 1932-6203
Online Access: https://doi.org/10.1371/journal.pone.0288060