Medical text classification based on the discriminative pre-training model and prompt-tuning

Medical text classification, a fundamental medical natural language processing task, aims to identify the categories to which a short medical text belongs. Current research has focused on performing medical text classification with a pre-trained language model through fine-tuning. However, this paradigm introduces additional parameters when training an extra classifier. Recent studies have shown that the "prompt-tuning" paradigm yields better performance on many natural language processing tasks because it bridges the gap between pre-training objectives and downstream tasks. The main idea of prompt-tuning is to transform binary or multi-class classification tasks into mask-prediction tasks by fully exploiting the features learned by pre-trained language models. This study explores, for the first time, how to classify medical texts with a discriminative pre-trained language model, ERNIE-Health, through prompt-tuning. Specifically, we perform prompt-tuning based on the multi-token selection task, one of ERNIE-Health's pre-training tasks. The raw text is wrapped into a new sequence with a template in which the category label is replaced by a [UNK] token. The model is then trained to calculate the probability distribution over the candidate categories. Our method is tested on the KUAKE-Question Intention Classification and CHIP-Clinical Trial Criterion datasets, achieving accuracy values of 0.866 and 0.861, respectively. In addition, the loss of our model decreases faster throughout training than with fine-tuning. These results provide valuable insights to the community and suggest that prompt-tuning is a promising approach to improving the performance of pre-trained models on domain-specific tasks.

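To make the template idea from the abstract concrete, the snippet below is a minimal, hypothetical sketch of template-based prompt scoring using PyTorch and the Hugging Face transformers library. The model name (a generic Chinese BERT placeholder rather than ERNIE-Health), the template wording, and the candidate label words are all illustrative assumptions, and the embedding-similarity scoring is only a stand-in for ERNIE-Health's multi-token selection head, not the authors' implementation.

```python
# Hypothetical sketch of template-based prompt scoring for medical text
# classification. Model name, template, and label words are placeholders.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL_NAME = "bert-base-chinese"        # placeholder; the paper uses ERNIE-Health
LABEL_WORDS = ["诊断", "治疗", "病因"]    # hypothetical candidate categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
encoder = AutoModel.from_pretrained(MODEL_NAME)

def classify(text: str) -> int:
    """Wrap `text` in a template whose label slot is the [UNK] token,
    then score each candidate label against the encoder state at that slot."""
    template = f"{text} 这个问题的意图是[UNK]。"   # label slot replaced by [UNK]
    inputs = tokenizer(template, return_tensors="pt")
    # position of the template's [UNK] slot (the last [UNK] in the sequence)
    slot = (inputs["input_ids"][0] == tokenizer.unk_token_id).nonzero()[-1].item()

    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state[0, slot]   # (hidden_size,)

        # Represent each candidate category by the mean input embedding of its
        # label-word tokens and score it by similarity to the slot state --
        # a stand-in for the multi-token selection head described in the paper.
        scores = []
        for word in LABEL_WORDS:
            ids = tokenizer(word, add_special_tokens=False,
                            return_tensors="pt")["input_ids"]
            label_emb = encoder.get_input_embeddings()(ids).mean(dim=1)[0]
            scores.append(torch.dot(hidden, label_emb))

    probs = torch.softmax(torch.stack(scores), dim=0)   # distribution over categories
    return int(torch.argmax(probs))
```

In training, the resulting distribution would be fit to the gold category with a cross-entropy loss, so no extra classifier layer is introduced beyond the pre-trained encoder, in line with the abstract's motivation for prompt-tuning.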

Bibliographic Details
Main Authors: Yu Wang, Yuan Wang, Zhenwan Peng, Feifan Zhang, Luyao Zhou, Fei Yang
Format: Article
Language: English
Published: SAGE Publishing, 2023-08-01
Series: Digital Health
ISSN: 2055-2076
Online Access: https://doi.org/10.1177/20552076231193213