A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM

Considering the low accuracy of current short text classification (TC) methods and the difficulties they have with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized by introducing a BERT pre...

Bibliographic Details
Main Authors:	Li Pan, Wei Hong Lim, Yong Gan
Format:	Article
Language:	English
Published:	MDPI AG 2023-03-01
Series:	Electronics
Subjects:	S-TC; sustainable; BERT; CAM; DL; big data
Online Access:	https://www.mdpi.com/2079-9292/12/7/1531

_version_	1797608077559595008
author	Li Pan; Wei Hong Lim; Yong Gan
author_facet	Li Pan; Wei Hong Lim; Yong Gan
author_sort	Li Pan
collection	DOAJ
description	To address the low accuracy of current short text classification (TC) methods and their difficulty with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized with a Bidirectional Encoder Representations from Transformers (BERT) pre-training model. When processing language tasks, a word is removed from the text and predicted from the information in the preceding and following words, which improves TC accuracy. Then, a convolutional attention mechanism (CAM) model is proposed that uses a convolutional neural network (CNN) to capture feature interactions in the time dimension and multiple convolutional kernels to obtain more comprehensive feature information, further improving TC accuracy. Finally, by optimizing and merging the BERT pre-training model and the CAM model, a BERT-CAM classification model for S-TC is proposed. In simulation experiments, the proposed S-TC method is compared with three other methods on three datasets. The proposed method achieves the highest accuracy, precision, recall, F1 value, Ma_F and Mi_F, reaching 94.28%, 86.36%, 84.95%, 85.96%, 86.34% and 86.56%, respectively, and outperforms the three comparison algorithms.
first_indexed	2024-03-11T05:39:37Z
format	Article
id	doaj.art-30035e98176a48289457d1be782c4e61
institution	Directory of Open Access Journals
issn	2079-9292
language	English
last_indexed	2024-03-11T05:39:37Z
publishDate	2023-03-01
publisher	MDPI AG
record_format	Article
series	Electronics
spelling	doaj.art-30035e98176a48289457d1be782c4e61 | 2023-11-17T16:31:55Z | eng | MDPI AG | Electronics | 2079-9292 | 2023-03-01 | 12 | 7 | 1531 | 10.3390/electronics12071531 | A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM | Li Pan (0); Wei Hong Lim (1); Yong Gan (2) | (0) Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China; (1) Faculty of Engineering, Technology and Built Environment, UCSI University, Cheras, Kuala Lumpur 56000, Malaysia; (2) Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China | https://www.mdpi.com/2079-9292/12/7/1531 | S-TC; sustainable; BERT; CAM; DL; big data
title	A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM
topic	S-TC; sustainable; BERT; CAM; DL; big data
url	https://www.mdpi.com/2079-9292/12/7/1531
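
Note on the reported metrics: the description lists Ma_F and Mi_F alongside accuracy, precision, recall and F1, but this record does not define them. The sketch below assumes they denote the standard macro-averaged and micro-averaged F1 scores; the function name `f1_scores` and the sample labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label multi-class predictions.

    Assumption: Ma_F / Mi_F in the record mean macro / micro F1.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct prediction for class t
        else:
            fp[p] += 1          # class p was predicted wrongly
            fn[t] += 1          # class t was missed

    def f1(tp_, fp_, fn_):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 values, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


if __name__ == "__main__":
    # Hypothetical sentiment labels, for illustration only.
    truth = ["pos", "pos", "neg", "neu"]
    preds = ["pos", "neg", "neg", "neu"]
    print(f1_scores(truth, preds))
```

Macro averaging weights rare classes as heavily as common ones, while micro averaging pools all decisions, so on imbalanced short-text datasets the two values can differ noticeably.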

Similar Items