A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM

Considering the low accuracy of current short text classification (TC) methods and the difficulties they have with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized by introducing a BERT pre...


Bibliographic Details
Main Authors: Li Pan, Wei Hong Lim, Yong Gan
Format: Article
Language: English
Published: MDPI AG 2023-03-01
Series: Electronics
Subjects: S-TC; sustainable; BERT; CAM; DL; big data
Online Access: https://www.mdpi.com/2079-9292/12/7/1531
_version_ 1797608077559595008
author Li Pan
Wei Hong Lim
Yong Gan
author_facet Li Pan
Wei Hong Lim
Yong Gan
author_sort Li Pan
collection DOAJ
description To address the low accuracy of current short text classification (TC) methods and their difficulty with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized with a Bidirectional Encoder Representations from Transformers (BERT) pre-training model. When processing language tasks, the model removes (masks) a word from the text and predicts it from the preceding and following words, which improves TC accuracy. Then, a convolutional attention mechanism (CAM) model is proposed that uses a convolutional neural network (CNN) to capture feature interactions in the time dimension and multiple convolutional kernels to obtain more comprehensive feature information, which further improves TC accuracy. Finally, the BERT pre-training model and the CAM model are optimized and merged into a BERT-CAM classification model for S-TC. In simulation experiments, the proposed S-TC method is compared with three other methods on three datasets. The results show that the proposed method achieves the highest accuracy, precision, recall, F1 value, Ma_F and Mi_F, reaching 94.28%, 86.36%, 84.95%, 85.96%, 86.34% and 86.56%, respectively, and so outperforms the three comparison algorithms.
first_indexed 2024-03-11T05:39:37Z
format Article
id doaj.art-30035e98176a48289457d1be782c4e61
institution Directory Open Access Journal
issn 2079-9292
language English
last_indexed 2024-03-11T05:39:37Z
publishDate 2023-03-01
publisher MDPI AG
record_format Article
series Electronics
spelling doaj.art-30035e98176a48289457d1be782c4e61
2023-11-17T16:31:55Z
eng
MDPI AG
Electronics
2079-9292
2023-03-01
Volume 12, Issue 7, Article 1531
10.3390/electronics12071531
A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM
Li Pan (0)
Wei Hong Lim (1)
Yong Gan (2)
(0) Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China
(1) Faculty of Engineering, Technology and Built Environment, UCSI University, Cheras, Kuala Lumpur 56000, Malaysia
(2) Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China
https://www.mdpi.com/2079-9292/12/7/1531
S-TC
sustainable
BERT
CAM
DL
big data
spellingShingle Li Pan
Wei Hong Lim
Yong Gan
A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM
Electronics
S-TC
sustainable
BERT
CAM
DL
big data
title A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM
title_sort method of sustainable development for three chinese short text datasets based on bert cam
topic S-TC
sustainable
BERT
CAM
DL
big data
url https://www.mdpi.com/2079-9292/12/7/1531
work_keys_str_mv AT lipan amethodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam
AT weihonglim amethodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam
AT yonggan amethodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam
AT lipan methodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam
AT weihonglim methodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam
AT yonggan methodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam
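Note on the reported metrics: the description field lists Ma_F and Mi_F alongside F1 but does not define them. Assuming they denote the macro-averaged and micro-averaged F1 scores (a common reading, not confirmed by this record), the difference between the two can be sketched as follows. The function name `f_scores` and the labels in the example are hypothetical and are not taken from the article.

```python
from collections import Counter


def f_scores(y_true, y_pred):
    """Return (macro_f1, micro_f1) for single-label, multi-class predictions.

    Assumption: Ma_F / Mi_F in the record mean macro / micro F1.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-class true positives, false positives, false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


if __name__ == "__main__":
    # Hypothetical sentiment labels, for illustration only.
    gold = ["pos", "pos", "neg", "neu"]
    pred = ["pos", "neg", "neg", "neu"]
    print(f_scores(gold, pred))
```

Under this reading, macro averaging weights rare classes as heavily as frequent ones, while micro averaging in the single-label case coincides with plain accuracy, which is one reason the two values can differ on imbalanced short-text datasets.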