A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM
Considering the low accuracy of current short text classification (TC) methods and the difficulties they have with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized by introducing a BERT pre...
Main Authors: | Li Pan, Wei Hong Lim, Yong Gan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-03-01 |
Series: | Electronics |
Subjects: | S-TC, sustainable, BERT, CAM, DL, big data |
Online Access: | https://www.mdpi.com/2079-9292/12/7/1531 |
_version_ | 1797608077559595008 |
---|---|
author | Li Pan; Wei Hong Lim; Yong Gan |
author_facet | Li Pan; Wei Hong Lim; Yong Gan |
author_sort | Li Pan |
collection | DOAJ |
description | Considering the low accuracy of current short text classification (TC) methods and their difficulty with effective emotion prediction, a sustainable short TC (S-TC) method using deep learning (DL) in big data environments is proposed. First, the text is vectorized with a bidirectional encoder representations from transformers (BERT) pre-training model. In pre-training, a word is masked out of the text and predicted from the preceding and following words, which improves TC accuracy. Then, a convolutional attention mechanism (CAM) model is proposed, using a convolutional neural network (CNN) to capture feature interactions in the time dimension and multiple convolutional kernels to obtain more comprehensive feature information. CAM can improve TC accuracy. Finally, by optimizing and merging the BERT pre-training model and the CAM model, a BERT-CAM classification model for S-TC is proposed. In simulation experiments, the proposed S-TC method is compared with three other methods on three datasets. The results show that the proposed method achieves the highest accuracy, precision, recall, F1 value, Ma_F and Mi_F, reaching 94.28%, 86.36%, 84.95%, 85.96%, 86.34% and 86.56%, respectively. The algorithm's performance is better than that of the three comparison algorithms. |
first_indexed | 2024-03-11T05:39:37Z |
format | Article |
id | doaj.art-30035e98176a48289457d1be782c4e61 |
institution | Directory of Open Access Journals |
issn | 2079-9292 |
language | English |
last_indexed | 2024-03-11T05:39:37Z |
publishDate | 2023-03-01 |
publisher | MDPI AG |
record_format | Article |
series | Electronics |
spelling | doaj.art-30035e98176a48289457d1be782c4e61; 2023-11-17T16:31:55Z; eng; MDPI AG; Electronics; 2079-9292; 2023-03-01; vol. 12, no. 7, art. 1531; 10.3390/electronics12071531; A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM; Li Pan (Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China); Wei Hong Lim (Faculty of Engineering, Technology and Built Environment, UCSI University, Cheras, Kuala Lumpur 56000, Malaysia); Yong Gan (Zhengzhou Institute of Engineering and Technology, Zhengzhou 450044, China); https://www.mdpi.com/2079-9292/12/7/1531; S-TC; sustainable; BERT; CAM; DL; big data |
spellingShingle | Li Pan; Wei Hong Lim; Yong Gan; A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM; Electronics; S-TC; sustainable; BERT; CAM; DL; big data |
title | A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM |
title_full | A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM |
title_fullStr | A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM |
title_full_unstemmed | A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM |
title_short | A Method of Sustainable Development for Three Chinese Short-Text Datasets Based on BERT-CAM |
title_sort | method of sustainable development for three chinese short text datasets based on bert cam |
topic | S-TC; sustainable; BERT; CAM; DL; big data |
url | https://www.mdpi.com/2079-9292/12/7/1531 |
work_keys_str_mv | AT lipan amethodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam AT weihonglim amethodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam AT yonggan amethodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam AT lipan methodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam AT weihonglim methodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam AT yonggan methodofsustainabledevelopmentforthreechineseshorttextdatasetsbasedonbertcam |
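
The description reports Ma_F and Mi_F alongside accuracy, precision, recall and F1, but this record does not define them. Assuming they denote the macro-averaged and micro-averaged F1 scores commonly used in multi-class text classification, the difference can be sketched as follows. The function names and the toy sentiment labels are hypothetical and are not taken from the article.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    return tp, fp, fn


def f1(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(y_true, y_pred):
    """Return (macro F1, micro F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = per_class_counts(y_true, y_pred)
    # Macro: average the per-class F1 scores, so every class weighs equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes, then compute a single F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical toy labels for a three-class sentiment task.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "pos"]
macro_f, micro_f = macro_micro_f1(y_true, y_pred)
print(f"Ma_F = {macro_f:.4f}, Mi_F = {micro_f:.4f}")
```

In the single-label setting, the micro-averaged score equals plain accuracy, while the macro-averaged score gives rare classes the same weight as frequent ones; the two can therefore diverge on imbalanced datasets.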