Punctuation Restoration with Transformer Model on Social Media Data
Several key challenges are faced during sentiment analysis. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts might have multiple sentiment values. Predicting the overall sentiment value for this paragraph will not produce all the information necessary for businesses and brands.
Main Authors: | Adebayo Mustapha Bakare, Kalaiarasi Sonai Muthu Anbananthen, Saravanan Muthaiyah, Jayakumar Krishnan, Subarmaniam Kannan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-01-01 |
Series: | Applied Sciences |
Subjects: | punctuation restoration; transformers models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM) |
Online Access: | https://www.mdpi.com/2076-3417/13/3/1685 |
_version_ | 1797625128175009792 |
---|---|
author | Adebayo Mustapha Bakare; Kalaiarasi Sonai Muthu Anbananthen; Saravanan Muthaiyah; Jayakumar Krishnan; Subarmaniam Kannan |
author_facet | Adebayo Mustapha Bakare; Kalaiarasi Sonai Muthu Anbananthen; Saravanan Muthaiyah; Jayakumar Krishnan; Subarmaniam Kannan |
author_sort | Adebayo Mustapha Bakare |
collection | DOAJ |
description | Several key challenges are faced during sentiment analysis. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts might carry multiple sentiment values, and predicting a single overall sentiment value for it will not produce all the information necessary for businesses and brands. Therefore, a paragraph with multiple sentences should be separated into simple sentences, from which all the possible sentiments can be extracted effectively. To split a paragraph, however, that paragraph must be properly punctuated, and most social media texts are improperly punctuated, so separating the sentences can be challenging. This study proposes a punctuation-restoration algorithm using the transformer model approach. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models for the transformer encoding, together with the neural network used on top for classification. Based on our evaluation, RoBERTa-large with a bidirectional long short-term memory (LSTM) layer provided the best accuracy, 97% and 90%, for restoring punctuation on Amazon and Telekom data, respectively. Other evaluation criteria, such as precision, recall, and F1-score, were also used. |
first_indexed | 2024-03-11T09:52:24Z |
format | Article |
id | doaj.art-bf86c4a156fb469fac858441111e5961 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-11T09:52:24Z |
publishDate | 2023-01-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-bf86c4a156fb469fac858441111e5961 (indexed 2023-11-16T16:09:00Z); MDPI AG; Applied Sciences, ISSN 2076-3417; published 2023-01-01; vol. 13, no. 3, art. 1685; DOI 10.3390/app13031685; Punctuation Restoration with Transformer Model on Social Media Data; Adebayo Mustapha Bakare, Kalaiarasi Sonai Muthu Anbananthen, Jayakumar Krishnan, and Subarmaniam Kannan (Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia); Saravanan Muthaiyah (Faculty of Management, Multimedia University, Cyberjaya 63100, Malaysia); https://www.mdpi.com/2076-3417/13/3/1685; keywords: punctuation restoration; transformers models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM) |
spellingShingle | Adebayo Mustapha Bakare; Kalaiarasi Sonai Muthu Anbananthen; Saravanan Muthaiyah; Jayakumar Krishnan; Subarmaniam Kannan; Punctuation Restoration with Transformer Model on Social Media Data; Applied Sciences; punctuation restoration; transformers models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM) |
title | Punctuation Restoration with Transformer Model on Social Media Data |
title_full | Punctuation Restoration with Transformer Model on Social Media Data |
title_fullStr | Punctuation Restoration with Transformer Model on Social Media Data |
title_full_unstemmed | Punctuation Restoration with Transformer Model on Social Media Data |
title_short | Punctuation Restoration with Transformer Model on Social Media Data |
title_sort | punctuation restoration with transformer model on social media data |
topic | punctuation restoration; transformers models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM) |
url | https://www.mdpi.com/2076-3417/13/3/1685 |
work_keys_str_mv | AT adebayomustaphabakare punctuationrestorationwithtransformermodelonsocialmediadata AT kalaiarasisonaimuthuanbananthen punctuationrestorationwithtransformermodelonsocialmediadata AT saravananmuthaiyah punctuationrestorationwithtransformermodelonsocialmediadata AT jayakumarkrishnan punctuationrestorationwithtransformermodelonsocialmediadata AT subarmaniamkannan punctuationrestorationwithtransformermodelonsocialmediadata |