Punctuation Restoration with Transformer Model on Social Media Data

Several key challenges arise in sentiment analysis. One major problem is determining the sentiment of complex sentences, paragraphs, and text documents. A paragraph with multiple parts may carry multiple sentiment values, and predicting a single overall sentiment for it does not produce all the information businesses and brands need. A paragraph with multiple sentences should therefore be separated into simple sentences, from which all the possible sentiments can be extracted effectively. To split a paragraph, however, that paragraph must be properly punctuated, and most social media texts are improperly punctuated, which makes sentence separation challenging. This study proposes a punctuation-restoration algorithm using the transformer model approach. We evaluated different Bidirectional Encoder Representations from Transformers (BERT) models as the transformer encoder, together with the neural network used for classification. Based on our evaluation, RoBERTa-Large with a bidirectional long short-term memory (LSTM) network achieved the best accuracy: 97% and 90% for restoring punctuation on Amazon and Telekom data, respectively. Precision, recall, and F1-score are also reported.


Bibliographic Details
Main Authors: Adebayo Mustapha Bakare, Kalaiarasi Sonai Muthu Anbananthen, Saravanan Muthaiyah, Jayakumar Krishnan, Subarmaniam Kannan
Format: Article
Language: English
Published: MDPI AG, 2023-01-01
Series: Applied Sciences
Subjects: punctuation restoration; transformer models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM)
Online Access: https://www.mdpi.com/2076-3417/13/3/1685
ISSN: 2076-3417
Collection: DOAJ (Directory of Open Access Journals)
Record ID: doaj.art-bf86c4a156fb469fac858441111e5961
DOI: 10.3390/app13031685 (Applied Sciences, vol. 13, no. 3, art. 1685, 2023-01-01)
Author affiliations:
Adebayo Mustapha Bakare: Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia
Kalaiarasi Sonai Muthu Anbananthen: Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia
Saravanan Muthaiyah: Faculty of Management, Multimedia University, Cyberjaya 63100, Malaysia
Jayakumar Krishnan: Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia
Subarmaniam Kannan: Faculty of Information Science and Technology, Multimedia University, Melaka 75450, Malaysia
Keywords: punctuation restoration; transformer models; Bidirectional Encoder Representations from Transformers (BERT); long short-term memory (LSTM)
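The abstract frames punctuation restoration as the step that makes sentence-level sentiment analysis possible. As a minimal sketch of that task formulation only (not the paper's RoBERTa-Large + BiLSTM model), punctuation restoration can be treated as per-token classification: each word receives a label naming the punctuation mark that follows it, after which the restored text splits cleanly into simple sentences. The label set, example sentence, and label sequence below are illustrative assumptions, not output from the authors' model.

```python
import re

# Assumed label set for illustration: "O" means no punctuation follows the word.
LABEL_TO_MARK = {"O": "", "COMMA": ",", "PERIOD": ".", "QUESTION": "?"}

def restore(words, labels):
    """Reattach predicted punctuation marks and recover sentence-initial casing."""
    out = []
    start_sentence = True
    for word, label in zip(words, labels):
        token = word.capitalize() if start_sentence else word
        token += LABEL_TO_MARK[label]
        out.append(token)
        # A sentence-final mark means the next word starts a new sentence.
        start_sentence = label in ("PERIOD", "QUESTION")
    return " ".join(out)

def split_sentences(text):
    """Split restored text into simple sentences for per-sentence sentiment."""
    return [s.strip() for s in re.split(r"(?<=[.?])\s+", text) if s.strip()]

words = "the battery is great however the screen broke fast".split()
# Labels as a punctuation model might predict them (hand-written, illustrative).
labels = ["O", "O", "O", "PERIOD", "COMMA", "O", "O", "O", "PERIOD"]
restored = restore(words, labels)
print(restored)                   # The battery is great. However, the screen broke fast.
print(split_sentences(restored))  # two simple sentences, one sentiment each
```

Splitting only after restoration is the point of the pipeline: on the raw unpunctuated input, a sentence splitter would return the whole paragraph as one unit and the mixed sentiments would be conflated.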