Multilingual hope speech detection: A Robust framework using transfer learning of fine-tuning RoBERTa model

Bibliographic Details
Main Authors: Muhammad Shahid Iqbal Malik, Anna Nazarova, Mona Mamdouh Jamjoom, Dmitry I. Ignatov
Format: Article
Language: English
Published: Elsevier 2023-09-01
Series: Journal of King Saud University: Computer and Information Sciences
Subjects: Transfer learning; Russian; XLM-RoBERTa; Hope speech; Translation-based; Multi-lingual
Online Access: http://www.sciencedirect.com/science/article/pii/S1319157823002902
author Muhammad Shahid Iqbal Malik
Anna Nazarova
Mona Mamdouh Jamjoom
Dmitry I. Ignatov
author_sort Muhammad Shahid Iqbal Malik
collection DOAJ
description Hope Speech Detection (HSD) from social media is a new direction for promoting and supporting positive content to encourage harmony and positivity in society. Although users of social media belong to different linguistic communities, hope speech detection is rarely studied as a multilingual task that considers low-resource languages. Moreover, prior studies explored only monolingual techniques, and the Russian language has not been addressed. This study tackles Multi-lingual Hope Speech Detection (MHSD) in English and Russian using the transfer learning paradigm with a fine-tuning approach. We explore joint multi-lingual and translation-based approaches to handle multilingualism: the translation-based approach translates all content into one language and then classifies it, whereas the joint multi-lingual approach designs a single universal classifier for the various languages. We explore the strengths of the Robustly Optimized BERT Pre-Training Approach (RoBERTa), which has shown benchmark performance in capturing the semantics and contextual information within content. The proposed framework consists of four stages: 1) data preprocessing, 2) representation of data using RoBERTa models, 3) fine-tuning, and 4) classification of hope speech into two labels. A new Russian corpus for hope speech detection, consisting of YouTube comments, is built. Several experiments are conducted in English and Russian using semi-supervised bilingual English and Russian datasets. The findings show that the proposed framework achieved benchmark performance and outperformed the baselines. Furthermore, the translation-based approach (Russian-RoBERTa) offered the best performance, achieving 94% accuracy and an 80.24% F1-score.
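The two strategies named in the description can be contrasted with a minimal sketch. This is an illustration only, not the authors' implementation: the function names, the toy dictionary translator, and the keyword classifiers are hypothetical stand-ins for a real translation system and fine-tuned RoBERTa models, and the choice of English as the pivot language is arbitrary here.

```python
from typing import Callable, List, Tuple

Comment = Tuple[str, str]               # (text, language code), e.g. ("stay strong", "en")
Classifier = Callable[[str], int]       # 1 = hope speech, 0 = not hope speech
Translator = Callable[[str, str], str]  # (text, source language) -> text in the pivot language


def preprocess(text: str) -> str:
    """Placeholder for the preprocessing stage: lowercase and collapse whitespace."""
    return " ".join(text.lower().split())


def translation_based(comments: List[Comment], translate: Translator,
                      classify: Classifier) -> List[int]:
    """Map every comment into one pivot language, then apply one monolingual classifier."""
    return [classify(translate(preprocess(text), lang)) for text, lang in comments]


def joint_multilingual(comments: List[Comment], classify: Classifier) -> List[int]:
    """Feed comments in their original language to a single multilingual classifier."""
    return [classify(preprocess(text)) for text, _lang in comments]


# --- Toy stand-ins (hypothetical; a real system would use fine-tuned models) ---
TOY_DICTIONARY = {"не сдавайся": "never give up"}


def toy_translate(text: str, lang: str) -> str:
    """Pretend translator into English: dictionary lookup for Russian, identity otherwise."""
    return TOY_DICTIONARY.get(text, text) if lang == "ru" else text


def toy_english_classifier(text: str) -> int:
    """Keyword rule standing in for a monolingual model."""
    return int("never give up" in text)


def toy_multilingual_classifier(text: str) -> int:
    """Keyword rule standing in for a multilingual model."""
    return int("never give up" in text or "не сдавайся" in text)


comments = [("Never  give up!", "en"), ("Не сдавайся", "ru"), ("This video is boring", "en")]
labels_translation = translation_based(comments, toy_translate, toy_english_classifier)
labels_joint = joint_multilingual(comments, toy_multilingual_classifier)
```

The structural difference is where language handling lives: the translation-based route adds a translation step in front of a single-language classifier, while the joint route relies on one classifier that accepts every language directly.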
first_indexed 2024-03-11T19:20:55Z
format Article
id doaj.art-487cb3a4f9764839903e257f33e93980
institution Directory Open Access Journal
issn 1319-1578
language English
last_indexed 2024-03-11T19:20:55Z
publishDate 2023-09-01
publisher Elsevier
record_format Article
series Journal of King Saud University: Computer and Information Sciences
spelling doaj.art-487cb3a4f9764839903e257f33e93980
2023-10-07T04:34:12Z
eng
Elsevier
Journal of King Saud University: Computer and Information Sciences
1319-1578
2023-09-01
35
8
101736
Multilingual hope speech detection: A Robust framework using transfer learning of fine-tuning RoBERTa model
Muhammad Shahid Iqbal Malik (corresponding author): Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow 109028, Russian Federation
Anna Nazarova: Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow 109028, Russian Federation
Mona Mamdouh Jamjoom: Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia
Dmitry I. Ignatov: Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow 109028, Russian Federation
title Multilingual hope speech detection: A Robust framework using transfer learning of fine-tuning RoBERTa model
topic Transfer learning
Russian
XLM-RoBERTa
Hope speech
Translation-based
Multi-lingual
url http://www.sciencedirect.com/science/article/pii/S1319157823002902