Multilingual hope speech detection: A Robust framework using transfer learning of fine-tuning RoBERTa model
Hope Speech Detection (HSD) from social media is a new direction for promoting and supporting positive content to encourage harmony and positivity in society. Although users of social media belong to different linguistic communities, hope speech detection is rarely studied as a multilingual task consideri...
Main Authors: | Muhammad Shahid Iqbal Malik, Anna Nazarova, Mona Mamdouh Jamjoom, Dmitry I. Ignatov |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier 2023-09-01 |
Series: | Journal of King Saud University: Computer and Information Sciences |
Subjects: | Transfer learning, Russian, XLM-RoBERTa, Hope speech, Translation-based, Multi-lingual |
Online Access: | http://www.sciencedirect.com/science/article/pii/S1319157823002902 |
_version_ | 1797663862062841856 |
---|---|
author | Muhammad Shahid Iqbal Malik, Anna Nazarova, Mona Mamdouh Jamjoom, Dmitry I. Ignatov |
author_facet | Muhammad Shahid Iqbal Malik, Anna Nazarova, Mona Mamdouh Jamjoom, Dmitry I. Ignatov |
author_sort | Muhammad Shahid Iqbal Malik |
collection | DOAJ |
description | Hope Speech Detection (HSD) from social media is a new direction for promoting and supporting positive content to encourage harmony and positivity in society. Although users of social media belong to different linguistic communities, hope speech detection is rarely studied as a multilingual task that considers low-resource languages. Moreover, prior studies explored only monolingual techniques, and the Russian language has not been addressed. This study tackles Multi-lingual Hope Speech Detection (MHSD) in English and Russian using the transfer learning paradigm with a fine-tuning approach. We explore joint multi-lingual and translation-based approaches to handle multilingualism. The translation-based approach translates all content into one language and then classifies it, while the joint multi-lingual method designs a single universal classifier for several languages. We build on the Robustly Optimized BERT Pre-Training Approach (RoBERTa), which has shown strong performance in capturing the semantics and contextual information of text. The proposed framework consists of four stages: 1) data preprocessing, 2) representation of data using RoBERTa models, 3) fine-tuning, and 4) classification of content into two labels. A new Russian corpus for hope speech detection is built from YouTube comments. Several experiments are conducted in English and Russian using semi-supervised bilingual English and Russian datasets. The findings show that the proposed framework achieved benchmark performance and outperformed the baselines. Furthermore, the translation-based approach (Russian-RoBERTa) offered the best performance, achieving 94% accuracy and an 80.24% F1-score. |
first_indexed | 2024-03-11T19:20:55Z |
format | Article |
id | doaj.art-487cb3a4f9764839903e257f33e93980 |
institution | Directory Open Access Journal |
issn | 1319-1578 |
language | English |
last_indexed | 2024-03-11T19:20:55Z |
publishDate | 2023-09-01 |
publisher | Elsevier |
record_format | Article |
series | Journal of King Saud University: Computer and Information Sciences |
spelling | doaj.art-487cb3a4f9764839903e257f33e93980; 2023-10-07T04:34:12Z; eng; Elsevier; Journal of King Saud University: Computer and Information Sciences; ISSN 1319-1578; 2023-09-01; volume 35, issue 8, article 101736; Multilingual hope speech detection: A Robust framework using transfer learning of fine-tuning RoBERTa model; Muhammad Shahid Iqbal Malik (Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow 109028, Russian Federation; corresponding author), Anna Nazarova (Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow 109028, Russian Federation), Mona Mamdouh Jamjoom (Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia), Dmitry I. Ignatov (Department of Computer Science, National Research University Higher School of Economics, 11 Pokrovskiy Boulevard, Moscow 109028, Russian Federation); http://www.sciencedirect.com/science/article/pii/S1319157823002902; Transfer learning, Russian, XLM-RoBERTa, Hope speech, Translation-based, Multi-lingual |
title | Multilingual hope speech detection: A Robust framework using transfer learning of fine-tuning RoBERTa model |
topic | Transfer learning, Russian, XLM-RoBERTa, Hope speech, Translation-based, Multi-lingual |
url | http://www.sciencedirect.com/science/article/pii/S1319157823002902 |
work_keys_str_mv | AT muhammadshahidiqbalmalik multilingualhopespeechdetectionarobustframeworkusingtransferlearningoffinetuningrobertamodel AT annanazarova multilingualhopespeechdetectionarobustframeworkusingtransferlearningoffinetuningrobertamodel AT monamamdouhjamjoom multilingualhopespeechdetectionarobustframeworkusingtransferlearningoffinetuningrobertamodel AT dmitryiignatov multilingualhopespeechdetectionarobustframeworkusingtransferlearningoffinetuningrobertamodel |
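
The description reports both accuracy and F1-score for a two-label task. As a reminder of how the two metrics relate, here is a minimal sketch in plain Python. It assumes binary F1 computed on the hope-speech class (the record does not say which averaging the authors used), and the label lists are invented for illustration; they are not the paper's data.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and equal in length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = hope speech, 0 = not hope speech (illustrative only).
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"f1       = {f1_score(y_true, y_pred):.2f}")
```

When the positive class is rare, a single missed hope-speech comment barely moves accuracy but lowers F1 noticeably, which is one way an accuracy figure can sit well above the corresponding F1-score.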