Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles
Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. Uncovering propaganda is challenging because it is crafted with the systematic goal of influencing individuals toward predetermined ends. While significant research has been reported on propaganda...
Main Authors: | Deptii Chaudhari, Ambika Vishal Pawar |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-11-01 |
Series: | Big Data and Cognitive Computing |
Subjects: | propaganda classification; Hindi text processing; news article analysis; natural language processing; deep learning; low-resource language processing |
Online Access: | https://www.mdpi.com/2504-2289/7/4/175 |
_version_ | 1797382028827557888 |
---|---|
author | Deptii Chaudhari; Ambika Vishal Pawar |
author_facet | Deptii Chaudhari; Ambika Vishal Pawar |
author_sort | Deptii Chaudhari |
collection | DOAJ |
description | Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. Uncovering propaganda is challenging because it is crafted with the systematic goal of influencing individuals toward predetermined ends. While significant research has been reported on propaganda identification and classification in resource-rich languages such as English, much less effort has been made in resource-deprived languages such as Hindi. The spread of propaganda in Hindi news media motivated us to devise an approach for the propaganda categorization of Hindi news articles. The unavailability of the necessary language tools makes propaganda classification in Hindi more challenging. This study proposes the effective use of deep learning and transformer-based approaches for Hindi computational propaganda classification. To address the lack of pretrained word embeddings in Hindi, Hindi Word2vec embeddings were created from the H-Prop-News corpus for feature extraction. Subsequently, three deep learning models, i.e., CNN (convolutional neural network), LSTM (long short-term memory), and Bi-LSTM (bidirectional long short-term memory), and four transformer-based models, i.e., multilingual BERT, DistilBERT, Hindi-BERT, and Hindi-TPU-Electra, were evaluated. The experimental outcomes indicate that the multilingual BERT and Hindi-BERT models provide the best performance, with the highest F1 score of 84% on the test data. These results strongly support the efficacy of the proposed solution and indicate its appropriateness for propaganda classification. (Illustrative sketches of the embedding and fine-tuning steps follow the record below.) |
first_indexed | 2024-03-08T20:59:35Z |
format | Article |
id | doaj.art-4c9d4e6c25284cc69a5216a8be632bef |
institution | Directory Open Access Journal |
issn | 2504-2289 |
language | English |
last_indexed | 2024-03-08T20:59:35Z |
publishDate | 2023-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Big Data and Cognitive Computing |
spelling | doaj.art-4c9d4e6c25284cc69a5216a8be632bef; 2023-12-22T13:53:33Z; eng; MDPI AG; Big Data and Cognitive Computing; 2504-2289; 2023-11-01; vol. 7, no. 4, art. 175; doi:10.3390/bdcc7040175; Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles; Deptii Chaudhari, Ambika Vishal Pawar (Symbiosis Institute of Technology, Symbiosis International (Deemed University), Lavale, Pune 412115, India); abstract as given in the description field above; https://www.mdpi.com/2504-2289/7/4/175; propaganda classification; Hindi text processing; news article analysis; natural language processing; deep learning; low-resource language processing |
spellingShingle | Deptii Chaudhari Ambika Vishal Pawar Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles Big Data and Cognitive Computing propaganda classification Hindi text processing news article analysis natural language processing deep learning low-resource language processing |
title | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_full | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_fullStr | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_full_unstemmed | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_short | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_sort | empowering propaganda detection in resource restraint languages a transformer based framework for classifying hindi news articles |
topic | propaganda classification Hindi text processing news article analysis natural language processing deep learning low-resource language processing |
url | https://www.mdpi.com/2504-2289/7/4/175 |
work_keys_str_mv | AT deptiichaudhari empoweringpropagandadetectioninresourcerestraintlanguagesatransformerbasedframeworkforclassifyinghindinewsarticles AT ambikavishalpawar empoweringpropagandadetectioninresourcerestraintlanguagesatransformerbasedframeworkforclassifyinghindinewsarticles |
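The description field above mentions training Hindi Word2vec embeddings on the H-Prop-News corpus for feature extraction. The following is a minimal sketch of that step using gensim, not the authors' implementation; the file name, column name, and hyperparameters are assumptions made for illustration.

```python
# Minimal sketch (assumption-laden): train Hindi Word2vec embeddings on a
# news corpus such as H-Prop-News and save them for downstream classifiers.
# "h_prop_news.csv" and the "text" column are hypothetical names.
import pandas as pd
from gensim.models import Word2Vec

df = pd.read_csv("h_prop_news.csv")                 # hypothetical corpus file
sentences = [str(t).split() for t in df["text"]]    # naive whitespace tokenization

w2v = Word2Vec(
    sentences,
    vector_size=300,   # embedding size is an assumption, not the paper's setting
    window=5,
    min_count=2,
    workers=4,
)
w2v.save("hindi_word2vec.model")
print("vocabulary size:", len(w2v.wv))
```

The saved vectors could then initialize the embedding layer of the CNN, LSTM, or Bi-LSTM classifiers mentioned in the description.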
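The record also reports that multilingual BERT (along with Hindi-BERT) gave the best results for propaganda classification. Below is a minimal, hedged sketch of fine-tuning bert-base-multilingual-cased for binary classification with Hugging Face transformers; the CSV paths, column names, and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch (assumption-laden): fine-tune multilingual BERT for binary
# propaganda classification of Hindi news articles. "train.csv"/"test.csv"
# with "text" and "label" columns are hypothetical; hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical splits with "text" and "label" (0 = non-propaganda, 1 = propaganda).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="mbert-hindi-propaganda",  # hypothetical output directory
    num_train_epochs=3,                   # illustrative hyperparameters
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; F1 would require a custom compute_metrics function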