Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles

Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. It becomes challenging to uncover propaganda as it works with the systematic goal of influencing other individuals for the determined ends. While significant research has been reported on propaganda...

Bibliographic Details
Main Authors: Deptii Chaudhari, Ambika Vishal Pawar
Format: Article
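Language: English
Published: MDPI AG 2023-11-01
Series: Big Data and Cognitive Computing
Subjects: propaganda classification; Hindi text processing; news article analysis; natural language processing; deep learning; low-resource language processing
Online Access: https://www.mdpi.com/2504-2289/7/4/175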
_version_ 1797382028827557888
author Deptii Chaudhari
Ambika Vishal Pawar
author_facet Deptii Chaudhari
Ambika Vishal Pawar
author_sort Deptii Chaudhari
collection DOAJ
description Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. It becomes challenging to uncover propaganda as it works with the systematic goal of influencing other individuals for the determined ends. While significant research has been reported on propaganda identification and classification in resource-rich languages such as English, much less effort has been made in resource-deprived languages like Hindi. The spread of propaganda in the Hindi news media has induced our attempt to devise an approach for the propaganda categorization of Hindi news articles. The unavailability of the necessary language tools makes propaganda classification in Hindi more challenging. This study proposes the effective use of deep learning and transformer-based approaches for Hindi computational propaganda classification. To address the lack of pretrained word embeddings in Hindi, Hindi Word2vec embeddings were created using the H-Prop-News corpus for feature extraction. Subsequently, three deep learning models, i.e., CNN (convolutional neural network), LSTM (long short-term memory), Bi-LSTM (bidirectional long short-term memory); and four transformer-based models, i.e., multi-lingual BERT, Distil-BERT, Hindi-BERT, and Hindi-TPU-Electra, were experimented with. The experimental outcomes indicate that the multi-lingual BERT and Hindi-BERT models provide the best performance, with the highest F1 score of 84% on the test data. These results strongly support the efficacy of the proposed solution and indicate its appropriateness for propaganda classification.
first_indexed 2024-03-08T20:59:35Z
format Article
id doaj.art-4c9d4e6c25284cc69a5216a8be632bef
institution Directory Open Access Journal
issn 2504-2289
language English
last_indexed 2024-03-08T20:59:35Z
publishDate 2023-11-01
publisher MDPI AG
record_format Article
series Big Data and Cognitive Computing
spelling doaj.art-4c9d4e6c25284cc69a5216a8be632bef
2023-12-22T13:53:33Z
eng
MDPI AG
Big Data and Cognitive Computing
2504-2289
2023-11-01
7
4
175
10.3390/bdcc7040175
Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles
Deptii Chaudhari 0
Ambika Vishal Pawar 1
Symbiosis Institute of Technology, Symbiosis International (Deemed University), Lavale, Pune 412115, India
Symbiosis Institute of Technology, Symbiosis International (Deemed University), Lavale, Pune 412115, India
Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. It becomes challenging to uncover propaganda as it works with the systematic goal of influencing other individuals for the determined ends. While significant research has been reported on propaganda identification and classification in resource-rich languages such as English, much less effort has been made in resource-deprived languages like Hindi. The spread of propaganda in the Hindi news media has induced our attempt to devise an approach for the propaganda categorization of Hindi news articles. The unavailability of the necessary language tools makes propaganda classification in Hindi more challenging. This study proposes the effective use of deep learning and transformer-based approaches for Hindi computational propaganda classification. To address the lack of pretrained word embeddings in Hindi, Hindi Word2vec embeddings were created using the H-Prop-News corpus for feature extraction. Subsequently, three deep learning models, i.e., CNN (convolutional neural network), LSTM (long short-term memory), Bi-LSTM (bidirectional long short-term memory); and four transformer-based models, i.e., multi-lingual BERT, Distil-BERT, Hindi-BERT, and Hindi-TPU-Electra, were experimented with. The experimental outcomes indicate that the multi-lingual BERT and Hindi-BERT models provide the best performance, with the highest F1 score of 84% on the test data. These results strongly support the efficacy of the proposed solution and indicate its appropriateness for propaganda classification.
https://www.mdpi.com/2504-2289/7/4/175
propaganda classification
Hindi text processing
news article analysis
natural language processing
deep learning
low-resource language processing
spellingShingle Deptii Chaudhari
Ambika Vishal Pawar
Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles
Big Data and Cognitive Computing
propaganda classification
Hindi text processing
news article analysis
natural language processing
deep learning
low-resource language processing
title Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles
title_sort empowering propaganda detection in resource restraint languages a transformer based framework for classifying hindi news articles
topic propaganda classification
Hindi text processing
news article analysis
natural language processing
deep learning
low-resource language processing
url https://www.mdpi.com/2504-2289/7/4/175