Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles
Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. Uncovering propaganda is challenging because it is crafted with the systematic goal of influencing individuals toward predetermined ends. While significant research has been reported on propaganda...
Main Authors: | Deptii Chaudhari, Ambika Vishal Pawar |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-11-01 |
Series: | Big Data and Cognitive Computing |
Subjects: | propaganda classification; Hindi text processing; news article analysis; natural language processing; deep learning; low-resource language processing |
Online Access: | https://www.mdpi.com/2504-2289/7/4/175 |
_version_ | 1797382028827557888 |
---|---|
author | Deptii Chaudhari; Ambika Vishal Pawar |
author_facet | Deptii Chaudhari; Ambika Vishal Pawar |
author_sort | Deptii Chaudhari |
collection | DOAJ |
description | Misinformation, fake news, and various propaganda techniques are increasingly used in digital media. Uncovering propaganda is challenging because it is crafted with the systematic goal of influencing individuals toward predetermined ends. While significant research has been reported on propaganda identification and classification in resource-rich languages such as English, much less effort has been made in resource-deprived languages such as Hindi. The spread of propaganda in Hindi news media motivated us to devise an approach for the propaganda categorization of Hindi news articles. The unavailability of the necessary language tools makes propaganda classification in Hindi more challenging. This study proposes the effective use of deep learning and transformer-based approaches for Hindi computational propaganda classification. To address the lack of pretrained word embeddings in Hindi, Hindi Word2vec embeddings were created from the H-Prop-News corpus for feature extraction. Subsequently, three deep learning models, i.e., CNN (convolutional neural network), LSTM (long short-term memory), and Bi-LSTM (bidirectional long short-term memory), and four transformer-based models, i.e., multilingual BERT, DistilBERT, Hindi-BERT, and Hindi-TPU-Electra, were evaluated. The experimental outcomes indicate that the multilingual BERT and Hindi-BERT models provide the best performance, with the highest F1 score of 84% on the test data. These results strongly support the efficacy of the proposed solution and indicate its appropriateness for propaganda classification. (Illustrative sketches of the embedding and fine-tuning steps follow the record below.) |
first_indexed | 2024-03-08T20:59:35Z |
format | Article |
id | doaj.art-4c9d4e6c25284cc69a5216a8be632bef |
institution | Directory Open Access Journal |
issn | 2504-2289 |
language | English |
last_indexed | 2024-03-08T20:59:35Z |
publishDate | 2023-11-01 |
publisher | MDPI AG |
record_format | Article |
series | Big Data and Cognitive Computing |
spelling | doaj.art-4c9d4e6c25284cc69a5216a8be632bef; 2023-12-22T13:53:33Z; eng; MDPI AG; Big Data and Cognitive Computing; 2504-2289; 2023-11-01; vol. 7, no. 4, art. 175; doi:10.3390/bdcc7040175; Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles; Deptii Chaudhari, Ambika Vishal Pawar (Symbiosis Institute of Technology, Symbiosis International (Deemed University), Lavale, Pune 412115, India); abstract as given in the description field above; https://www.mdpi.com/2504-2289/7/4/175; propaganda classification; Hindi text processing; news article analysis; natural language processing; deep learning; low-resource language processing |
spellingShingle | Deptii Chaudhari Ambika Vishal Pawar Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles Big Data and Cognitive Computing propaganda classification Hindi text processing news article analysis natural language processing deep learning low-resource language processing |
title | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_full | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_fullStr | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_full_unstemmed | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_short | Empowering Propaganda Detection in Resource-Restraint Languages: A Transformer-Based Framework for Classifying Hindi News Articles |
title_sort | empowering propaganda detection in resource restraint languages a transformer based framework for classifying hindi news articles |
topic | propaganda classification Hindi text processing news article analysis natural language processing deep learning low-resource language processing |
url | https://www.mdpi.com/2504-2289/7/4/175 |
work_keys_str_mv | AT deptiichaudhari empoweringpropagandadetectioninresourcerestraintlanguagesatransformerbasedframeworkforclassifyinghindinewsarticles AT ambikavishalpawar empoweringpropagandadetectioninresourcerestraintlanguagesatransformerbasedframeworkforclassifyinghindinewsarticles |
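The description field above mentions training Hindi Word2vec embeddings on the H-Prop-News corpus for feature extraction. The following is a minimal sketch of that step using gensim, not the authors' implementation; the file name, column name, and hyperparameters are assumptions made for illustration.

```python
# Minimal sketch (assumption-laden): train Hindi Word2vec embeddings on a
# news corpus such as H-Prop-News and save them for downstream classifiers.
# "h_prop_news.csv" and the "text" column are hypothetical names.
import pandas as pd
from gensim.models import Word2Vec

df = pd.read_csv("h_prop_news.csv")                 # hypothetical corpus file
sentences = [str(t).split() for t in df["text"]]    # naive whitespace tokenization

w2v = Word2Vec(
    sentences,
    vector_size=300,   # embedding size is an assumption, not the paper's setting
    window=5,
    min_count=2,
    workers=4,
)
w2v.save("hindi_word2vec.model")
print("vocabulary size:", len(w2v.wv))
```

The saved vectors could then initialize the embedding layer of the CNN, LSTM, or Bi-LSTM classifiers mentioned in the description.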
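The record also reports that multilingual BERT (along with Hindi-BERT) gave the best results for propaganda classification. Below is a minimal, hedged sketch of fine-tuning bert-base-multilingual-cased for binary classification with Hugging Face transformers; the CSV paths, column names, and hyperparameters are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch (assumption-laden): fine-tune multilingual BERT for binary
# propaganda classification of Hindi news articles. "train.csv"/"test.csv"
# with "text" and "label" columns are hypothetical; hyperparameters are illustrative.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "bert-base-multilingual-cased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Hypothetical splits with "text" and "label" (0 = non-propaganda, 1 = propaganda).
data = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)

data = data.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="mbert-hindi-propaganda",  # hypothetical output directory
    num_train_epochs=3,                   # illustrative hyperparameters
    per_device_train_batch_size=16,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=data["train"],
    eval_dataset=data["test"],
)
trainer.train()
print(trainer.evaluate())  # reports eval loss; F1 would require a custom compute_metrics function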