Fake news detection in Urdu language using machine learning

Bibliographic Details
Main Authors:	Muhammad Shoaib Farooq, Ansar Naseem, Furqan Rustam, Imran Ashraf
Format:	Article
Language:	English
Published:	PeerJ Inc. 2023-05-01
Series:	PeerJ Computer Science
Subjects:	Fake news detection; Ensemble learning; Machine learning; Urdu fake news
Online Access:	https://peerj.com/articles/cs-1353.pdf

author	Muhammad Shoaib Farooq; Ansar Naseem; Furqan Rustam; Imran Ashraf
author_sort	Muhammad Shoaib Farooq
collection	DOAJ
description	With the rise of social media, the spread of fabricated content and news has increased. Consequently, fake news detection has emerged as an important research problem. Several approaches have been proposed to discriminate fake news from real news; however, they lack robustness on multi-domain datasets, especially for Urdu news. In addition, some studies use datasets machine-translated from English to Urdu with Google Translate, without manual verification, which limits the use of such approaches in real-world applications. This study investigates these issues and proposes a fake news classifier for Urdu news. The collected dataset covers nine domains and comprises 4,097 news articles. Experiments are performed with term frequency-inverse document frequency (TF-IDF) and bag-of-words (BoW) features combined with n-grams. The major contribution of this study is feature stacking, in which the feature vectors of the preprocessed text are combined with those of the verbs extracted from that text. Support vector machine (SVM), k-nearest neighbor (KNN), and ensemble models such as random forest (RF) and extra trees (ET) were used for bagging, while stacking was applied with ET and RF as base learners and logistic regression as the meta-learner. To assess the models' robustness, fivefold cross-validation and independent-set testing were employed. Experimental results indicate that stacking achieves scores of 93.39%, 88.96%, 96.33%, 86.2%, and 93.17% across accuracy, specificity, sensitivity, Matthews correlation coefficient (MCC), ROC, and F1 score.
first_indexed	2024-03-13T09:35:41Z
format	Article
id	doaj.art-a7643f8b6a114fb1a74fa57cb9d44b8d
institution	Directory of Open Access Journals
issn	2376-5992
language	English
last_indexed	2024-03-13T09:35:41Z
publishDate	2023-05-01
publisher	PeerJ Inc.
record_format	Article
series	PeerJ Computer Science
spelling	doaj.art-a7643f8b6a114fb1a74fa57cb9d44b8d; 2023-05-25T15:05:05Z; eng; PeerJ Inc.; PeerJ Computer Science; 2376-5992; 2023-05-01; 9; e1353; 10.7717/peerj-cs.1353; Fake news detection in Urdu language using machine learning; Muhammad Shoaib Farooq (Department of Computer Science, University of Management and Technology, Lahore, Pakistan); Ansar Naseem (Department of Computer Science, University of Management and Technology, Lahore, Pakistan); Furqan Rustam (Department of Software Engineering, University of Management & Technology, Lahore, Pakistan); Imran Ashraf (Information and Communication Engineering, Yeungnam University, Gyeongsan si, South Korea); https://peerj.com/articles/cs-1353.pdf; Fake news detection; Ensemble learning; Machine learning; Urdu fake news
title	Fake news detection in Urdu language using machine learning
topic	Fake news detection; Ensemble learning; Machine learning; Urdu fake news
url	https://peerj.com/articles/cs-1353.pdf
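The feature stacking summarized in the description (n-gram TF-IDF or BoW vectors of the preprocessed text concatenated with vectors built from the verbs in that text) can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes scikit-learn and SciPy, and the helper `extract_verbs` with its `verb_lexicon` argument is hypothetical, a lexicon lookup standing in for the Urdu part-of-speech tagger a real pipeline would need.

```python
# Sketch of feature stacking: n-gram TF-IDF features of the preprocessed text
# are concatenated column-wise with TF-IDF features of the verbs in that text.
# Assumes scikit-learn and SciPy; not the study's original implementation.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer


def extract_verbs(text, verb_lexicon):
    """Hypothetical stand-in for an Urdu POS tagger: keep only the tokens
    that appear in a caller-supplied verb lexicon."""
    return " ".join(tok for tok in text.split() if tok in verb_lexicon)


def stack_features(texts, verb_lexicon, ngram_range=(1, 2)):
    """Return (X, text_vectorizer, verb_vectorizer); each row of X holds the
    text n-gram features followed by the verb features of one document."""
    verbs = [extract_verbs(t, verb_lexicon) for t in texts]
    # token_pattern=r"\S+" splits on whitespace only; the default pattern
    # would silently drop one-character tokens.
    text_vec = TfidfVectorizer(ngram_range=ngram_range, token_pattern=r"\S+")
    verb_vec = TfidfVectorizer(token_pattern=r"\S+")
    X_text = text_vec.fit_transform(texts)
    X_verb = verb_vec.fit_transform(verbs)
    # Column-wise concatenation "stacks" the two feature vectors per document.
    return hstack([X_text, X_verb]).tocsr(), text_vec, verb_vec


if __name__ == "__main__":
    docs = ["وزیر اعظم نے اعلان کیا", "بارش ہوئی اور سیلاب آیا", "ٹیم نے میچ جیتا"]
    verb_lexicon = {"کیا", "ہوئی", "آیا", "جیتا"}  # toy lexicon (assumption)
    X, text_vec, verb_vec = stack_features(docs, verb_lexicon)
    print(X.shape)
```

Replacing `TfidfVectorizer` with `CountVectorizer` (same `ngram_range` and `token_pattern` parameters) gives the BoW variant. In a real evaluation the vectorizers would be fit on the training split only and then applied to held-out news with `transform`.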
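The ensemble configuration named in the description (RF and ET as base learners, logistic regression as the meta-learner, evaluated with fivefold cross-validation) maps naturally onto scikit-learn's `StackingClassifier`. The sketch below assumes scikit-learn; the synthetic `make_classification` data, the tree counts, and the internal `cv=5` are illustrative choices, not settings reported for this study.

```python
# Sketch of the stacking ensemble: RF and ET base learners, logistic
# regression meta-learner, scored with stratified fivefold cross-validation.
# Assumes scikit-learn; synthetic data stands in for the Urdu features.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score


def build_stacker(seed=0):
    """RF + ET base learners feeding a logistic-regression meta-learner."""
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("et", ExtraTreesClassifier(n_estimators=100, random_state=seed)),
    ]
    # cv=5: the meta-learner is fit on out-of-fold base-learner predictions.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def evaluate(X, y, seed=0):
    """Return (mean accuracy, per-fold accuracies) over fivefold CV."""
    folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_val_score(build_stacker(seed), X, y, cv=folds, scoring="accuracy")
    return scores.mean(), scores


if __name__ == "__main__":
    X, y = make_classification(n_samples=300, n_features=20, random_state=0)
    mean_acc, fold_scores = evaluate(X, y)
    print(f"fivefold accuracy: {mean_acc:.3f}")
```

Training the meta-learner on out-of-fold predictions, rather than on predictions for rows the forests were already fit on, keeps stacking from simply rewarding the base learners' overfitting.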
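Most of the metrics listed in the description follow directly from the binary confusion matrix. The plain-Python sketch below assumes label `1` marks fake news (a hypothetical convention) and computes accuracy, sensitivity, specificity, F1, and the Matthews correlation coefficient; ROC AUC is left out because it needs continuous classifier scores rather than hard labels.

```python
# Binary classification metrics from the confusion matrix.
# Assumption: label 1 = fake news (the positive class).
import math


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, specificity, precision, F1 and MCC."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Convention: an undefined ratio (zero denominator) is reported as 0.0.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
    # tp=4, fn=1, tn=4, fp=1 here, so accuracy = 0.8 and MCC = 15/25 = 0.6
    print(binary_metrics(y_true, y_pred))
```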

Similar Items