Threatening URDU Language Detection from Tweets Using Machine Learning

Technology’s expansion has contributed to the rise in popularity of social media platforms. Twitter is one of the leading social media platforms that people use to share their opinions. Such opinions, sometimes, may contain threatening text, deliberately or non-deliberately, which can be disturbing...


Bibliographic Details
Main Authors: Aneela Mehmood, Muhammad Shoaib Farooq, Ansar Naseem, Furqan Rustam, Mónica Gracia Villar, Carmen Lili Rodríguez, Imran Ashraf
Format: Article
Language: English
Published: MDPI AG 2022-10-01
Series: Applied Sciences
Subjects: threatening language detection; Urdu text classification; machine learning; stacking
Online Access: https://www.mdpi.com/2076-3417/12/20/10342
_version_ 1797475620563714048
author Aneela Mehmood
Muhammad Shoaib Farooq
Ansar Naseem
Furqan Rustam
Mónica Gracia Villar
Carmen Lili Rodríguez
Imran Ashraf
author_sort Aneela Mehmood
collection DOAJ
description Technology’s expansion has contributed to the rise in popularity of social media platforms. Twitter is one of the leading social media platforms that people use to share their opinions. Such opinions, sometimes, may contain threatening text, deliberately or non-deliberately, which can be disturbing for other users. Consequently, the detection of threatening content on social media is an important task. Contrary to high-resource languages like English, Dutch, and others that have several such approaches, the low-resource Urdu language does not have such a luxury. Therefore, this study presents an intelligent threatening language detection approach for the Urdu language. A stacking model is proposed that uses an extra tree (ET) classifier and Bayes theorem-based Bernoulli Naive Bayes (BNB) as the base learners, while logistic regression (LR) is employed as the meta learner. A performance analysis is carried out by deploying a support vector classifier, ET, LR, BNB, fully connected network, convolutional neural network, long short-term memory, and gated recurrent unit. Experimental results indicate that the stacked model performs better than both the machine learning and the deep learning models. With 74.01% accuracy, 70.84% precision, 75.65% recall, and 73.99% F1 score, the model outperforms the existing benchmark study.
first_indexed 2024-03-09T20:47:44Z
format Article
id doaj.art-7ea93ad47a154bbb83d93a92500a2500
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T20:47:44Z
publishDate 2022-10-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-7ea93ad47a154bbb83d93a92500a2500
2023-11-23T22:43:06Z
eng
MDPI AG
Applied Sciences, ISSN 2076-3417, 2022-10-01, vol. 12, no. 20, art. 10342
doi: 10.3390/app122010342
Threatening URDU Language Detection from Tweets Using Machine Learning
Aneela Mehmood, Department of Computer Science, University of Management and Technology, Lahore 54000, Pakistan
Muhammad Shoaib Farooq, Department of Computer Science, University of Management and Technology, Lahore 54000, Pakistan
Ansar Naseem, Department of Computer Science, University of Management and Technology, Lahore 54000, Pakistan
Furqan Rustam, School of Computer Science, University College Dublin, D04 V1W8 Dublin, Ireland
Mónica Gracia Villar, Faculty of Social Science and Humanities, Universidad Europea del Atlántico, Isabel Torres 21, 39011 Santander, Spain
Carmen Lili Rodríguez, Faculty of Social Science and Humanities, Universidad Europea del Atlántico, Isabel Torres 21, 39011 Santander, Spain
Imran Ashraf, Department of Information and Communication Engineering, Yeungnam University, Gyeongsan 38541, Korea
https://www.mdpi.com/2076-3417/12/20/10342
threatening language detection; Urdu text classification; machine learning; stacking
title Threatening URDU Language Detection from Tweets Using Machine Learning
topic threatening language detection
Urdu text classification
machine learning
stacking
url https://www.mdpi.com/2076-3417/12/20/10342
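
The stacking arrangement named in the description (ET and BNB as base learners, LR as meta learner) could be sketched with scikit-learn roughly as follows. This is an illustrative sketch only, not the authors' implementation: the binary bag-of-words features, the hyperparameters, the helper name build_stacked_model, and the English placeholder texts are all assumptions, since the record does not state the features, settings, or data used in the article.

```python
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline


def build_stacked_model(random_state=0):
    """Hypothetical helper: ET + BNB base learners, LR meta learner."""
    base_learners = [
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=random_state)),
        ("bnb", BernoulliNB()),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=2,  # assumed; kept small only because the toy data set is tiny
    )
    # Assumption: binary bag-of-words features (token present / absent).
    return make_pipeline(CountVectorizer(binary=True), stack)


# Placeholder English texts standing in for labelled Urdu tweets
# (1 = threatening, 0 = non-threatening). Not data from the article.
texts = [
    "i will hurt you tomorrow",
    "you will regret this i will find you",
    "watch your back i will hurt you",
    "i will destroy you and your house",
    "the weather is lovely today",
    "we enjoyed the cricket match today",
    "lovely dinner with family today",
    "the new book is lovely",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = build_stacked_model()
model.fit(texts, labels)
predictions = model.predict(texts)
print(predictions)
```

Here the meta learner is trained on cross-validated class probabilities produced by the two base learners, which is the default behaviour of StackingClassifier; the article itself may combine the base outputs differently.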