A hybrid dependency-based approach for Urdu sentiment analysis

Abstract In the digital age, social media has emerged as a significant platform, generating a vast amount of raw data daily. This data reflects the opinions of individuals from diverse backgrounds, races, cultures, and age groups, spanning a wide range of topics. Businesses can leverage this data to...

Bibliographic Details
Main Authors: Urooba Sehar, Summrina Kanwal, Nasser I. Allheeib, Sultan Almari, Faiza Khan, Kia Dashtipur, Mandar Gogate, Osama A. Khashan
Format: Article
Language: English
Published: Nature Portfolio 2023-12-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-023-48817-8
author Urooba Sehar
Summrina Kanwal
Nasser I. Allheeib
Sultan Almari
Faiza Khan
Kia Dashtipur
Mandar Gogate
Osama A. Khashan
collection DOAJ
description Abstract: In the digital age, social media has emerged as a significant platform, generating a vast amount of raw data daily. This data reflects the opinions of individuals from diverse backgrounds, races, cultures, and age groups, spanning a wide range of topics. Businesses can leverage this data to extract valuable insights, improve their services, and effectively reach a broader audience based on users’ expressed opinions on social media platforms. To harness the potential of this extensive and unstructured data, a deep understanding of Natural Language Processing (NLP) is crucial. Existing approaches for sentiment analysis (SA) often rely on word co-occurrence frequencies, which prove inefficient in practical scenarios. Identifying this research gap, this paper presents a framework for concept-level sentiment analysis, aiming to improve its accuracy. A comprehensive Urdu language dataset was constructed by collecting data from YouTube, consisting of various talks and reviews on topics such as movies, politics, and commercial products. The dataset was further enriched by incorporating language rules and Deep Neural Networks (DNN) to optimize polarity detection. For sentiment analysis, the proposed framework employs predefined rules to trigger sentiment flow from words to concepts, leveraging the dependency relations among different words in a sentence based on Urdu language grammatical rules. In cases where predefined patterns are not triggered, the framework switches to its sub-symbolic counterpart, passing the data to the DNN for sentence classification. Experimental results demonstrate that the proposed framework surpasses state-of-the-art approaches, including LSTM, CNN, SVM, LR, and MLP, achieving an improvement of 6–7% on the Urdu dataset. In conclusion, this research paper introduces a novel framework for concept-level sentiment analysis of Urdu language data sourced from social media platforms. By combining language rules and DNN, the proposed framework demonstrates superior performance compared to existing methodologies, showcasing its effectiveness in accurately analyzing sentiment in Urdu text data.
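The rule-first, classifier-fallback control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the romanized lexicon, the negator list, and the function names `rule_polarity` and `classify` are hypothetical, and a simple "negator directly follows the sentiment word" check stands in for the paper's dependency-based rules. The `fallback` argument is a placeholder for any trained sentence classifier, such as a DNN.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy lexicon in romanized Urdu; not taken from the paper.
LEXICON = {"acha": 1, "behtareen": 1, "bura": -1, "bekaar": -1}
# Hypothetical negators; the toy rule assumes a negator directly follows its target.
NEGATORS = {"nahi", "na"}


def rule_polarity(tokens: List[str]) -> Optional[int]:
    """Return +1 or -1 if the toy rules fire decisively, otherwise None."""
    score = 0
    fired = False
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            polarity = LEXICON[tok]
            # Flip polarity when the next token is a negator, e.g. "acha nahi".
            if i + 1 < len(tokens) and tokens[i + 1] in NEGATORS:
                polarity = -polarity
            score += polarity
            fired = True
    if not fired or score == 0:
        return None  # no rule triggered, or the evidence cancels out
    return 1 if score > 0 else -1


def classify(sentence: str, fallback: Callable[[str], int]) -> Tuple[int, str]:
    """Try the symbolic rules first; hand the sentence to the fallback model otherwise."""
    polarity = rule_polarity(sentence.lower().split())
    if polarity is not None:
        return polarity, "rules"
    return fallback(sentence), "fallback"
```

Calling `classify("film acha nahi hai", model)` resolves through the rules, while a sentence with no lexicon hits is passed unchanged to `model`, mirroring the switch to the sub-symbolic path.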
first_indexed 2024-03-08T22:39:52Z
format Article
id doaj.art-e2322869c1d04a3ebc6043d0feb43894
institution Directory Open Access Journal
issn 2045-2322
language English
last_indexed 2024-03-08T22:39:52Z
publishDate 2023-12-01
publisher Nature Portfolio
record_format Article
series Scientific Reports
spelling doaj.art-e2322869c1d04a3ebc6043d0feb43894 2023-12-17T12:16:21Z eng Nature Portfolio Scientific Reports 2045-2322 2023-12-01 131116 10.1038/s41598-023-48817-8
A hybrid dependency-based approach for Urdu sentiment analysis
Urooba Sehar (Capital University of Science & Technology)
Summrina Kanwal (Division of Theoretical Computer Science, KTH Royal Institute of Technology Stockholm)
Nasser I. Allheeib (Department of Information Systems, College of Computer and Information Sciences, King Saud University)
Sultan Almari (Department of Computing and Informatics, Saudi Electronic University)
Faiza Khan (Riphah International University)
Kia Dashtipur (School of Computing, Edinburgh Napier University)
Mandar Gogate (School of Computing, Edinburgh Napier University)
Osama A. Khashan (Research and Innovation Centers, Rabdan Academy)
https://doi.org/10.1038/s41598-023-48817-8
title A hybrid dependency-based approach for Urdu sentiment analysis
url https://doi.org/10.1038/s41598-023-48817-8