Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms

People talk on social media because they find it a comfortable and easy way to express their feelings about a topic, post, or product on e-commerce websites. In Asia, most people use the Roman Urdu script to express their opinions about a topic. The sentiment analysis of the Roman Urdu (B...

Bibliographic Details
Main Authors:	Sameen Aziz, Saleem Ullah, Bushra Mughal, Faheem Mushtaq, Sabih Zahra
Format:	Article
Language:	English
Published:	The University of Lahore 2020-09-01
Series:	Pakistan Journal of Engineering & Technology
Subjects:	machine learning; tfidf; kaggle; svm; rf; logistic regression; naïve bayes; adaboost; ransac; hyper parameter
Online Access:	https://sites2.uol.edu.pk/journals/index.php/pakjet/article/view/537

_version_	1818662391327490048
author	Sameen Aziz, Saleem Ullah, Bushra Mughal, Faheem Mushtaq, Sabih Zahra
author_facet	Sameen Aziz, Saleem Ullah, Bushra Mughal, Faheem Mushtaq, Sabih Zahra
author_sort	Sameen Aziz
collection	DOAJ
description	People talk on social media because they find it a comfortable and easy way to express their feelings about a topic, post, or product on e-commerce websites. In Asia, most people use the Roman Urdu script to express their opinions about a topic. Sentiment analysis of the Roman Urdu language (Bilal et al. 2016) is a challenging task for researchers because of the lack of resources and the language's unstructured, non-standard syntax and script. We collected a manually annotated dataset of 21000 values from Kaggle and prepared the data for machine learning. We then applied different machine learning algorithms (SVM, Logistic Regression, Random Forest, Naïve Bayes, AdaBoost, KNN) (Bowers et al. 2018) with different parameters and kernels, using TF-IDF features (unigram, bigram, uni-bigram) (Pereira et al. 2018), to find the best-fitting algorithms. From the best of these we chose four algorithms and combined them for deployment on the dataset. After hyperparameter tuning, the best model was the Support Vector Machine with a linear kernel, which achieved 80% accuracy, an F1 score of 0.79, precision of 0.79, and recall of 0.78 using Grid Search CV (Ezpeleta et al. 2018) with 5-fold cross-validation. We then performed experiments on robust linear regression model estimation using RANSAC (random sample consensus) (Huang, Gao, and Zhou 2018; Chum and Matas 2008), which gave the best estimators with 82.19%.
first_indexed	2024-12-17T05:00:12Z
format	Article
id	doaj.art-cd47d4e1340d491ab1ea19172a4f5cbc
institution	Directory Open Access Journal
issn	2664-2042 2664-2050
language	English
last_indexed	2024-12-17T05:00:12Z
publishDate	2020-09-01
publisher	The University of Lahore
record_format	Article
series	Pakistan Journal of Engineering & Technology
spelling	doaj.art-cd47d4e1340d491ab1ea19172a4f5cbc; 2022-12-21T22:02:35Z; eng; The University of Lahore; Pakistan Journal of Engineering & Technology; 2664-2042; 2664-2050; 2020-09-01; 3; 2; 172; 177; Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms; Sameen Aziz (0), Saleem Ullah (1), Bushra Mughal (2), Faheem Mushtaq (3), Sabih Zahra (4); (0-2) Khwaja Fareed University of Engineering and Information Technology, Pakistan; (3-4) Khwaja Fareed University of Engineering and Information Technology, Rahim Yar Khan, Pakistan; https://sites2.uol.edu.pk/journals/index.php/pakjet/article/view/537; machine learning; tfidf; kaggle; svm; rf; logistic regression; naïve bayes; adaboost; ransac; hyper parameter
spellingShingle	Sameen Aziz Saleem Ullah Bushra Mughal Faheem Mushtaq Sabih Zahra Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms Pakistan Journal of Engineering & Technology machine learning tfidf kaggle svm rf logistic regression naïve bayes adaboost ransac hyper parameter
title	Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms
title_full	Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms
title_fullStr	Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms
title_full_unstemmed	Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms
title_short	Roman Urdu sentiment analysis using Machine Learning with best parameters and comparative study of Machine Learning algorithms
title_sort	roman urdu sentiment analysis using machine learning with best parameters and comparative study of machine learning algorithms
topic	machine learning; tfidf; kaggle; svm; rf; logistic regression; naïve bayes; adaboost; ransac; hyper parameter
url	https://sites2.uol.edu.pk/journals/index.php/pakjet/article/view/537
work_keys_str_mv	AT sameenaziz romanurdusentimentanalysisusingmachinelearningwithbestparametersandcomparativestudyofmachinelearningalgorithms AT saleemullah romanurdusentimentanalysisusingmachinelearningwithbestparametersandcomparativestudyofmachinelearningalgorithms AT bushramughal romanurdusentimentanalysisusingmachinelearningwithbestparametersandcomparativestudyofmachinelearningalgorithms AT faheemmushtaq romanurdusentimentanalysisusingmachinelearningwithbestparametersandcomparativestudyofmachinelearningalgorithms AT sabihzahra romanurdusentimentanalysisusingmachinelearningwithbestparametersandcomparativestudyofmachinelearningalgorithms
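
The abstract names three TF-IDF feature settings (unigram, bigram, uni-bigram) without giving the exact weighting used. The following is a minimal sketch of how the three settings differ, assuming the plain tf * log(N / df) form of TF-IDF and whitespace tokenization; neither is confirmed by the record. The function names and the two sample Roman Urdu reviews are hypothetical and are not taken from the authors' dataset.

```python
import math
from collections import Counter


def ngrams(tokens, n_min, n_max):
    """Return all word n-grams with n_min <= n <= n_max, joined by spaces."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(n_min, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(docs, n_min=1, n_max=1):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Assumption: lowercase + whitespace split stands in for real preprocessing.
    term_lists = [ngrams(doc.lower().split(), n_min, n_max) for doc in docs]

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))

    n_docs = len(docs)
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        vectors.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical sample reviews, invented for illustration only.
reviews = ["yeh phone bohat acha hai", "yeh phone bilkul bekar hai"]

unigram = tfidf(reviews, 1, 1)      # single words
bigram = tfidf(reviews, 2, 2)       # adjacent word pairs
uni_bigram = tfidf(reviews, 1, 2)   # both feature sets together
```

In this sketch, terms shared by every review (such as "yeh") receive a weight of zero, while terms that distinguish one review from another (such as "acha" or "bohat acha") receive positive weights. The resulting vectors would then be passed to a classifier; the classifier and its tuning are outside the scope of this sketch.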

Similar Items