Youtube spam detection framework using naïve bayes and logistic regression

YouTube has become a popular social media platform. Because of this popularity, spammers use YouTube comments to distribute spam. This is a concern because spam can lead to phishing attacks whose target can be any user who clicks a malicious link...

Bibliographic Details
Main Authors: Maulat Samsudin, Nur’Ain, Mohd Foozy, Cik Feresa, Alias, Nabilah, Shamala, Palaniappan, Othman, Nur Fadzilah, Wan Din, Wan Isni Sofiah
Format: Article
Language: English
Published: Universitas Ahmad Dahlan 2019
Subjects:
Online Access: http://eprints.uthm.edu.my/4433/1/AJ%202019%20%28272%29.pdf
author Maulat Samsudin, Nur’Ain
Mohd Foozy, Cik Feresa
Alias, Nabilah
Shamala, Palaniappan
Othman, Nur Fadzilah
Wan Din, Wan Isni Sofiah
collection UTHM
description YouTube has become a popular social media platform. Because of this popularity, spammers use YouTube comments to distribute spam. This is a concern because spam can lead to phishing attacks whose target can be any user who clicks a malicious link. Spam has its own features that can be analyzed and detected by classification. Hence, enhanced features are proposed to detect YouTube spam. To conduct the experiments, a YouTube spam detection framework was developed that consists of five (5) phases: data collection, pre-processing, feature selection and extraction, classification, and detection. This paper proposes the YouTube spam detection framework and examines and validates each of its phases using two data mining tools. The features are constructed from an analysis of data collected from the YouTube Spam dataset, classified with Naïve Bayes and Logistic Regression, and tested in two data mining tools, Weka and RapidMiner. In the analysis, thirteen (13) features tested on Weka and RapidMiner show high accuracy and are therefore used throughout the experiments in this research. The results of Naïve Bayes and Logistic Regression are slightly higher in Weka than in RapidMiner. In Weka, Naïve Bayes scores higher than Logistic Regression, at 87.21% and 85.29% respectively. In RapidMiner, the accuracy of the two classifiers differs only slightly, at 80.41% for Naïve Bayes and 80.88% for Logistic Regression. However, the precision of Naïve Bayes is higher than that of Logistic Regression.
first_indexed 2024-03-05T21:48:33Z
format Article
id uthm.eprints-4433
institution Universiti Tun Hussein Onn Malaysia
language English
last_indexed 2024-03-05T21:48:33Z
publishDate 2019
publisher Universitas Ahmad Dahlan
record_format dspace
spelling uthm.eprints-4433 2021-12-06T07:56:33Z http://eprints.uthm.edu.my/4433/
Universitas Ahmad Dahlan 2019 Article PeerReviewed text en http://eprints.uthm.edu.my/4433/1/AJ%202019%20%28272%29.pdf Maulat Samsudin, Nur’Ain and Mohd Foozy, Cik Feresa and Alias, Nabilah and Shamala, Palaniappan and Othman, Nur Fadzilah and Wan Din, Wan Isni Sofiah (2019) Youtube spam detection framework using naïve bayes and logistic regression. Indonesian Journal of Electrical Engineering and Computer Science, 14 (3). pp. 1508-1517. ISSN 2502-4752 https://dx.doi.org/10.11591/ijeecs.v14.i3.pp1508-1517
title Youtube spam detection framework using naïve bayes and logistic regression
topic TK5101-6720 Telecommunication. Including telegraphy, telephone, radio, radar, television
url http://eprints.uthm.edu.my/4433/1/AJ%202019%20%28272%29.pdf