Summary: | Phishing email classification depends on well-chosen features to achieve good accuracy. One reason for
the slow development of phishing email detection models is the complexity of feature selection. Feature selection
is essential for good classification results; commonly used features are drawn from the email header, the body,
and Uniform Resource Locators (URLs). Besides the body text, the URL is one of the leading indicators of a phishing
attack. The URL is usually placed in the body of a phishing email to attract the victim's attention; clicking it
redirects the victim to a fake website designed to collect personal information. Little is known about how URL
features affect phishing email classification results. Therefore, this work focuses on using URL features, with
machine learning approaches, to determine whether an email is phishing or legitimate. Two public datasets are used:
the Online Phishing Corpus and the Enron Corpus. The URL features are extracted using the Beautiful Soup library.
Two machine learning classifiers are evaluated: Support Vector Machine (SVM) and Artificial Neural Network (ANN).
Two experiments were conducted, differing in the features given to the classifiers: the first used raw email data
together with URL features, while the second used raw email data only. The first experiment achieved higher
accuracy with both SVM and ANN. Hence, this research shows that selecting URL features improves classification
performance.
|
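As a rough illustration of what URL feature extraction from an email body might involve, the sketch below collects the links in an HTML body and computes a few simple counts. The abstract does not list the exact URL features used, so the feature set here (number of links, distinct hosts, IP-address hosts, plain-HTTP links, and anchor text whose displayed domain differs from the real link host) is hypothetical. Python's standard-library `html.parser` stands in for Beautiful Soup so the example has no third-party dependency.

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect (href, anchor_text) pairs from <a> tags in an HTML email body."""

    def __init__(self):
        super().__init__()
        self.links = []      # collected (href, anchor text) pairs
        self._href = None    # href of the currently open <a> tag, if any
        self._text = []      # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def is_ip_host(host):
    """True if the host is a literal IPv4/IPv6 address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(html_body):
    """Compute a small, hypothetical set of URL features for one email body."""
    parser = LinkCollector()
    parser.feed(html_body)
    parser.close()
    # Keep only web links; mailto:, tel:, etc. are ignored.
    links = [(h, t) for h, t in parser.links if urlparse(h).scheme in ("http", "https")]
    hosts = [urlparse(h).hostname or "" for h, _ in links]

    mismatched = 0
    for (_, text), host in zip(links, hosts):
        # Anchor text that looks like a domain but differs from the real link host,
        # e.g. text "www.paypal.com" pointing at http://192.168.0.1/login.
        shown = urlparse(text if "://" in text else "http://" + text).hostname or ""
        if "." in shown and host and shown != host:
            mismatched += 1

    return {
        "num_urls": len(links),
        "num_distinct_hosts": len({h for h in hosts if h}),
        "has_ip_url": int(any(is_ip_host(h) for h in hosts)),
        "num_non_https": sum(1 for h, _ in links if urlparse(h).scheme == "http"),
        "num_text_href_mismatch": mismatched,
    }
```

In a setup like the one described, a feature dictionary of this kind could be turned into a numeric vector and concatenated with features from the raw email text before training the SVM or ANN; how the original work combines URL and raw-email features is not specified in the abstract.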