A conceptual model to identify illegal activities on the bitcoin system

Bibliographic Details
Main Authors: Al-Hashedi, Khaled Gubran; Magalingam, Pritheega; Maarop, Nurazean; Samy, Ganthan Narayana; Abdul Manaf, Azizah
Format: Conference or Workshop Item
Published: 2021
Description
Summary: Soon after its inception in 2009, Bitcoin was adopted as a tool by malicious attackers who exploit its pseudo-anonymity to commit frauds that are hard to trace. Recently, several Bitcoin users and institutions have confirmed that thousands of Bitcoins were lost because no fraud detection system was in place, causing significant damage to individuals and institutions and resulting in bankruptcy. The anonymous nature of the Bitcoin system makes it attractive for carrying out illegal activities, makes it difficult for law enforcement to detect suspicious behavior, and renders current fraud detection techniques impractical. Identifying illegal activities is therefore important for protecting the reputation of the Bitcoin system. In this paper, we propose a model to identify illegal transactions in the Bitcoin system. Firstly, we collect illegal addresses for data labeling from different sources, such as public online Bitcoin forums and datasets from previous papers, and then verify them against a raw Bitcoin dataset. Secondly, in addition to the most meaningful features from prior studies, we introduce new types of features by using a time-based approach that segments transactions into time slices over a period. Thirdly, we evaluate the proposed model on five popular supervised classifiers (KNN, SVM, RF, XGB, and NN). Finally, this paper considers the problem of class imbalance and obtains better results when using an adaptive oversampling technique (ADASYN). Results obtained from this study demonstrate that RF and XGB outperform KNN, SVM, and NN in terms of detection rate.
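The time-based feature step mentioned in the summary can be illustrated with a minimal sketch. The record does not state which features the authors compute or what slice width they use, so the function name `time_slice_features`, the input layout `(address, unix_timestamp, amount_btc)`, the one-hour slice width, and the four per-address statistics below are all assumptions chosen for illustration, not the paper's method.

```python
from collections import defaultdict
from statistics import mean


def time_slice_features(transactions, slice_seconds):
    """Aggregate per-address activity over fixed-width time slices.

    transactions: iterable of (address, unix_timestamp, amount_btc) tuples.
    This layout and the statistics below are hypothetical, for illustration.
    """
    # address -> slice index -> list of amounts seen in that slice
    per_address = defaultdict(lambda: defaultdict(list))
    for address, timestamp, amount in transactions:
        slice_index = timestamp // slice_seconds
        per_address[address][slice_index].append(amount)

    features = {}
    for address, slices in per_address.items():
        counts = [len(amounts) for amounts in slices.values()]
        totals = [sum(amounts) for amounts in slices.values()]
        features[address] = {
            "active_slices": len(slices),          # slices with any activity
            "max_tx_per_slice": max(counts),       # burstiness indicator
            "mean_tx_per_slice": mean(counts),
            "max_amount_per_slice": max(totals),   # largest volume in a slice
        }
    return features


# Toy data (made up): addr_a is active in two one-hour slices, addr_b in one.
transactions = [
    ("addr_a", 0, 1.0),
    ("addr_a", 10, 2.0),
    ("addr_a", 3700, 0.5),
    ("addr_b", 50, 4.0),
]
features = time_slice_features(transactions, slice_seconds=3600)
```

A table of such per-address rows, joined with the collected labels, is the kind of input a supervised classifier would consume. Oversampling of the minority class, such as ADASYN, would normally be applied to the training split only.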