Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering
With the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware’s phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a...
Main Authors: | , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: |
IEEE
2023-01-01
|
Series: | IEEE Access |
Subjects: | |
Online Access: | https://ieeexplore.ieee.org/document/10024280/ |
_version_ | 1797938812773466112 |
---|---|
author | Tianxiang He Chansu Han Ryoichi Isawa Takeshi Takahashi Shuji Kijima Jun'ichi Takeuchi |
author_facet | Tianxiang He Chansu Han Ryoichi Isawa Takeshi Takahashi Shuji Kijima Jun'ichi Takeuchi |
author_sort | Tianxiang He |
collection | DOAJ |
description | With the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware’s phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a large-scale malware set, conventional methods for constructing a phylogenetic tree is very time-consuming or even cannot be done in a realistic time. To solve this problem, we propose a high-speed, scalable phylogenetic tree construction algorithm with a clustering algorithm to cluster it. The proposed method involves the following steps: (1) Calculating the similarity of the specimen pairs using the normalized compression distance. (2) Creating a phylogenetic tree containing all specimens, instead of calculating the similarity of all pairs of a specimen, our algorithm only calculates a small part of the similarity matrix. (3) Dividing the phylogenetic tree into clusters by applying the minimum description length criterion. In addition, we propose a new online processing algorithm to add new malware specimens into the existing phylogenetic tree sequentially. Our goal is to reduce the computational cost of constructing the phylogenetic tree and improve the clustering accuracy of our previous research. We evaluated our method’s clustering accuracy and scalability with 65,494 IoT malware specimens. The results showed that our algorithm reduced the computation by 97.52% compared with the conventional method. Our clustering algorithm achieved accuracies of 95.5% and 99.3% for clustering family name and architecture name, respectively. |
first_indexed | 2024-04-10T19:06:43Z |
format | Article |
id | doaj.art-6d106bc6fbe343768337ee0142d27fb3 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-10T19:06:43Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-6d106bc6fbe343768337ee0142d27fb32023-01-31T00:01:09ZengIEEEIEEE Access2169-35362023-01-01118240825310.1109/ACCESS.2023.323871110024280Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware ClusteringTianxiang He0https://orcid.org/0000-0001-8767-817XChansu Han1https://orcid.org/0000-0002-1728-5300Ryoichi Isawa2Takeshi Takahashi3https://orcid.org/0000-0002-6477-7770Shuji Kijima4Jun'ichi Takeuchi5https://orcid.org/0000-0002-5819-3082Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, JapanNational Institute of Information and Communications Technology, Koganei, JapanNational Institute of Information and Communications Technology, Koganei, JapanNational Institute of Information and Communications Technology, Koganei, JapanGraduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, JapanGraduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, JapanWith the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware’s phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a large-scale malware set, conventional methods for constructing a phylogenetic tree is very time-consuming or even cannot be done in a realistic time. To solve this problem, we propose a high-speed, scalable phylogenetic tree construction algorithm with a clustering algorithm to cluster it. The proposed method involves the following steps: (1) Calculating the similarity of the specimen pairs using the normalized compression distance. (2) Creating a phylogenetic tree containing all specimens, instead of calculating the similarity of all pairs of a specimen, our algorithm only calculates a small part of the similarity matrix. (3) Dividing the phylogenetic tree into clusters by applying the minimum description length criterion. In addition, we propose a new online processing algorithm to add new malware specimens into the existing phylogenetic tree sequentially. Our goal is to reduce the computational cost of constructing the phylogenetic tree and improve the clustering accuracy of our previous research. We evaluated our method’s clustering accuracy and scalability with 65,494 IoT malware specimens. The results showed that our algorithm reduced the computation by 97.52% compared with the conventional method. Our clustering algorithm achieved accuracies of 95.5% and 99.3% for clustering family name and architecture name, respectively.https://ieeexplore.ieee.org/document/10024280/ClusteringIoT malwareMDL criterionphylogenetic tree |
spellingShingle | Tianxiang He Chansu Han Ryoichi Isawa Takeshi Takahashi Shuji Kijima Jun'ichi Takeuchi Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering IEEE Access Clustering IoT malware MDL criterion phylogenetic tree |
title | Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering |
title_full | Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering |
title_fullStr | Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering |
title_full_unstemmed | Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering |
title_short | Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering |
title_sort | scalable and fast algorithm for constructing phylogenetic trees with application to iot malware clustering |
topic | Clustering IoT malware MDL criterion phylogenetic tree |
url | https://ieeexplore.ieee.org/document/10024280/ |
work_keys_str_mv | AT tianxianghe scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering AT chansuhan scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering AT ryoichiisawa scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering AT takeshitakahashi scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering AT shujikijima scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering AT junichitakeuchi scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering |