Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering

With the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware’s phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a...

Full description

Bibliographic Details
Main Authors: Tianxiang He, Chansu Han, Ryoichi Isawa, Takeshi Takahashi, Shuji Kijima, Jun'ichi Takeuchi
Format: Article
Language:English
Published: IEEE 2023-01-01
Series:IEEE Access
Subjects:
Online Access:https://ieeexplore.ieee.org/document/10024280/
_version_ 1797938812773466112
author Tianxiang He
Chansu Han
Ryoichi Isawa
Takeshi Takahashi
Shuji Kijima
Jun'ichi Takeuchi
author_facet Tianxiang He
Chansu Han
Ryoichi Isawa
Takeshi Takahashi
Shuji Kijima
Jun'ichi Takeuchi
author_sort Tianxiang He
collection DOAJ
description With the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware’s phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a large-scale malware set, conventional methods for constructing a phylogenetic tree is very time-consuming or even cannot be done in a realistic time. To solve this problem, we propose a high-speed, scalable phylogenetic tree construction algorithm with a clustering algorithm to cluster it. The proposed method involves the following steps: (1) Calculating the similarity of the specimen pairs using the normalized compression distance. (2) Creating a phylogenetic tree containing all specimens, instead of calculating the similarity of all pairs of a specimen, our algorithm only calculates a small part of the similarity matrix. (3) Dividing the phylogenetic tree into clusters by applying the minimum description length criterion. In addition, we propose a new online processing algorithm to add new malware specimens into the existing phylogenetic tree sequentially. Our goal is to reduce the computational cost of constructing the phylogenetic tree and improve the clustering accuracy of our previous research. We evaluated our method’s clustering accuracy and scalability with 65,494 IoT malware specimens. The results showed that our algorithm reduced the computation by 97.52% compared with the conventional method. Our clustering algorithm achieved accuracies of 95.5% and 99.3% for clustering family name and architecture name, respectively.
first_indexed 2024-04-10T19:06:43Z
format Article
id doaj.art-6d106bc6fbe343768337ee0142d27fb3
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-04-10T19:06:43Z
publishDate 2023-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-6d106bc6fbe343768337ee0142d27fb32023-01-31T00:01:09ZengIEEEIEEE Access2169-35362023-01-01118240825310.1109/ACCESS.2023.323871110024280Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware ClusteringTianxiang He0https://orcid.org/0000-0001-8767-817XChansu Han1https://orcid.org/0000-0002-1728-5300Ryoichi Isawa2Takeshi Takahashi3https://orcid.org/0000-0002-6477-7770Shuji Kijima4Jun'ichi Takeuchi5https://orcid.org/0000-0002-5819-3082Graduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, JapanNational Institute of Information and Communications Technology, Koganei, JapanNational Institute of Information and Communications Technology, Koganei, JapanNational Institute of Information and Communications Technology, Koganei, JapanGraduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, JapanGraduate School and Faculty of Information Science and Electrical Engineering, Kyushu University, Fukuoka, JapanWith the development of IoT devices, there is a rapid increase in new types of IoT malware and variants, causing social problems. The malware’s phylogenetic tree has been used in many studies for malware clustering or better understanding of malware evolution. However, when dealing with a large-scale malware set, conventional methods for constructing a phylogenetic tree is very time-consuming or even cannot be done in a realistic time. To solve this problem, we propose a high-speed, scalable phylogenetic tree construction algorithm with a clustering algorithm to cluster it. The proposed method involves the following steps: (1) Calculating the similarity of the specimen pairs using the normalized compression distance. (2) Creating a phylogenetic tree containing all specimens, instead of calculating the similarity of all pairs of a specimen, our algorithm only calculates a small part of the similarity matrix. (3) Dividing the phylogenetic tree into clusters by applying the minimum description length criterion. In addition, we propose a new online processing algorithm to add new malware specimens into the existing phylogenetic tree sequentially. Our goal is to reduce the computational cost of constructing the phylogenetic tree and improve the clustering accuracy of our previous research. We evaluated our method’s clustering accuracy and scalability with 65,494 IoT malware specimens. The results showed that our algorithm reduced the computation by 97.52% compared with the conventional method. Our clustering algorithm achieved accuracies of 95.5% and 99.3% for clustering family name and architecture name, respectively.https://ieeexplore.ieee.org/document/10024280/ClusteringIoT malwareMDL criterionphylogenetic tree
spellingShingle Tianxiang He
Chansu Han
Ryoichi Isawa
Takeshi Takahashi
Shuji Kijima
Jun'ichi Takeuchi
Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering
IEEE Access
Clustering
IoT malware
MDL criterion
phylogenetic tree
title Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering
title_full Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering
title_fullStr Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering
title_full_unstemmed Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering
title_short Scalable and Fast Algorithm for Constructing Phylogenetic Trees With Application to IoT Malware Clustering
title_sort scalable and fast algorithm for constructing phylogenetic trees with application to iot malware clustering
topic Clustering
IoT malware
MDL criterion
phylogenetic tree
url https://ieeexplore.ieee.org/document/10024280/
work_keys_str_mv AT tianxianghe scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering
AT chansuhan scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering
AT ryoichiisawa scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering
AT takeshitakahashi scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering
AT shujikijima scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering
AT junichitakeuchi scalableandfastalgorithmforconstructingphylogenetictreeswithapplicationtoiotmalwareclustering