Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study
Recognition of malware is critical in cybersecurity as it allows for avoiding execution and the downloading of malware. One of the possible approaches is to analyze the executable’s Application Programming Interface (API) calls, which can be done using tools that work in sandboxes, such as Cuckoo or...
Main Authors: | Angelo Cannarile, Vincenzo Dentamaro, Stefano Galantucci, Andrea Iannacone, Donato Impedovo, Giuseppe Pirlo |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2022-02-01 |
Series: | Applied Sciences |
Subjects: | CAPEv2, cuckoo, machine learning, classification, malware prediction, API calls |
Online Access: | https://www.mdpi.com/2076-3417/12/3/1645 |
_version_ | 1797489026807103488 |
---|---|
author | Angelo Cannarile, Vincenzo Dentamaro, Stefano Galantucci, Andrea Iannacone, Donato Impedovo, Giuseppe Pirlo |
author_sort | Angelo Cannarile |
collection | DOAJ |
description | Recognition of malware is critical in cybersecurity as it allows for avoiding execution and the downloading of malware. One of the possible approaches is to analyze the executable’s Application Programming Interface (API) calls, which can be done using tools that work in sandboxes, such as Cuckoo or CAPEv2. This chain of calls can then be used to classify if the considered file is benign or malware. This work aims to compare six modern shallow learning and deep learning techniques based on tabular data, using two datasets of API calls containing malware and goodware, where the corresponding chain of API calls is expressed for each instance. The results show the quality of shallow learning approaches based on tree ensembles, such as CatBoost, both in terms of F1-macro score and Area Under the ROC curve (AUC ROC), and training time, making them optimal for making inferences on Edge AI solutions. The results are then analyzed with the explainable AI SHAP technique, identifying the API calls that most influence the process, i.e., those that are particularly afferent to malware and goodware. |
first_indexed | 2024-03-10T00:10:41Z |
format | Article |
id | doaj.art-f69d325106cd4069b09481172aa11463 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-10T00:10:41Z |
publishDate | 2022-02-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-f69d325106cd4069b09481172aa11463; 2023-11-23T16:01:04Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-02-01; 12; 3; 1645; 10.3390/app12031645; Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study; Angelo Cannarile (BVTech SpA, 20123 Milano, Italy); Vincenzo Dentamaro (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy); Stefano Galantucci (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy); Andrea Iannacone (BVTech SpA, 20123 Milano, Italy); Donato Impedovo (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy); Giuseppe Pirlo (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy); https://www.mdpi.com/2076-3417/12/3/1645; CAPEv2; cuckoo; machine learning; classification; malware prediction; API calls |
title | Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study |
topic | CAPEv2, cuckoo, machine learning, classification, malware prediction, API calls |
url | https://www.mdpi.com/2076-3417/12/3/1645 |