Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study

Malware recognition is critical in cybersecurity because it prevents malware from being downloaded and executed. One possible approach is to analyze the executable’s Application Programming Interface (API) calls, which can be captured with sandbox-based tools such as Cuckoo or...


Bibliographic Details
Main Authors: Angelo Cannarile, Vincenzo Dentamaro, Stefano Galantucci, Andrea Iannacone, Donato Impedovo, Giuseppe Pirlo
Format: Article
Language: English
Published: MDPI AG 2022-02-01
Series: Applied Sciences
Subjects: CAPEv2, cuckoo, machine learning, classification, malware prediction, API calls
Online Access: https://www.mdpi.com/2076-3417/12/3/1645
author Angelo Cannarile
Vincenzo Dentamaro
Stefano Galantucci
Andrea Iannacone
Donato Impedovo
Giuseppe Pirlo
collection DOAJ
description Malware recognition is critical in cybersecurity because it prevents malware from being downloaded and executed. One possible approach is to analyze the executable’s Application Programming Interface (API) calls, which can be captured with sandbox-based tools such as Cuckoo or CAPEv2. The resulting chain of calls can then be used to classify the file as benign or malicious. This work compares six modern shallow learning and deep learning techniques for tabular data, using two datasets of API calls from malware and goodware in which each instance is represented by its chain of API calls. The results show the strength of shallow learning approaches based on tree ensembles, such as CatBoost, in terms of F1-macro score, Area Under the ROC Curve (AUC ROC), and training time, which makes them well suited for inference in Edge AI solutions. The results are then analyzed with the explainable-AI technique SHAP to identify the API calls that most influence the classification, i.e., those most strongly associated with malware and with goodware.
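As a rough illustration of two ideas named in the description, the sketch below turns chains of API calls into tabular rows and computes an F1-macro score. It is not the authors' pipeline: the count-based encoding, the helper names `calls_to_row` and `f1_macro`, and the sample chains are all assumptions made for this example, and the record does not say how the paper encodes its features.

```python
from collections import Counter


def calls_to_row(chain, vocabulary):
    """Count how often each API call in `vocabulary` occurs in one chain.

    Assumption: a simple bag-of-calls encoding, chosen only for illustration.
    """
    counts = Counter(chain)
    return [counts[name] for name in vocabulary]


def f1_macro(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical chains of API calls, one list per analyzed executable.
chains = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["VirtualAlloc", "WriteProcessMemory", "CreateRemoteThread"],
]
vocabulary = sorted({name for chain in chains for name in chain})
table = [calls_to_row(chain, vocabulary) for chain in chains]

# Hypothetical labels (1 = malware, 0 = goodware) and predictions.
score = f1_macro([1, 1, 0, 0], [1, 0, 0, 0])
```

A table built this way could be passed to any tabular classifier. F1-macro weights both classes equally, which matters when malware and goodware are imbalanced.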
first_indexed 2024-03-10T00:10:41Z
format Article
id doaj.art-f69d325106cd4069b09481172aa11463
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-10T00:10:41Z
publishDate 2022-02-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-f69d325106cd4069b09481172aa11463 2023-11-23T16:01:04Z eng
MDPI AG, Applied Sciences, ISSN 2076-3417, 2022-02-01, vol. 12, no. 3, art. 1645, doi:10.3390/app12031645
Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study
Angelo Cannarile (BVTech SpA, 20123 Milano, Italy)
Vincenzo Dentamaro (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy)
Stefano Galantucci (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy)
Andrea Iannacone (BVTech SpA, 20123 Milano, Italy)
Donato Impedovo (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy)
Giuseppe Pirlo (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy)
title Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study
topic CAPEv2
cuckoo
machine learning
classification
malware prediction
API calls
url https://www.mdpi.com/2076-3417/12/3/1645