Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study

Recognition of malware is critical in cybersecurity as it allows for avoiding execution and the downloading of malware. One of the possible approaches is to analyze the executable’s Application Programming Interface (API) calls, which can be done using tools that work in sandboxes, such as Cuckoo or...

Bibliographic Details
Main Authors:	Angelo Cannarile, Vincenzo Dentamaro, Stefano Galantucci, Andrea Iannacone, Donato Impedovo, Giuseppe Pirlo
Format:	Article
Language:	English
Published:	MDPI AG 2022-02-01
Series:	Applied Sciences
Subjects:	CAPEv2; cuckoo; machine learning; classification; malware prediction; API calls
Online Access:	https://www.mdpi.com/2076-3417/12/3/1645

collection	DOAJ
description	Recognition of malware is critical in cybersecurity as it allows for avoiding execution and the downloading of malware. One of the possible approaches is to analyze the executable’s Application Programming Interface (API) calls, which can be done using tools that work in sandboxes, such as Cuckoo or CAPEv2. This chain of calls can then be used to classify if the considered file is benign or malware. This work aims to compare six modern shallow learning and deep learning techniques based on tabular data, using two datasets of API calls containing malware and goodware, where the corresponding chain of API calls is expressed for each instance. The results show the quality of shallow learning approaches based on tree ensembles, such as CatBoost, both in terms of F1-macro score and Area Under the ROC curve (AUC ROC), and training time, making them optimal for making inferences on Edge AI solutions. The results are then analyzed with the explainable AI SHAP technique, identifying the API calls that most influence the process, i.e., those that are particularly afferent to malware and goodware.
first_indexed	2024-03-10T00:10:41Z
format	Article
id	doaj.art-f69d325106cd4069b09481172aa11463
institution	Directory Open Access Journal
issn	2076-3417
language	English
last_indexed	2024-03-10T00:10:41Z
publishDate	2022-02-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-f69d325106cd4069b09481172aa11463; 2023-11-23T16:01:04Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-02-01; 12; 3; 1645; 10.3390/app12031645; Angelo Cannarile (BVTech SpA, 20123 Milano, Italy); Vincenzo Dentamaro (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy); Stefano Galantucci (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy); Andrea Iannacone (BVTech SpA, 20123 Milano, Italy); Donato Impedovo (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy); Giuseppe Pirlo (Department of Computer Science, University of Bari “Aldo Moro”, 70125 Bari, Italy)
title	Comparing Deep Learning and Shallow Learning Techniques for API Calls Malware Prediction: A Study
topic	CAPEv2; cuckoo; machine learning; classification; malware prediction; API calls
url	https://www.mdpi.com/2076-3417/12/3/1645

Similar Items