Detection of malware in downloaded files using various machine learning models

Malware has become an enormous risk in today’s world. There are different kinds of malware or malicious programs found on the internet. Research shows that malware has grown exponentially over the last decade, causing substantial financial losses to various organizations. Malware is a malicious prog...

Bibliographic Details
Main Authors:	Akshit Kamboj, Priyanshu Kumar, Amit Kumar Bairwa, Sandeep Joshi
Format:	Article
Language:	English
Published:	Elsevier 2023-03-01
Series:	Egyptian Informatics Journal
Subjects:	Cryptography, SHA 256, AES, LSB, Security
Online Access:	http://www.sciencedirect.com/science/article/pii/S111086652200072X

author	Akshit Kamboj, Priyanshu Kumar, Amit Kumar Bairwa, Sandeep Joshi
author_sort	Akshit Kamboj
collection	DOAJ
description	Malware has become an enormous risk in today’s world. There are different kinds of malware or malicious programs found on the internet. Research shows that malware has grown exponentially over the last decade, causing substantial financial losses to various organizations. Malware is a malicious program or software that proves exceedingly harmful to the user’s computer. The user’s system can be affected in several ways. The proposed solution uses various machine learning techniques to detect whether a file downloaded from the internet contains malware. This research aims to use different machine learning algorithms to differentiate between malicious and benign files. The main idea is to study features of the downloaded file such as the MD5 hash, the size of the Optional Header, and the Load Configuration Size. Based on the analysis of these features, files are classified as malicious or non-malicious. The models are trained on these features, which enables them to learn how to classify files. After training, the models are compared with one another on various criteria, using the validation and test datasets. Finally, the model with the best accuracy is selected. This process helps identify the types of malware that can have a detrimental impact on the user’s system once it is infected. The approach can detect malware such as Adware, Trojan, Backdoors, Unknown, Multidrop, Rbot, Spam, and Ransomware. After training and testing various machine learning models, the Random Forest Classifier was found to be the most accurate. Its accuracy went as high as 99.99% on the test dataset. This was closely followed by the XGBoost model with an accuracy of 99.68%. The results of five different models have been compared with those obtained in previous research. These include the Decision Tree Classifier (99.57% accuracy), Random Forest Classifier (99.99% accuracy), Gradient Boosting Model (99.09% accuracy), XGBoost Model (99.68% accuracy), and AdaBoost Model (98.87% accuracy). Four of these five models were found to have accuracies greater than those obtained in previous research.
first_indexed	2024-04-10T09:25:40Z
format	Article
id	doaj.art-978c31ba7b994e788c088c012b73c03f
institution	Directory Open Access Journal
issn	1110-8665
language	English
last_indexed	2024-04-10T09:25:40Z
publishDate	2023-03-01
publisher	Elsevier
record_format	Article
series	Egyptian Informatics Journal
spelling	doaj.art-978c31ba7b994e788c088c012b73c03f; 2023-02-20T04:08:53Z; eng; Elsevier; Egyptian Informatics Journal; 1110-8665; 2023-03-01; vol. 24, no. 1, pp. 81-94; Detection of malware in downloaded files using various machine learning models; Akshit Kamboj, Priyanshu Kumar, Amit Kumar Bairwa (corresponding author), Sandeep Joshi; Manipal University Jaipur, Rajasthan, India; http://www.sciencedirect.com/science/article/pii/S111086652200072X; Cryptography, SHA 256, AES, LSB, Security
title	Detection of malware in downloaded files using various machine learning models
topic	Cryptography, SHA 256, AES, LSB, Security
url	http://www.sciencedirect.com/science/article/pii/S111086652200072X
work_keys_str_mv	AT akshitkamboj detectionofmalwareindownloadedfilesusingvariousmachinelearningmodels AT priyanshukumar detectionofmalwareindownloadedfilesusingvariousmachinelearningmodels AT amitkumarbairwa detectionofmalwareindownloadedfilesusingvariousmachinelearningmodels AT sandeepjoshi detectionofmalwareindownloadedfilesusingvariousmachinelearningmodels
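
The description above outlines a procedure: train several classifiers on file features, compare them on validation and test datasets, and keep the most accurate one. A minimal Python sketch of that selection loop follows. It is not the authors' code. The two classifiers are deliberately simple stand-ins for the models named in the record, the feature values are synthetic, and the rule that generates the labels (malicious files have a Load Configuration Size of 0) is an assumption made only so the example runs. Any object exposing fit and predict, such as scikit-learn's RandomForestClassifier, could be passed in instead.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def split(rows, labels, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle once, then cut into (train, validation, test) pairs of (X, y)."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_frac)
    n_val = int(len(idx) * val_frac)
    parts = (idx[n_test + n_val:], idx[n_test:n_test + n_val], idx[:n_test])
    return tuple(([rows[i] for i in p], [labels[i] for i in p]) for p in parts)


class MajorityClass:
    """Stand-in baseline: always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class ThresholdStump:
    """Stand-in model: one feature, one threshold, chosen on training data."""

    def fit(self, X, y):
        best = (-1.0, 0, 0.0, 1)  # (accuracy, feature, threshold, label if above)
        for f in range(len(X[0])):
            for t in sorted({row[f] for row in X}):
                for above in (0, 1):
                    pred = [above if row[f] > t else 1 - above for row in X]
                    acc = accuracy(y, pred)
                    if acc > best[0]:
                        best = (acc, f, t, above)
        _, self.feature_, self.threshold_, self.above_ = best
        return self

    def predict(self, X):
        return [self.above_ if row[self.feature_] > self.threshold_
                else 1 - self.above_ for row in X]


def select_best(models, train, val):
    """Fit every model on train, score on validation, return the best name."""
    scores = {}
    for name, model in models.items():
        model.fit(*train)
        scores[name] = accuracy(val[1], model.predict(val[0]))
    return max(scores, key=scores.get), scores


# Synthetic data (assumption): columns are [optional header size, load config size].
rng = random.Random(42)
rows, labels = [], []
for _ in range(200):
    malicious = rng.random() < 0.5
    rows.append([rng.choice([224, 240]), 0 if malicious else rng.choice([64, 72, 112])])
    labels.append(int(malicious))

train, val, test = split(rows, labels)
models = {"majority": MajorityClass(), "stump": ThresholdStump()}
best_name, scores = select_best(models, train, val)

# The test split is touched only once, to report the selected model.
test_acc = accuracy(test[1], models[best_name].predict(test[0]))
print(best_name, scores, test_acc)
```

Selecting on the validation split and reporting on the test split keeps the final accuracy figure independent of the choice of model, which is the role the description assigns to the two datasets.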

Similar Items