A Universal Malicious Documents Static Detection Framework Based on Feature Generalization


Bibliographic Details
Main Authors: Xiaofeng Lu, Fei Wang, Cheng Jiang, Pietro Lio
Format: Article
Language: English
Published: MDPI AG, 2021-12-01
Series: Applied Sciences
Online Access:https://www.mdpi.com/2076-3417/11/24/12134
ISSN: 2076-3417
Volume/Issue: Vol. 11, No. 24, Article 12134
DOI: 10.3390/app112412134
Keywords: malicious document detection; static detection; feature generalization; machine learning
Affiliations: Xiaofeng Lu, Fei Wang, and Cheng Jiang, School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing 100876, China; Pietro Lio, Computer Laboratory, University of Cambridge, Cambridge CB3 0FD, UK
Collection: DOAJ (Directory of Open Access Journals)

Description

In this study, Portable Document Format (PDF), Word, Excel, Rich Text Format (RTF), and image documents are taken as research objects in order to develop a fast, static method for detecting malicious documents. Features of malicious PDF and Word documents are abstracted and extended so that they can also be used to detect other document types. A universal static detection framework for malicious documents based on feature generalization is then proposed. The generalized features include specification check errors, structure paths, code keywords, and the number of objects. The proposed method is evaluated on two datasets and compared with Kaspersky, NOD32, and McAfee antivirus software. The experimental results show that the proposed method performs well in terms of detection accuracy, runtime, and scalability: the average F1-score across all document types is 0.99, and the average detection time per document is 0.5926 s, which is on the same level as the antivirus software compared.