Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences

Bibliographic Details
Main Authors:	Jiwon Hong, Dongho Jeong, Sang-Wook Kim
Format:	Article
Language:	English
Published:	MDPI AG 2022-04-01
Series:	Applied Sciences
Subjects:	malware; malicious document; classification; text analysis
Online Access:	https://www.mdpi.com/2076-3417/12/8/4088

Description
Summary:	Cyberattacks are frequently carried out using malicious documents. A malicious document is an electronic document that contains malicious code along with some human-readable plain-text data. In this paper, we propose a novel framework that takes advantage of such plain-text data to determine whether a given document is malicious. We extract plain-text features from a corpus of electronic documents and use them to train a classification model for detecting malicious documents. Our extensive experiments with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework achieves high prediction accuracy in detecting malicious documents.
ISSN:	2076-3417
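
The pipeline outlined in the summary (vectorize the plain text of each document, then train a classifier on the result) can be illustrated with a minimal sketch. The summary does not name the three vectorization strategies or the three classification methods, so this example assumes a bag-of-words representation and a multinomial naive Bayes classifier as stand-ins. It also assumes the plain text has already been extracted from each document. The class name, the function names, and the toy training strings are all hypothetical and are not taken from the paper or its data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words counts (illustrative stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, texts, labels):
        # Count documents per class and word occurrences per class.
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        # Ignore tokens never seen in training.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            total = sum(self.word_counts[c].values())
            denom = total + self.alpha * len(self.vocab)
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_c / n_docs)
            for t in tokens:
                score += math.log((self.word_counts[c][t] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented toy data: plain text assumed to be already extracted from documents.
train_texts = [
    "please enable macros to view the invoice content",
    "enable editing and enable content to see the protected document",
    "minutes of the quarterly planning meeting",
    "draft agenda for the team meeting next week",
]
train_labels = ["malicious", "malicious", "benign", "benign"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("enable macros to view this document"))
print(model.predict("agenda for the planning meeting"))
```

With only four invented training strings, the output says nothing about real-world accuracy; the sketch only shows how plain-text features can feed a classifier.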
