Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences

Cyberattacks are widely carried out using malicious documents. A malicious document is an electronic document containing malicious code along with some human-readable plain-text data. In this paper, we propose a novel framework that takes advantage of such plain-text data to determine whether a giv...

Bibliographic Details
Main Authors:	Jiwon Hong, Dongho Jeong, Sang-Wook Kim
Format:	Article
Language:	English
Published:	MDPI AG 2022-04-01
Series:	Applied Sciences
Subjects:	malware; malicious document; classification; text analysis
Online Access:	https://www.mdpi.com/2076-3417/12/8/4088

_version_	1797436975745073152
author	Jiwon Hong Dongho Jeong Sang-Wook Kim
collection	DOAJ
description	Cyberattacks are widely carried out using malicious documents. A malicious document is an electronic document containing malicious code along with some human-readable plain-text data. In this paper, we propose a novel framework that takes advantage of such plain-text data to determine whether a given document is malicious. We extracted plain-text features from a corpus of electronic documents and used them to train a classification model for detecting malicious documents. Our extensive experiments with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework provides high prediction accuracy in detecting malicious documents.
first_indexed	2024-03-09T11:11:22Z
format	Article
id	doaj.art-b8e8a41d1d2240b79b88a299f455defb
institution	Directory of Open Access Journals
issn	2076-3417
language	English
last_indexed	2024-03-09T11:11:22Z
publishDate	2022-04-01
publisher	MDPI AG
record_format	Article
series	Applied Sciences
spelling	doaj.art-b8e8a41d1d2240b79b88a299f455defb; 2023-12-01T00:45:06Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-04-01; 12; 8; 4088; 10.3390/app12084088; Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences; Jiwon Hong (Department of Computer Science, Hanyang University, Seoul 04763, Korea); Dongho Jeong (Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea); Sang-Wook Kim (Department of Computer Science, Hanyang University, Seoul 04763, Korea); https://www.mdpi.com/2076-3417/12/8/4088; malware; malicious document; classification; text analysis
title	Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences
topic	malware; malicious document; classification; text analysis
url	https://www.mdpi.com/2076-3417/12/8/4088

Similar Items