Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences

Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a giv...


Bibliographic Details
Main Authors: Jiwon Hong, Dongho Jeong, Sang-Wook Kim
Format: Article
Language: English
Published: MDPI AG 2022-04-01
Series: Applied Sciences
Subjects: malware, malicious document, classification, text analysis
Online Access: https://www.mdpi.com/2076-3417/12/8/4088
author Jiwon Hong
Dongho Jeong
Sang-Wook Kim
collection DOAJ
description Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a given document is malicious. We extracted plaintext features from the corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework provides high prediction accuracy in detecting malicious documents.
first_indexed 2024-03-09T11:11:22Z
format Article
id doaj.art-b8e8a41d1d2240b79b88a299f455defb
institution Directory Open Access Journal
issn 2076-3417
language English
last_indexed 2024-03-09T11:11:22Z
publishDate 2022-04-01
publisher MDPI AG
record_format Article
series Applied Sciences
spelling doaj.art-b8e8a41d1d2240b79b88a299f455defb | 2023-12-01T00:45:06Z | eng | MDPI AG | Applied Sciences | 2076-3417 | 2022-04-01 | 12 | 8 | 4088 | 10.3390/app12084088 | Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences | Jiwon Hong (0) | Dongho Jeong (1) | Sang-Wook Kim (2) | Department of Computer Science, Hanyang University, Seoul 04763, Korea | Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea | Department of Computer Science, Hanyang University, Seoul 04763, Korea | Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a given document is malicious. We extracted plaintext features from the corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework provides high prediction accuracy in detecting malicious documents. | https://www.mdpi.com/2076-3417/12/8/4088 | malware | malicious document | classification | text analysis
title Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences
topic malware
malicious document
classification
text analysis