Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences
Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a giv...
Main Authors: | Jiwon Hong, Dongho Jeong, Sang-Wook Kim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2022-04-01 |
Series: | Applied Sciences |
Subjects: | malware, malicious document, classification, text analysis |
Online Access: | https://www.mdpi.com/2076-3417/12/8/4088 |
_version_ | 1797436975745073152 |
---|---|
author | Jiwon Hong Dongho Jeong Sang-Wook Kim |
author_facet | Jiwon Hong Dongho Jeong Sang-Wook Kim |
author_sort | Jiwon Hong |
collection | DOAJ |
description | Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a given document is malicious. We extracted plaintext features from the corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework provides high prediction accuracy in detecting malicious documents. |
first_indexed | 2024-03-09T11:11:22Z |
format | Article |
id | doaj.art-b8e8a41d1d2240b79b88a299f455defb |
institution | Directory of Open Access Journals |
issn | 2076-3417 |
language | English |
last_indexed | 2024-03-09T11:11:22Z |
publishDate | 2022-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | doaj.art-b8e8a41d1d2240b79b88a299f455defb; 2023-12-01T00:45:06Z; eng; MDPI AG; Applied Sciences; 2076-3417; 2022-04-01; 12; 8; 4088; 10.3390/app12084088; Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences; Jiwon Hong 0; Dongho Jeong 1; Sang-Wook Kim 2; Department of Computer Science, Hanyang University, Seoul 04763, Korea; Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea; Department of Computer Science, Hanyang University, Seoul 04763, Korea; Cyberattacks widely occur by using malicious documents. A malicious document is an electronic document containing malicious codes along with some plain-text data that is human-readable. In this paper, we propose a novel framework that takes advantage of such plaintext data to determine whether a given document is malicious. We extracted plaintext features from the corpus of electronic documents and utilized them to train a classification model for detecting malicious documents. Our extensive experimental results with different combinations of three well-known vectorization strategies and three popular classification methods on five types of electronic documents demonstrate that our framework provides high prediction accuracy in detecting malicious documents.; https://www.mdpi.com/2076-3417/12/8/4088; malware; malicious document; classification; text analysis |
spellingShingle | Jiwon Hong Dongho Jeong Sang-Wook Kim Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences Applied Sciences malware malicious document classification text analysis |
title | Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences |
title_full | Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences |
title_fullStr | Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences |
title_full_unstemmed | Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences |
title_short | Classifying Malicious Documents on the Basis of Plain-Text Features: Problem, Solution, and Experiences |
title_sort | classifying malicious documents on the basis of plain text features problem solution and experiences |
topic | malware malicious document classification text analysis |
url | https://www.mdpi.com/2076-3417/12/8/4088 |
work_keys_str_mv | AT jiwonhong classifyingmaliciousdocumentsonthebasisofplaintextfeaturesproblemsolutionandexperiences AT donghojeong classifyingmaliciousdocumentsonthebasisofplaintextfeaturesproblemsolutionandexperiences AT sangwookkim classifyingmaliciousdocumentsonthebasisofplaintextfeaturesproblemsolutionandexperiences |
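
The description above outlines a pipeline: extract plain text from documents, vectorize it, and train a classifier to flag malicious documents. This record does not name the three vectorization strategies or the three classification methods the authors evaluated, so the following is only an illustrative sketch under stated assumptions: it substitutes a bag-of-words count representation and a multinomial naive Bayes classifier with add-one smoothing, written with the Python standard library. The class name `NaiveBayesTextClassifier`, the helper `tokenize`, and the four training snippets are hypothetical and invented for illustration; they are not taken from the paper or its dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial naive Bayes over bag-of-words counts (assumed stand-in,
    not the classifier used in the paper)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        # Per-class word frequencies act as the bag-of-words vectors.
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_docs = len(labels)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(self.doc_counts[c] / self.total_docs)
            total = sum(self.word_counts[c].values())
            for t in tokens:
                score += math.log(
                    (self.word_counts[c][t] + 1) / (total + len(self.vocab))
                )
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical toy corpus: plain text as it might be extracted from documents.
train_texts = [
    "please enable macros to view the protected invoice",
    "enable content to decrypt this secure document",
    "quarterly meeting agenda and budget notes",
    "lecture notes on database indexing and query plans",
]
train_labels = ["malicious", "malicious", "benign", "benign"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("enable macros to view the invoice"))
print(model.predict("meeting notes on budget"))
```

A real evaluation would need a labeled corpus per document type and a held-out test split; four sentences only show how the pieces connect.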