Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents

This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjust...


Bibliographic Details
Main Author: Tiberiu-Marian Georgescu
Format: Article
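Language: English
Published: MDPI AG 2020-03-01
Series: Symmetry
Subjects: cybersecurity; machine learning; ontologies; named entity recognition; natural language processing; relation extraction
Online Access: https://www.mdpi.com/2073-8994/12/3/354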
_version_ 1798005486551826432
author Tiberiu-Marian Georgescu
collection DOAJ
description This paper describes the development and implementation of a natural language processing model based on machine learning which performs cognitive analysis for cybersecurity-related documents. A domain ontology was developed using a two-step approach: (1) the symmetry stage and (2) the machine adjustment. The first stage is based on the symmetry between the way humans represent a domain and the way machine learning solutions do. Therefore, the cybersecurity field was initially modeled based on the expertise of cybersecurity professionals. A dictionary of relevant entities was created; the entities were classified into 29 categories and later implemented as classes in a natural language processing model based on machine learning. After running successive performance tests, the ontology was remodeled from 29 to 18 classes. Using the ontology, a natural language processing model based on a supervised learning model was defined. We trained the model using sets of approximately 300,000 words. Remarkably, our model obtained an F1 score of 0.81 for named entity recognition and 0.58 for relation extraction, showing superior results compared to other similar models identified in the literature. Furthermore, in order to be easily used and tested, a web application that integrates our model as the core component was developed.
first_indexed 2024-04-11T12:41:09Z
format Article
id doaj.art-9cad5db014a2487f874550122067db4d
institution Directory Open Access Journal
issn 2073-8994
language English
last_indexed 2024-04-11T12:41:09Z
publishDate 2020-03-01
publisher MDPI AG
record_format Article
series Symmetry
spelling doaj.art-9cad5db014a2487f874550122067db4d; 2022-12-22T04:23:30Z; eng; MDPI AG; Symmetry; 2073-8994; 2020-03-01; vol. 12, no. 3, art. 354; 10.3390/sym12030354; Tiberiu-Marian Georgescu (Department of Economic Informatics and Cybernetics, The Bucharest University of Economic Studies, 6 Piata Romana, 010374 Bucharest, Romania); https://www.mdpi.com/2073-8994/12/3/354
title Natural Language Processing Model for Automatic Analysis of Cybersecurity-Related Documents
topic cybersecurity
machine learning
ontologies
named entity recognition
natural language processing
relation extraction
url https://www.mdpi.com/2073-8994/12/3/354