Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms

Bibliographic Details
Main Authors: Do Xuan Cho, Vu Ngoc Son, Duong Duc
Format: Article
Language: English
Published: ITB Journal Publisher 2022-05-01
Series: Journal of ICT Research and Applications
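Subjects: machine learning algorithms; natural language processing techniques; software security vulnerability detection; software vulnerabilities; source code features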
Online Access: https://journals.itb.ac.id/index.php/jictra/article/view/16905
collection DOAJ
description Software vulnerabilities pose a serious problem because attackers often compromise systems by exploiting them. Vulnerabilities can be detected using two main methods: i) signature-based detection, which compares software against a list of known security vulnerabilities; and ii) behavior analysis-based detection using classification algorithms, which analyzes the software code. To improve the accuracy of software security vulnerability detection, this study proposes a new approach that combines a technique for analyzing and standardizing software code with the random forest (RF) classification algorithm. The novelty and advantage of the proposed method is that, rather than trying to define the behavior of functions in order to identify abnormal behavior, it uses the Word2vec natural language processing model to normalize functions and extract their features. Finally, to detect security vulnerabilities in the functions, the study proposes to use a popular and effective supervised machine learning algorithm.
first_indexed 2024-12-12T05:14:55Z
id doaj.art-55047bec08db45cfbff1b160fe1efa1e
institution Directory of Open Access Journals
issn 2337-5787
2338-5499
last_indexed 2024-12-12T05:14:55Z
spelling doaj.art-55047bec08db45cfbff1b160fe1efa1e; 2022-12-22T00:36:48Z; eng; ITB Journal Publisher; Journal of ICT Research and Applications; ISSN 2337-5787, 2338-5499; 2022-05-01; vol. 16, no. 1; doi:10.5614/itbj.ict.res.appl.2022.16.1.5
Do Xuan Cho (Faculty of Information Assurance, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam)
Vu Ngoc Son (Information Assurance Department, FPT University, Hanoi, Vietnam)
Duong Duc (Information Assurance Department, FPT University, Hanoi, Vietnam)
topic machine learning algorithms
natural language processing techniques
software security vulnerability detection
software vulnerabilities
source code features