Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms
Nowadays, software vulnerabilities pose a serious problem, because cyber-attackers often find ways to attack a system by exploiting software vulnerabilities. Detecting software vulnerabilities can be done using two main methods: i) signature-based detection, i.e. methods based on a list of known se...
Main Authors: | Do Xuan Cho, Vu Ngoc Son, Duong Duc |
---|---|
Format: | Article |
Language: | English |
Published: | ITB Journal Publisher, 2022-05-01 |
Series: | Journal of ICT Research and Applications |
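Subjects: | machine learning algorithms; natural language processing techniques; software security vulnerability detection; software vulnerabilities; source code features |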
Online Access: | https://journals.itb.ac.id/index.php/jictra/article/view/16905 |
_version_ | 1828553230741667840 |
---|---|
author | Do Xuan Cho; Vu Ngoc Son; Duong Duc |
author_sort | Do Xuan Cho |
collection | DOAJ |
description | Software vulnerabilities pose a serious problem because cyber-attackers often attack systems by exploiting them. Vulnerabilities can be detected using two main methods: i) signature-based detection, i.e., methods that compare software against a list of known security vulnerabilities; ii) behavior analysis-based detection using classification algorithms, i.e., methods based on analyzing the software code. To improve detection accuracy, this study proposes a new approach based on a technique for analyzing and standardizing software code and the random forest (RF) classification algorithm. The novelty and advantage of the proposed method is that, instead of trying to define the behaviors of functions in order to identify abnormal ones, it uses the Word2vec natural language processing model to normalize functions and extract their features. Finally, to detect security vulnerabilities in the functions, the study uses a popular and effective supervised machine learning algorithm. |
first_indexed | 2024-12-12T05:14:55Z |
format | Article |
id | doaj.art-55047bec08db45cfbff1b160fe1efa1e |
institution | Directory Open Access Journal |
issn | 2337-5787 2338-5499 |
language | English |
last_indexed | 2024-12-12T05:14:55Z |
publishDate | 2022-05-01 |
publisher | ITB Journal Publisher |
record_format | Article |
series | Journal of ICT Research and Applications |
spelling | doaj.art-55047bec08db45cfbff1b160fe1efa1e; 2022-12-22T00:36:48Z; eng; ITB Journal Publisher; Journal of ICT Research and Applications; 2337-5787; 2338-5499; 2022-05-01; 16; 1; 10.5614/itbj.ict.res.appl.2022.16.1.5; Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms; Do Xuan Cho (Faculty of Information Assurance, Posts and Telecommunications Institute of Technology, Hanoi, Vietnam); Vu Ngoc Son (Information Assurance Department, FPT University, Hanoi, Vietnam); Duong Duc (Information Assurance Department, FPT University, Hanoi, Vietnam); https://journals.itb.ac.id/index.php/jictra/article/view/16905; machine learning algorithms; natural language processing techniques; software security vulnerability detection; software vulnerabilities; source code features |
title | Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms |
topic | machine learning algorithms; natural language processing techniques; software security vulnerability detection; software vulnerabilities; source code features |
url | https://journals.itb.ac.id/index.php/jictra/article/view/16905 |
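
The description above outlines a pipeline: standardize the code of each function, turn its tokens into vectors with Word2vec, and classify the result with random forest. The record gives no implementation details, so the following is only a minimal sketch of the first two steps under stated assumptions. The keyword set, the `FUN`/`VAR` placeholder scheme, the mean-pooling of token vectors, and the names `normalize` and `function_vector` are all hypothetical. The small hand-written embedding table stands in for a trained Word2vec model, and the classification step is omitted.

```python
import re

# Hypothetical keyword set for a C-like language (assumption, not from the paper).
KEYWORDS = {"int", "char", "void", "return", "if", "else", "for", "while", "sizeof"}
# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(code):
    """Tokenize code and replace user-defined names with stable placeholders."""
    tokens = TOKEN_RE.findall(code)
    names = {}  # original identifier -> placeholder
    out = []
    for i, tok in enumerate(tokens):
        if re.match(r"[A-Za-z_]", tok) and tok not in KEYWORDS:
            if tok not in names:
                # An identifier directly followed by "(" is treated as a function name.
                is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
                kind = "FUN" if is_call else "VAR"
                count = sum(1 for v in names.values() if v.startswith(kind)) + 1
                names[tok] = f"{kind}{count}"
            out.append(names[tok])
        else:
            out.append(tok)
    return out


def function_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; tokens without a vector are skipped."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy stand-in for a trained Word2vec lookup table (assumption).
TOY_EMBEDDINGS = {"FUN1": [1.0, 0.0], "VAR1": [0.0, 1.0]}

tokens = normalize("int copy(char *dst, char *src) { strcpy(dst, src); return 0; }")
features = function_vector(tokens, TOY_EMBEDDINGS, dim=2)
```

In a fuller version, the embedding table would come from a Word2vec model trained on the normalized token sequences, and the fixed-length vectors would be passed, together with vulnerable/non-vulnerable labels, to a random forest classifier.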