Performance Analysis of Feature Selection Methods in Software Defect Prediction: A Search Method Approach

Bibliographic Details
Main Authors:	Abdullateef Oluwagbemiga Balogun, Shuib Basri, Said Jadid Abdulkadir, Ahmad Sobri Hashim
Format:	Article
Language:	English
Published:	MDPI AG 2019-07-01
Series:	Applied Sciences
Subjects:	software defect prediction; feature selection; high dimensionality; search methods
Online Access:	https://www.mdpi.com/2076-3417/9/13/2764
DOI:	10.3390/app9132764
Citation:	Applied Sciences 2019, 9(13), 2764
ISSN:	2076-3417
Affiliation:	Department of Computer and Information Sciences, Universiti Teknologi PETRONAS, Perak 32610, Malaysia (all authors)
Collection:	DOAJ (Directory of Open Access Journals)

Description
Software Defect Prediction (SDP) models are built using software metrics derived from software systems, and their quality depends largely on the quality of the metrics (the dataset) used to build them. High dimensionality is one of the data quality problems that affect the performance of SDP models. Feature selection (FS) is a proven method for addressing the dimensionality problem. However, choosing an FS method for SDP remains a problem, as most empirical studies of FS methods for SDP report contradictory and inconsistent outcomes. FS methods behave differently because of their different underlying computational characteristics, and this may be due to the search methods they use, since the impact of FS depends on the choice of search method. It is therefore imperative to comparatively analyze the performance of FS methods based on different search methods in SDP. In this paper, four filter feature ranking (FFR) and fourteen filter feature subset selection (FSS) methods were evaluated using four different classifiers over five software defect datasets obtained from the National Aeronautics and Space Administration (NASA) repository. The experimental analysis showed that applying FS improves the predictive performance of classifiers, and that the performance of FS methods can vary across datasets and classifiers. Among the FFR methods, Information Gain produced the greatest improvement in the performance of the prediction models. Among the FSS methods, Consistency Feature Subset Selection based on Best First Search had the greatest positive influence on the prediction models. However, prediction models based on FFR methods proved to be more stable than those based on FSS methods. We therefore conclude that FS methods improve the performance of SDP models and that there is no single best FS method, as their performance varied with the dataset and the choice of prediction model. Nevertheless, we recommend FFR methods, because the prediction models based on them are more stable in terms of predictive performance.
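
The abstract contrasts filter feature ranking (FFR), where each metric is scored on its own and the top-ranked metrics are kept, with filter feature subset selection (FSS). To illustrate the FFR side, the following is a minimal, hypothetical sketch of ranking features by Information Gain, IG(Y; X) = H(Y) - H(Y | X). It is not the authors' implementation. It assumes the metric values have already been discretized into bins (NASA metrics are numeric, so a real pipeline would discretize them first), and the feature names and toy data are invented for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(Y), in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column X."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        # class labels of the instances that take this feature value
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional


def rank_features(rows, labels, names):
    """Score every feature independently and sort from most to least informative."""
    scores = [
        (name, information_gain([row[j] for row in rows], labels))
        for j, name in enumerate(names)
    ]
    # sorted() is stable even with reverse=True, so ties keep their input order
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-binned module metrics; 1 = defective, 0 = clean.
names = ["loc_bin", "complexity_bin", "comment_bin"]
rows = [
    ("high", "high", "low"),
    ("high", "low", "high"),
    ("low", "high", "low"),
    ("low", "low", "high"),
]
labels = [1, 1, 0, 0]

ranking = rank_features(rows, labels, names)
print(ranking)
```

On this toy data `loc_bin` separates the classes perfectly and ranks first with an IG of 1 bit, while the other two features score 0. A ranking method would then keep the top k features; how k is chosen is left open here.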

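On the FSS side, the best-performing configuration named in the abstract pairs a consistency-based subset evaluator with Best First Search. The sketch below is one plausible reading of that combination, not the authors' code. It uses the inconsistency-rate formulation (instances that agree on every selected feature but carry different labels count against the subset), runs a forward best-first search from the empty set, and stops after a fixed number of consecutive non-improving expansions. The threshold of 5, the tie-breaking toward smaller subsets, and the XOR-style toy data are all assumptions.

```python
import heapq
from collections import Counter, defaultdict


def consistency(rows, labels, subset):
    """1 - inconsistency rate of the data projected onto the feature indices in `subset`.

    Instances sharing the same values on `subset` form a group; every instance in a
    group that does not carry the group's majority class counts as inconsistent.
    """
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[j] for j in subset)][y] += 1
    inconsistent = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return 1.0 - inconsistent / len(labels)


def best_first_forward(rows, labels, n_features, max_stale=5):
    """Forward best-first search over feature subsets, scored by consistency().

    Stops after `max_stale` consecutive expansions that fail to beat the best score
    (an assumed threshold). Only strict improvements replace the best subset, so among
    equally consistent subsets the one found first is kept.
    """
    start = ()
    best_subset, best_score = start, consistency(rows, labels, start)
    # heapq is a min-heap: negate the score, then prefer smaller subsets on ties
    frontier = [(-best_score, 0, start)]
    visited = {start}
    stale = 0
    while frontier and stale < max_stale:
        _, _, subset = heapq.heappop(frontier)
        improved = False
        for j in range(n_features):
            if j in subset:
                continue
            child = tuple(sorted(subset + (j,)))
            if child in visited:
                continue
            visited.add(child)
            score = consistency(rows, labels, child)
            heapq.heappush(frontier, (-score, len(child), child))
            if score > best_score:
                best_subset, best_score, improved = child, score, True
        stale = 0 if improved else stale + 1
    return list(best_subset), best_score


# Toy data (hypothetical): the label is the XOR of features 0 and 1; feature 2 is noise.
rows = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 0),
    (1, 1, 1),
    (0, 0, 1),
    (1, 1, 0),
]
labels = [a ^ b for a, b, _ in rows]

print(best_first_forward(rows, labels, n_features=3))
```

Here no single feature raises consistency above the empty set, but the pair {0, 1} is fully consistent, which is the kind of feature interaction that subset evaluation can capture and per-feature ranking can miss. Because consistency never decreases as features are added, the full feature set is always maximally consistent; the strict-improvement rule is what keeps the search from drifting to it.
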
Similar Items