Detection of code smells using machine learning techniques combined with data-balancing methods


Bibliographic Details
Main Authors: Nasraldeen Alnor Adam Khleel, Károly Nehéz
Format: Article
Language: English
Published: Universitas Ahmad Dahlan 2023-11-01
Series: IJAIN (International Journal of Advances in Intelligent Informatics)
Subjects: code smells; software metrics; machine learning techniques; class imbalance; data balancing methods
Online Access: http://ijain.org/index.php/IJAIN/article/view/981
author Nasraldeen Alnor Adam Khleel
Károly Nehéz
collection DOAJ
description Code smells are prevalent issues in software design that arise when implementation or design principles are violated. They manifest as symptoms or anomalies in the source code. Timely identification of code smells plays a crucial role in enhancing software quality and facilitating software maintenance. Previous studies have shown that code smells can be detected with machine learning (ML) methods. However, despite their increasing popularity, research suggests that these methods are not always suitable because of imbalanced data, which can reduce the effectiveness of ML models. This study proposes a novel method for detecting code smells that employs five ML algorithms: decision tree (DT), k-nearest neighbors (K-NN), support vector machine (SVM), XGBoost (XGB), and multi-layer perceptron (MLP). To address the imbalanced data, the proposed method incorporates the random oversampling technique. Experiments were conducted on four code smell datasets: god-class, data-class, long-method, and feature-envy. The experimental outcomes were evaluated and compared using various performance metrics. On the original datasets, the XGB model achieved the highest accuracy, 100%, for detecting the data-class and long-method smells. On the balanced datasets, the highest accuracy of 100% for the same two smells was obtained with the DT, SVM, and XGB models. According to the empirical findings, there is significant promise in using ML techniques for the accurate prediction of code smells.
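The random oversampling step named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_oversample`, the toy metric values, and the labels are all hypothetical, and only the Python standard library is used. The idea shown is the general technique: rows of each minority class are duplicated at random (with replacement) until every class is as large as the majority class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement; the majority class gets zero extras.
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


# Hypothetical toy data: two software metrics per class; label 1 = smelly.
rows = [(120, 4), (90, 3), (300, 25), (80, 2), (110, 5), (950, 60)]
labels = [0, 0, 1, 0, 0, 1]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In a typical workflow the oversampling would be applied only to the training split, so that duplicated rows do not leak into the data used for evaluation; whether the study did this is not stated in the record.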
first_indexed 2024-03-11T17:55:59Z
format Article
id doaj.art-4a97a22034564c07b7a9a25cad163faa
institution Directory Open Access Journal
issn 2442-6571
2548-3161
language English
last_indexed 2024-03-11T17:55:59Z
publishDate 2023-11-01
publisher Universitas Ahmad Dahlan
record_format Article
series IJAIN (International Journal of Advances in Intelligent Informatics)
spelling doaj.art-4a97a22034564c07b7a9a25cad163faa; 2023-10-17T12:12:18Z; eng; Universitas Ahmad Dahlan; IJAIN (International Journal of Advances in Intelligent Informatics); 2442-6571; 2548-3161; 2023-11-01; volume 9, issue 3, pages 402-417; 10.26555/ijain.v9i3.981; 249; Detection of code smells using machine learning techniques combined with data-balancing methods; Nasraldeen Alnor Adam Khleel (University of Miskolc); Károly Nehéz (University of Miskolc); http://ijain.org/index.php/IJAIN/article/view/981; code smells; software metrics; machine learning techniques; class imbalance; data balancing methods
title Detection of code smells using machine learning techniques combined with data-balancing methods
topic code smells
software metrics
machine learning techniques
class imbalance
data balancing methods
url http://ijain.org/index.php/IJAIN/article/view/981