Comparing Commit Messages and Source Code Metrics for the Prediction Refactoring Activities

Understanding how developers refactor their code is critical to support the design improvement process of software. This paper investigates to what extent code metrics are good indicators for predicting refactoring activity in the source code. In order to perform this, we formulated the prediction o...

Bibliographic Details
Main Authors: Priyadarshni Suresh Sagar, Eman Abdulah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni, Christian D. Newman
Format: Article
Language: English
Published: MDPI AG 2021-09-01
Series: Algorithms
Subjects: refactoring; software quality; commits; software metrics; software engineering
Online Access: https://www.mdpi.com/1999-4893/14/10/289
author Priyadarshni Suresh Sagar
Eman Abdulah AlOmar
Mohamed Wiem Mkaouer
Ali Ouni
Christian D. Newman
collection DOAJ
description Understanding how developers refactor their code is critical to supporting the design improvement process of software. This paper investigates to what extent code metrics are good indicators for predicting refactoring activity in the source code. To this end, we formulated the prediction of refactoring operation types as a multi-class classification problem. Our solution measures metrics extracted from committed code changes to derive the features (i.e., metric variations) that best represent each class (i.e., refactoring type), so that it can automatically predict, for a given commit, the method-level type of refactoring being applied, namely Move Method, Rename Method, Extract Method, Inline Method, Pull-up Method, and Push-down Method. We compared the prediction performance of various classifiers using a dataset of 5004 commits extracted from 800 Java projects. Our main findings show that the random forest model trained with code metrics achieved the best average accuracy, 75%. However, the results vary per class, which means that some refactoring types are harder to detect than others.
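The two ideas in the description, metric variations as features and accuracy that differs per refactoring type, can be illustrated with a minimal sketch. This is not the authors' implementation: the metric names (`loc`, `cyclomatic`), the helper names `metric_variation` and `per_class_accuracy`, and the sample values are all hypothetical, chosen only to show the shape of the computation.

```python
from collections import Counter


def metric_variation(before, after):
    """Return the after-minus-before change for every metric (hypothetical names).

    A metric missing on one side is treated as 0 on that side.
    """
    names = sorted(set(before) | set(after))
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}


def per_class_accuracy(true_labels, predicted_labels):
    """Return, for each true refactoring type, the fraction predicted correctly."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical metric values for one method before and after a commit.
features = metric_variation({"loc": 40, "cyclomatic": 6},
                            {"loc": 25, "cyclomatic": 3})

# Hypothetical ground truth and classifier output for three commits.
scores = per_class_accuracy(
    ["Extract Method", "Extract Method", "Rename Method"],
    ["Extract Method", "Move Method", "Rename Method"],
)
```

A feature dictionary of this kind would be the input to whichever classifier is being compared, and the per-class breakdown is one simple way to see that an average accuracy can hide weaker results for individual refactoring types.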
first_indexed 2024-03-10T06:47:14Z
format Article
id doaj.art-5dc933a72ecd424f8d1bb0abe2db482d
institution Directory Open Access Journal
issn 1999-4893
language English
last_indexed 2024-03-10T06:47:14Z
publishDate 2021-09-01
publisher MDPI AG
record_format Article
series Algorithms
spelling doaj.art-5dc933a72ecd424f8d1bb0abe2db482d
Timestamp: 2023-11-22T17:08:24Z
Language: eng
Publisher: MDPI AG
Journal: Algorithms
ISSN: 1999-4893
Publication date: 2021-09-01
Volume 14, issue 10, article 289
DOI: 10.3390/a14100289
Title: Comparing Commit Messages and Source Code Metrics for the Prediction Refactoring Activities
Authors and affiliations:
Priyadarshni Suresh Sagar (0): Rochester Institute of Technology, Rochester, New York, NY 14623, USA
Eman Abdulah AlOmar (1): Rochester Institute of Technology, Rochester, New York, NY 14623, USA
Mohamed Wiem Mkaouer (2): Rochester Institute of Technology, Rochester, New York, NY 14623, USA
Ali Ouni (3): Ecole de Technologie Superieure, University of Quebec, Quebec City, QC H3C 1K3, Canada
Christian D. Newman (4): Rochester Institute of Technology, Rochester, New York, NY 14623, USA
URL: https://www.mdpi.com/1999-4893/14/10/289
Keywords: refactoring; software quality; commits; software metrics; software engineering
title Comparing Commit Messages and Source Code Metrics for the Prediction Refactoring Activities
topic refactoring
software quality
commits
software metrics
software engineering
url https://www.mdpi.com/1999-4893/14/10/289