Comparing Commit Messages and Source Code Metrics for the Prediction Refactoring Activities
Main Authors: | Priyadarshni Suresh Sagar, Eman Abdulah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni, Christian D. Newman |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-09-01 |
Series: | Algorithms |
Subjects: | refactoring; software quality; commits; software metrics; software engineering |
Online Access: | https://www.mdpi.com/1999-4893/14/10/289 |
author | Priyadarshni Suresh Sagar, Eman Abdulah AlOmar, Mohamed Wiem Mkaouer, Ali Ouni, Christian D. Newman |
collection | DOAJ |
description | Understanding how developers refactor their code is critical to supporting the design improvement process of software. This paper investigates to what extent code metrics are good indicators for predicting refactoring activity in source code. To do so, we formulated the prediction of refactoring operation types as a multi-class classification problem. Our solution measures metrics on committed code changes and derives from them the features (i.e., metric variations) that best represent each class (i.e., refactoring type), in order to automatically predict, for a given commit, which method-level refactoring is being applied: Move Method, Rename Method, Extract Method, Inline Method, Pull-up Method, or Push-down Method. We compared the prediction performance of various classifiers on a dataset of 5004 commits extracted from 800 Java projects. Our main findings show that the random forest model trained with code metrics achieved the best average accuracy, 75%. However, results varied across classes, indicating that some refactoring types are harder to detect than others. |
first_indexed | 2024-03-10T06:47:14Z |
format | Article |
id | doaj.art-5dc933a72ecd424f8d1bb0abe2db482d |
institution | Directory of Open Access Journals |
issn | 1999-4893 |
language | English |
last_indexed | 2024-03-10T06:47:14Z |
publishDate | 2021-09-01 |
publisher | MDPI AG |
record_format | Article |
series | Algorithms |
doi | 10.3390/a14100289 |
citation | Algorithms, vol. 14, no. 10, article 289 (2021) |
affiliations | Priyadarshni Suresh Sagar, Eman Abdulah AlOmar, Mohamed Wiem Mkaouer, Christian D. Newman: Rochester Institute of Technology, Rochester, New York, NY 14623, USA; Ali Ouni: Ecole de Technologie Superieure, University of Quebec, Quebec City, QC H3C 1K3, Canada |
title | Comparing Commit Messages and Source Code Metrics for the Prediction Refactoring Activities |
topic | refactoring; software quality; commits; software metrics; software engineering |
url | https://www.mdpi.com/1999-4893/14/10/289 |
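
The description says the classifier's features are metric variations measured on committed code changes, and that accuracy differs by refactoring type. A minimal Python sketch of those two ideas follows, under stated assumptions: the metric names, the after-minus-before definition of a variation, and per-class recall as the per-class measure are illustrative choices, not details taken from the paper, and the classifier itself (a random forest in the paper's best-performing setup) is not reproduced here.

```python
# Sketch only: metric names and helper functions are hypothetical,
# not taken from the paper.

# Hypothetical method-level metrics; the record does not say which ones were used.
METRICS = ("loc", "cyclomatic_complexity", "num_parameters", "fan_out")

# The six method-level refactoring types named in the description (class labels).
REFACTORING_TYPES = (
    "Move Method", "Rename Method", "Extract Method",
    "Inline Method", "Pull-up Method", "Push-down Method",
)


def metric_variation(before: dict[str, float], after: dict[str, float]) -> list[float]:
    """Return one feature vector: the after-minus-before change of each metric.

    A metric missing on one side counts as 0 (an assumption of this sketch).
    """
    return [after.get(m, 0.0) - before.get(m, 0.0) for m in METRICS]


def per_class_recall(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """Fraction of each true class's commits that were predicted correctly."""
    totals: dict[str, int] = {}
    hits: dict[str, int] = {}
    for truth, guess in zip(y_true, y_pred):
        if truth not in REFACTORING_TYPES:
            raise ValueError(f"unknown refactoring type: {truth}")
        totals[truth] = totals.get(truth, 0) + 1
        hits[truth] = hits.get(truth, 0) + int(truth == guess)
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    before = {"loc": 40, "cyclomatic_complexity": 6, "num_parameters": 2, "fan_out": 3}
    after = {"loc": 25, "cyclomatic_complexity": 3, "num_parameters": 2, "fan_out": 4}
    print(metric_variation(before, after))  # [-15, -3, 0, 1]

    y_true = ["Extract Method", "Extract Method", "Rename Method", "Move Method"]
    y_pred = ["Extract Method", "Inline Method", "Rename Method", "Rename Method"]
    print(per_class_recall(y_true, y_pred))
    # {'Extract Method': 0.5, 'Rename Method': 1.0, 'Move Method': 0.0}
```

Any multi-class learner could then be trained on (variation vector, refactoring type) pairs. The per-class breakdown illustrates how a single averaged figure can hide classes that are rarely recognized.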