Summary: | Software companies spend a substantial share of their revenue on research and development aimed at delivering software on time. Accurate software effort estimation is critical for project planning, resource allocation, and on-time, on-budget delivery, and therefore for sustainable software development. Both overestimation and underestimation cause significant problems in software projects, so estimation techniques need continuous improvement. This study reviews recent machine learning (ML) approaches used to improve software effort estimation (SEE) accuracy, focusing on research published between 2020 and 2023. The literature review was designed to identify relevant research on ML techniques for SEE. In addition, comparative experiments were conducted with five commonly used ML methods: K-Nearest Neighbor, Support Vector Machine, Random Forest, Logistic Regression, and LASSO Regression. These techniques were evaluated on seven benchmark datasets (Albrecht, Desharnais, China, Kemerer, Miyazaki94, Maxwell, and COCOMO) using five widely used accuracy metrics: Mean Squared Error (MSE), Mean Magnitude of Relative Error (MMRE), R-squared, Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE). By assessing study quality, analyzing results across the literature, and rigorously evaluating the experimental outcomes, the study draws clear conclusions about the most promising techniques for achieving state-of-the-art accuracy in software effort estimation. It makes three key contributions to the field: first, it provides a thorough overview of recent ML research in SEE; second, it offers data-driven guidance to help researchers and practitioners select the best-performing methods for accurate effort estimation; and third, it reports the performance of these methods on publicly available datasets through experimental analysis. 
Improved estimation supports the development of better predictive models of software project time, cost, and staffing needs. The findings are intended to steer future research and tool development toward the most accurate machine learning approaches for modeling software development effort, cost, and delivery schedules.
|
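As a reference point for the five accuracy metrics named above, the sketch below computes them in plain Python. It assumes the common textbook definitions: MRE_i = |y_i - ŷ_i| / y_i for each project, MMRE as the mean of those values, and MAPE as the same quantity expressed in percent. The study may use slightly different variants, and the function names and effort values in the usage example are hypothetical illustrations, not figures from any of the benchmark datasets.

```python
import math
from typing import Sequence


def _check(actual: Sequence[float], predicted: Sequence[float]) -> None:
    # Both series must be non-empty and aligned project by project.
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Mean Squared Error: average of squared residuals.
    _check(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Root Mean Squared Error: square root of MSE, in the same unit as effort.
    return math.sqrt(mse(actual, predicted))


def mmre(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Mean Magnitude of Relative Error: mean of |a - p| / a.
    # Assumes strictly positive actual effort, which holds for effort data.
    _check(actual, predicted)
    if any(a <= 0 for a in actual):
        raise ValueError("MMRE requires strictly positive actual effort values")
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Mean Absolute Percentage Error: the MMRE relative error scaled to percent.
    return 100.0 * mmre(actual, predicted)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    # Coefficient of determination: 1 - SS_res / SS_tot.
    _check(actual, predicted)
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R-squared is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical effort values (e.g. person-hours) for three projects.
    actual = [100.0, 200.0, 400.0]
    predicted = [110.0, 180.0, 400.0]
    for name, fn in [("MSE", mse), ("RMSE", rmse), ("MMRE", mmre),
                     ("MAPE", mape), ("R-squared", r_squared)]:
        print(f"{name}: {fn(actual, predicted):.4f}")
```

Lower is better for MSE, RMSE, MMRE, and MAPE, while R-squared closer to 1 indicates a better fit. Because MMRE and MAPE divide by the actual effort, they weight errors on small projects more heavily than the squared-error metrics do, which is one reason studies typically report several metrics side by side.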