Dimensionality reduction and machine learning based model of software cost estimation

Software Cost Estimation (SCE) is one of the research priorities and challenges in the construction of cyber-physical-social systems (CPSSs). In CPSSs, it is urgent to process environmental and social information accurately and use it to guide social practice. Thus, in response to the problems of low p...

Bibliographic Details
Main Authors: Wei Zhang, Haixin Cheng, Siyu Zhan, Ming Luo, Feng Wang, Zhan Huang
Format: Article
Language: English
Published: Frontiers Media S.A. 2024-03-01
Series: Frontiers in Physics
Subjects: software cost estimation; Autoencoder; random forest; COCOMO; dimensionality reduction
Online Access: https://www.frontiersin.org/articles/10.3389/fphy.2024.1324719/full
author Wei Zhang
Haixin Cheng
Siyu Zhan
Ming Luo
Feng Wang
Zhan Huang
collection DOAJ
description Software Cost Estimation (SCE) is one of the research priorities and challenges in the construction of cyber-physical-social systems (CPSSs). In CPSSs, it is urgent to process environmental and social information accurately and use it to guide social practice. Thus, in response to the problems of low prediction accuracy, poor robustness, and poor interpretability in SCE, this paper proposes an SCE model based on Autoencoder and Random Forest. First, the project data are preprocessed: outliers are removed, and regression trees are built to fill in missing attributes. Second, an Autoencoder is constructed to reduce the dimensionality of the factors that affect software cost. Subsequently, the model was trained and validated using the XGBoost framework on three datasets (COCOMO81, Albrecht, and Desharnais) and compared with common cost prediction models. The experimental results show that the MMRE, MdMRE, and PRED(0.25) values of the proposed model on the COCOMO81 dataset reached 0.21, 0.16, and 0.71, respectively. Compared with other models, the proposed model achieved significant improvements in accuracy and robustness.
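The three evaluation measures named in the description can be illustrated with a minimal sketch. It assumes the definitions commonly used in the cost estimation literature: the magnitude of relative error MRE = |actual - predicted| / actual, MMRE as its mean, MdMRE as its median, and PRED(0.25) as the share of projects with MRE at or below 0.25. The function names and the effort values are hypothetical and are not taken from the article or its datasets.

```python
from statistics import mean, median


def mre(actual, predicted):
    """Magnitude of relative error for one project (assumed definition)."""
    return abs(actual - predicted) / actual


def evaluate(actuals, predictions, level=0.25):
    """Return MMRE, MdMRE and PRED(level) for paired effort values."""
    if not actuals or len(actuals) != len(predictions):
        raise ValueError("need two non-empty sequences of equal length")
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return {
        "MMRE": mean(errors),      # mean of the relative errors
        "MdMRE": median(errors),   # median of the relative errors
        # fraction of projects whose relative error is within `level`
        "PRED": sum(e <= level for e in errors) / len(errors),
    }


# Made-up effort values (e.g. person-months), purely illustrative.
actual_effort = [100.0, 200.0, 50.0, 400.0]
predicted_effort = [110.0, 150.0, 80.0, 400.0]

scores = evaluate(actual_effort, predicted_effort)
print(scores)
```

Lower MMRE and MdMRE and a higher PRED(0.25) indicate better estimates, which is the direction in which the description reports the values 0.21, 0.16, and 0.71.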
first_indexed 2024-04-25T00:45:45Z
format Article
id doaj.art-eff7b121c5d9448bb7ca914c159da948
institution Directory Open Access Journal
issn 2296-424X
language English
last_indexed 2024-04-25T00:45:45Z
publishDate 2024-03-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Physics
spelling doaj.art-eff7b121c5d9448bb7ca914c159da948
2024-03-12T04:59:12Z
eng
Frontiers Media S.A.
Frontiers in Physics
2296-424X
2024-03-01
12
10.3389/fphy.2024.1324719
1324719
Dimensionality reduction and machine learning based model of software cost estimation
Wei Zhang [0]
Haixin Cheng [1]
Haixin Cheng [2]
Siyu Zhan [3]
Siyu Zhan [4]
Ming Luo [5]
Feng Wang [6]
Zhan Huang [7]
[0] Research Institute of Natural Gas Gathering and Transmission Engineering Technology, PetroChina Southwest Oil and Gasfield Company, Chengdu, China
[1] Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China, Chengdu, China
[2] Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province, Chengdu, China
[3] Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China, Chengdu, China
[4] Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province, Chengdu, China
[5] Capital Construction Department, PetroChina Southwest Oil and Gasfield Company, Chengdu, China
[6] Research Institute of Natural Gas Gathering and Transmission Engineering Technology, PetroChina Southwest Oil and Gasfield Company, Chengdu, China
[7] Capital Construction Department, PetroChina Southwest Oil and Gasfield Company, Chengdu, China
title Dimensionality reduction and machine learning based model of software cost estimation
topic software cost estimation
Autoencoder
random forest
COCOMO
dimensionality reduction
url https://www.frontiersin.org/articles/10.3389/fphy.2024.1324719/full