Dimensionality reduction and machine learning based model of software cost estimation
Software Cost Estimation (SCE) is one of the research priorities and challenges in the construction of cyber-physical-social systems (CPSSs). In a CPSS, it is urgent to process environmental and social information accurately and use it to guide social practice. Thus, in response to the problems of low p...
Main Authors: | Wei Zhang, Haixin Cheng, Siyu Zhan, Ming Luo, Feng Wang, Zhan Huang |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A. 2024-03-01 |
Series: | Frontiers in Physics |
Subjects: | software cost estimation; Autoencoder; random forest; COCOMO; dimensionality reduction |
Online Access: | https://www.frontiersin.org/articles/10.3389/fphy.2024.1324719/full |
_version_ | 1797265498284490752 |
---|---|
author | Wei Zhang, Haixin Cheng, Siyu Zhan, Ming Luo, Feng Wang, Zhan Huang |
author_sort | Wei Zhang |
collection | DOAJ |
description | Software Cost Estimation (SCE) is one of the research priorities and challenges in the construction of cyber-physical-social systems (CPSSs). In a CPSS, it is urgent to process environmental and social information accurately and use it to guide social practice. Thus, in response to the problems of low prediction accuracy, poor robustness, and poor interpretability in SCE, this paper proposes an SCE model based on an Autoencoder and Random Forest. First, the project data are preprocessed: outliers are removed, and regression trees are built to fill in missing attributes. Second, an Autoencoder is constructed to reduce the dimensionality of the factors that affect software cost. Subsequently, the model was trained and validated using the XGBoost framework on three datasets (COCOMO81, Albrecht, and Desharnais) and compared with common cost prediction models. The experimental results show that the MMRE, MdMRE, and PRED (0.25) values of the proposed model on the COCOMO81 dataset reached 0.21, 0.16, and 0.71, respectively. Compared with other models, the proposed model achieved significant improvements in accuracy and robustness. |
first_indexed | 2024-04-25T00:45:45Z |
format | Article |
id | doaj.art-eff7b121c5d9448bb7ca914c159da948 |
institution | Directory of Open Access Journals |
issn | 2296-424X |
language | English |
last_indexed | 2024-04-25T00:45:45Z |
publishDate | 2024-03-01 |
publisher | Frontiers Media S.A. |
record_format | Article |
series | Frontiers in Physics |
spelling | doaj.art-eff7b121c5d9448bb7ca914c159da948; 2024-03-12T04:59:12Z; eng; Frontiers Media S.A.; Frontiers in Physics; 2296-424X; 2024-03-01; 12; 10.3389/fphy.2024.1324719; 1324719; Dimensionality reduction and machine learning based model of software cost estimation; Wei Zhang (0), Haixin Cheng (1, 2), Siyu Zhan (3, 4), Ming Luo (5), Feng Wang (6), Zhan Huang (7); (0) Research Institute of Natural Gas Gathering and Transmission Engineering Technology, PetroChina Southwest Oil and Gasfield Company, Chengdu, China; (1) Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China, Chengdu, China; (2) Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province, Chengdu, China; (3) Laboratory of Intelligent Collaborative Computing, University of Electronic Science and Technology of China, Chengdu, China; (4) Trusted Cloud Computing and Big Data Key Laboratory of Sichuan Province, Chengdu, China; (5) Capital Construction Department, PetroChina Southwest Oil and Gasfield Company, Chengdu, China; (6) Research Institute of Natural Gas Gathering and Transmission Engineering Technology, PetroChina Southwest Oil and Gasfield Company, Chengdu, China; (7) Capital Construction Department, PetroChina Southwest Oil and Gasfield Company, Chengdu, China; Software Cost Estimation (SCE) is one of the research priorities and challenges in the construction of cyber-physical-social systems (CPSSs). In a CPSS, it is urgent to process environmental and social information accurately and use it to guide social practice. Thus, in response to the problems of low prediction accuracy, poor robustness, and poor interpretability in SCE, this paper proposes an SCE model based on an Autoencoder and Random Forest. First, the project data are preprocessed: outliers are removed, and regression trees are built to fill in missing attributes. Second, an Autoencoder is constructed to reduce the dimensionality of the factors that affect software cost. Subsequently, the model was trained and validated using the XGBoost framework on three datasets (COCOMO81, Albrecht, and Desharnais) and compared with common cost prediction models. The experimental results show that the MMRE, MdMRE, and PRED (0.25) values of the proposed model on the COCOMO81 dataset reached 0.21, 0.16, and 0.71, respectively. Compared with other models, the proposed model achieved significant improvements in accuracy and robustness.; https://www.frontiersin.org/articles/10.3389/fphy.2024.1324719/full; software cost estimation; Autoencoder; random forest; COCOMO; dimensionality reduction |
title | Dimensionality reduction and machine learning based model of software cost estimation |
topic | software cost estimation; Autoencoder; random forest; COCOMO; dimensionality reduction |
url | https://www.frontiersin.org/articles/10.3389/fphy.2024.1324719/full |
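The description in this record reports MMRE, MdMRE, and PRED (0.25) without defining them. The sketch below shows how these three measures are commonly computed in the cost-estimation literature, assuming the usual definition of the magnitude of relative error, MRE = |actual - estimated| / actual. The function names and the sample effort values are hypothetical and are not taken from the article.

```python
from statistics import mean, median


def mre(actual, estimated):
    """Magnitude of relative error for a single project."""
    return abs(actual - estimated) / actual


def mmre(actuals, estimates):
    """Mean MRE over all projects (lower is better)."""
    return mean(mre(a, e) for a, e in zip(actuals, estimates))


def mdmre(actuals, estimates):
    """Median MRE over all projects (lower is better, less outlier-sensitive)."""
    return median(mre(a, e) for a, e in zip(actuals, estimates))


def pred(actuals, estimates, level=0.25):
    """Fraction of projects whose MRE is at most `level` (higher is better)."""
    errors = [mre(a, e) for a, e in zip(actuals, estimates)]
    return sum(err <= level for err in errors) / len(errors)


# Hypothetical effort values (e.g. person-months), for illustration only.
actual_effort = [100.0, 200.0, 400.0, 50.0]
estimated_effort = [110.0, 150.0, 400.0, 80.0]

print(mmre(actual_effort, estimated_effort))
print(mdmre(actual_effort, estimated_effort))
print(pred(actual_effort, estimated_effort, level=0.25))
```

Under these definitions, a model improves when MMRE and MdMRE fall and PRED (0.25) rises, which is how the reported values of 0.21, 0.16, and 0.71 should be read.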