A Diabetes Prediction System Based on Incomplete Fused Data Sources
In recent years, the diabetes population has grown younger. Therefore, it has become a key problem to make a timely and effective prediction of diabetes, especially given a single data source. Meanwhile, there are many data sources of diabetes patients collected around the world, and it is extremely...
Main Authors: | Zhaoyi Yuan, Hao Ding, Guoqing Chao, Mingqiang Song, Lei Wang, Weiping Ding, Dianhui Chu |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2023-04-01 |
Series: | Machine Learning and Knowledge Extraction |
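Subjects: | diabetes prediction; data sources fusion; missing values imputation; graph representation learning; ensemble learning |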
Online Access: | https://www.mdpi.com/2504-4990/5/2/23 |
_version_ | 1797593742402650112 |
---|---|
author | Zhaoyi Yuan, Hao Ding, Guoqing Chao, Mingqiang Song, Lei Wang, Weiping Ding, Dianhui Chu |
author_sort | Zhaoyi Yuan |
collection | DOAJ |
description | In recent years, the population of people with diabetes has grown younger, so timely and effective prediction of diabetes has become a key problem, especially when only a single data source is available. Meanwhile, many data sources on diabetes patients have been collected around the world, and integrating these heterogeneous sources is extremely important for predicting diabetes accurately. Different data sources may use different predictors; in other words, some features exist only in certain data sources, which leads to missing values once the sources are fused. To account for the uncertainty of these missing values, multiple imputation and a method based on graph representation are used to impute them. A logistic regression model and a stacking strategy are then applied for training and prediction on the fused dataset. The results show that combining heterogeneous datasets and imputing the missing values produced in the fusion process can effectively improve the performance of diabetes prediction. In addition, the proposed method can be extended to any scenario in which heterogeneous datasets share the same label types but have different feature attributes. |
first_indexed | 2024-03-11T02:13:48Z |
format | Article |
id | doaj.art-c62664876db94954af88745a8831b4a6 |
institution | Directory Open Access Journal |
issn | 2504-4990 |
language | English |
last_indexed | 2024-03-11T02:13:48Z |
publishDate | 2023-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Machine Learning and Knowledge Extraction |
spelling | doaj.art-c62664876db94954af88745a8831b4a6; 2023-11-18T11:22:04Z; eng; MDPI AG; Machine Learning and Knowledge Extraction; ISSN 2504-4990; 2023-04-01; vol. 5, no. 2, pp. 384-399; doi:10.3390/make5020023; A Diabetes Prediction System Based on Incomplete Fused Data Sources; Zhaoyi Yuan (School of Computer Sciences and Technology, Harbin Institute of Technology, Weihai 264209, China); Hao Ding (School of Computer Sciences and Technology, Harbin Institute of Technology, Weihai 264209, China); Guoqing Chao (School of Computer Sciences and Technology, Harbin Institute of Technology, Weihai 264209, China); Mingqiang Song (Department of Endocrinology and Metabolism, Weihai Municipal Hospital, Affiliated to Shandong University, Weihai 264209, China); Lei Wang (CAS Key Laboratory of Bio-Medical Diagnostics, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou 215163, China); Weiping Ding (School of Information Science and Technology, Nantong University, Nantong 226019, China); Dianhui Chu (School of Computer Sciences and Technology, Harbin Institute of Technology, Weihai 264209, China); https://www.mdpi.com/2504-4990/5/2/23; keywords: diabetes prediction, data sources fusion, missing values imputation, graph representation learning, ensemble learning |
title | A Diabetes Prediction System Based on Incomplete Fused Data Sources |
title_sort | diabetes prediction system based on incomplete fused data sources |
topic | diabetes prediction; data sources fusion; missing values imputation; graph representation learning; ensemble learning |
url | https://www.mdpi.com/2504-4990/5/2/23 |
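
The description in this record says that fusing data sources with different predictors leaves missing values, which are then filled by multiple imputation. The sketch below illustrates only that general idea and is not the authors' implementation: the two toy sources, their feature names (`glucose`, `bmi`, `age`, `label`) and values, and the helper names `fuse`, `impute_once`, and `multiple_imputation` are all hypothetical. For simplicity it draws each missing value at random from the observed values of the same feature, standing in for the model-based and graph-based imputers the record mentions.

```python
import random
from statistics import mean


def fuse(sources):
    """Stack records from several sources over the union of their feature names.

    A feature that a source never recorded becomes None (missing) in its rows.
    """
    columns = sorted({name for rows in sources for row in rows for name in row})
    fused = [{c: row.get(c) for c in columns} for rows in sources for row in rows]
    return columns, fused


def impute_once(columns, rows, rng):
    """Return one completed copy of the rows.

    Each missing value is replaced by a random draw from the observed values
    of the same column (assumes every column has at least one observed value).
    """
    observed = {c: [r[c] for r in rows if r[c] is not None] for c in columns}
    return [
        {c: (r[c] if r[c] is not None else rng.choice(observed[c])) for c in columns}
        for r in rows
    ]


def multiple_imputation(columns, rows, m=5, seed=0):
    """Produce m completed datasets so imputation uncertainty can be averaged over."""
    rng = random.Random(seed)
    return [impute_once(columns, rows, rng) for _ in range(m)]


# Hypothetical toy sources: same label, partly different features.
source_a = [
    {"glucose": 148, "bmi": 33.6, "label": 1},
    {"glucose": 85, "bmi": 26.6, "label": 0},
]
source_b = [
    {"glucose": 130, "age": 51, "label": 1},
    {"glucose": 92, "age": 29, "label": 0},
]

columns, fused = fuse([source_a, source_b])
completed_sets = multiple_imputation(columns, fused, m=5, seed=0)

# Pool a statistic across the completed datasets (here, the mean BMI).
pooled_bmi = mean(mean(row["bmi"] for row in ds) for ds in completed_sets)
print(columns)
print(pooled_bmi)
```

In a fuller pipeline, a classifier would be trained on each completed dataset and the predictions combined, which is where a stacking strategy such as the one named in the description could fit.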