Diabetes Risk Prediction using Feature Importance Extreme Gradient Boosting (XGBoost)

Diabetes results from impaired function of the pancreas, which produces the hormones insulin and glucagon that regulate blood glucose levels. Diabetes today is not only experienced by adults; pre-diabetes has also been identified in children and adolescents. Early prediction...


Bibliographic Details
Main Authors: Kartina Diah Kusuma Wardani, Memen Akbar
Format: Article
Language: English
Published: Ikatan Ahli Informatika Indonesia 2023-08-01
Series: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Subjects: diabetes, prediction, machine learning, xgboost
Online Access: http://jurnal.iaii.or.id/index.php/RESTI/article/view/4651
author Kartina Diah Kusuma Wardani
Memen Akbar
author_facet Kartina Diah Kusuma Wardani
Memen Akbar
author_sort Kartina Diah Kusuma Wardani
collection DOAJ
description Diabetes results from impaired function of the pancreas, which produces the hormones insulin and glucagon that regulate blood glucose levels. Diabetes today is not only experienced by adults; pre-diabetes has also been identified in children and adolescents. Early prediction of diabetes makes it easier for doctors and patients to intervene as soon as possible, reducing the risk of complications. One use of medical data from diabetes patients is to build a model that medical personnel can use to predict and identify diabetes in patients. Various techniques, including machine learning, are used to predict diabetes as early as possible from the symptoms that patients experience. Machine learning builds a model from historical data on diabetic patients, and the model is then used to make predictions. In this study, the machine learning technique used to predict diabetes is extreme gradient boosting (XGBoost) with feature importance. The dataset is the Early Stage Diabetes Risk Prediction dataset published by the UCI Machine Learning Repository, which has 520 records and 16 attributes. The XGBoost diabetes prediction model is displayed as a tree. The model achieved a precision of 98.71% and an F1 score of 98.18%. The accuracy obtained using the 10 best attributes ranked by XGBoost feature importance is 98.72%.
first_indexed 2024-03-08T06:29:15Z
format Article
id doaj.art-6040a958896b4a7fb110cc90d9159286
institution Directory of Open Access Journals
issn 2580-0760
language English
last_indexed 2024-03-08T06:29:15Z
publishDate 2023-08-01
publisher Ikatan Ahli Informatika Indonesia
record_format Article
series Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
spelling doaj.art-6040a958896b4a7fb110cc90d9159286; 2024-02-03T12:23:47Z; eng; Ikatan Ahli Informatika Indonesia; Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi); ISSN 2580-0760; 2023-08-01; vol. 7, no. 4, pp. 824-831; DOI 10.29207/resti.v7i4.4651; Kartina Diah Kusuma Wardani (Politeknik Caltex Riau), Memen Akbar (Politeknik Caltex Riau)
title Diabetes Risk Prediction using Feature Importance Extreme Gradient Boosting (XGBoost)
topic diabetes
prediction
machine learning
xgboost
url http://jurnal.iaii.or.id/index.php/RESTI/article/view/4651
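
The description above reports precision, F1 score, and accuracy for the XGBoost model. As a reading aid, the following is a minimal sketch of how those three metrics are commonly derived from the counts of true and false predictions in a binary classifier. It is not the authors' code: the function names, the `ConfusionCounts` type, and the labels in `y_true` and `y_pred` are all made up for illustration and do not come from the study or its dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Outcome counts for a binary classifier (1 = diabetic, 0 = not)."""
    tp: int  # predicted 1, truth 1
    fp: int  # predicted 1, truth 0
    fn: int  # predicted 0, truth 1
    tn: int  # predicted 0, truth 0


def count_outcomes(y_true, y_pred):
    """Tally the four outcome types from paired true and predicted labels."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 1:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def precision(c):
    """Share of positive predictions that were correct."""
    predicted_pos = c.tp + c.fp
    return c.tp / predicted_pos if predicted_pos else 0.0


def recall(c):
    """Share of actual positives that the model found."""
    actual_pos = c.tp + c.fn
    return c.tp / actual_pos if actual_pos else 0.0


def f1_score(c):
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(c):
    """Share of all predictions that were correct."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical labels for illustration only; not data from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1]
counts = count_outcomes(y_true, y_pred)

print(f"precision={precision(counts):.2f}")
print(f"f1={f1_score(counts):.2f}")
print(f"accuracy={accuracy(counts):.2f}")
```

Because precision and accuracy measure different things, the two figures in the description (98.71% and 98.72%) need not coincide; the sketch makes that distinction concrete with toy labels.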