Integrated statistical and machine learning analysis provides insight into key influencing symptoms for distinguishing early‐onset type 2 diabetes

Abstract. Background: Being able to predict with confidence the early onset of type 2 diabetes from a suite of signs and symptoms (features) displayed by potential sufferers is desirable to commence treatment promptly. Late or inconclusive diagnosis can result in more serious health consequences for s...


Bibliographic Details
Main Author: David A. Wood
Format: Article
Language: English
Published: Wiley 2022-12-01
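Series: Chronic Diseases and Translational Medicine
Subjects: error analysis; key feature influences; multi‐K‐fold cross‐validation; symptom importance; type 2 diabetes screening
Online Access: https://doi.org/10.1002/cdt3.39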
author David A. Wood
collection DOAJ
description Abstract. Background: Being able to predict with confidence the early onset of type 2 diabetes from a suite of signs and symptoms (features) displayed by potential sufferers is desirable to commence treatment promptly. Late or inconclusive diagnosis can result in more serious health consequences for sufferers and higher costs for health care services in the long run. Methods: A novel integrated methodology is proposed involving correlation, statistical analysis, machine learning, multi‐K‐fold cross‐validation, and confusion matrices to provide a reliable classification of diabetes‐positive and ‐negative individuals from a substantial suite of features. The method also identifies the relative influence of each feature on the diabetes diagnosis and highlights the most important ones. Ten statistical and machine learning methods are utilized to conduct the analysis. Results: A published data set involving 520 individuals (Sylhet Diabetes Hospital, Bangladesh) is modeled, revealing that a support vector classifier generates the most accurate early‐onset type 2 diabetes status predictions with just 11 misclassifications (2.1% error). Polydipsia and polyuria are among the most influential features, whereas obesity and age are assigned low weights by the prediction models. Conclusion: The proposed methodology can rapidly predict early‐onset type 2 diabetes with high confidence while providing valuable insight into the key influential features involved in such predictions.
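Illustrative note: the error figure quoted in the abstract (11 misclassifications among 520 individuals, 2.1%) and the idea of repeating K-fold cross-validation for several values of K can be sketched in a few lines of standard-library Python. This is only a sketch under stated assumptions, not the article's implementation: the function names `error_rate` and `k_fold_indices`, the shuffle seed, and the K values 4, 5, and 10 are hypothetical, since the abstract does not say which K values or software were used.

```python
import random


def error_rate(misclassified: int, total: int) -> float:
    """Return the fraction of records assigned to the wrong class."""
    if total <= 0:
        raise ValueError("total must be positive")
    return misclassified / total


def k_fold_indices(n_records: int, k: int, seed: int = 0):
    """Shuffle record indices once, then yield (train, test) index lists for K folds."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Figures quoted in the abstract: 11 misclassifications among 520 individuals.
rate = error_rate(11, 520)
print(f"error rate: {rate:.1%}")

# Hypothetical K values (assumption): the abstract does not list the ones used.
for k in (4, 5, 10):
    test_sizes = [len(test) for _, test in k_fold_indices(520, k)]
    print(f"K={k}: test-fold sizes {test_sizes}")
```

In each split every record appears in exactly one test fold, so a classifier trained on the `train` indices and scored on the `test` indices is always evaluated on records it has not seen.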
first_indexed 2024-03-12T12:28:51Z
format Article
id doaj.art-9294d69bf9c14640acb37c83dc8aa40d
institution Directory Open Access Journal
issn 2589-0514
language English
last_indexed 2024-03-12T12:28:51Z
publishDate 2022-12-01
publisher Wiley
record_format Article
series Chronic Diseases and Translational Medicine
spelling doaj.art-9294d69bf9c14640acb37c83dc8aa40d; 2023-08-29T18:05:52Z; eng; Wiley; Chronic Diseases and Translational Medicine; ISSN 2589-0514; 2022-12-01; vol. 8, no. 4, pp. 281-295; https://doi.org/10.1002/cdt3.39; David A. Wood (DWA Energy Limited, Lincoln, UK)
title Integrated statistical and machine learning analysis provides insight into key influencing symptoms for distinguishing early‐onset type 2 diabetes
topic error analysis
key feature influences
multi‐K‐fold cross‐validation
symptom importance
type 2 diabetes screening
url https://doi.org/10.1002/cdt3.39