Osteoporosis Pre-Screening Using Ensemble Machine Learning in Postmenopausal Korean Women

As osteoporosis is a degenerative disease related to postmenopausal aging, early diagnosis is vital. This study used data from the Korea National Health and Nutrition Examination Surveys to predict a patient’s risk of osteoporosis using machine learning algorithms. Data from 1431 postmenopausal wome...

Bibliographic Details
Main Authors: Youngihn Kwon, Juyeon Lee, Joo Hee Park, Yoo Mee Kim, Se Hwa Kim, Young Jun Won, Hyung-Yong Kim
Format: Article
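Language: English
Published: MDPI AG 2022-06-01
Series: Healthcare
Subjects: machine learning; feature selection; osteoporosis; postmenopausal women; pre-screening; risk assessment
Online Access: https://www.mdpi.com/2227-9032/10/6/1107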
author Youngihn Kwon
Juyeon Lee
Joo Hee Park
Yoo Mee Kim
Se Hwa Kim
Young Jun Won
Hyung-Yong Kim
collection DOAJ
description As osteoporosis is a degenerative disease related to postmenopausal aging, early diagnosis is vital. This study used data from the Korea National Health and Nutrition Examination Surveys to predict a patient’s risk of osteoporosis using machine learning algorithms. Data from 1431 postmenopausal women aged 40–69 years were used, including 20 features affecting osteoporosis, chosen by feature importance and recursive feature elimination. Random Forest (RF), AdaBoost, and Gradient Boosting (GBM) machine learning algorithms were each used to train three models: A, checkup features; B, survey features; and C, both checkup and survey features, respectively. Of the three models, Model C generated the best outcomes with an accuracy of 0.832 for RF, 0.849 for AdaBoost, and 0.829 for GBM. Its area under the receiver operating characteristic curve (AUROC) was 0.919 for RF, 0.921 for AdaBoost, and 0.908 for GBM. By utilizing multiple feature selection methods, the ensemble models of this study achieved excellent results with an AUROC score of 0.921 with AdaBoost, which is 0.1–0.2 higher than those of the best performing models from recent studies. Our model can be further improved as a practical medical tool for the early diagnosis of osteoporosis after menopause.
first_indexed 2024-03-09T23:40:40Z
format Article
id doaj.art-30522079ba744360bcb6c97a131259bf
institution Directory Open Access Journal
issn 2227-9032
language English
last_indexed 2024-03-09T23:40:40Z
publishDate 2022-06-01
publisher MDPI AG
record_format Article
series Healthcare
spelling doaj.art-30522079ba744360bcb6c97a131259bf
2023-11-23T16:53:01Z
eng
MDPI AG
Healthcare, ISSN 2227-9032
2022-06-01, Volume 10, Issue 6, Article 1107
DOI 10.3390/healthcare10061107
Osteoporosis Pre-Screening Using Ensemble Machine Learning in Postmenopausal Korean Women
Youngihn Kwon (Insilicogen, Inc., Yongin-si 16954, Korea)
Juyeon Lee (AIDX, Inc., Yongin-si 16954, Korea)
Joo Hee Park (AIDX, Inc., Yongin-si 16954, Korea)
Yoo Mee Kim (Department of Internal Medicine, International St. Mary’s Hospital, Catholic Kwandong University College of Medicine, Incheon 22711, Korea)
Se Hwa Kim (Department of Internal Medicine, International St. Mary’s Hospital, Catholic Kwandong University College of Medicine, Incheon 22711, Korea)
Young Jun Won (Department of Internal Medicine, International St. Mary’s Hospital, Catholic Kwandong University College of Medicine, Incheon 22711, Korea)
Hyung-Yong Kim (AIDX, Inc., Yongin-si 16954, Korea)
https://www.mdpi.com/2227-9032/10/6/1107
machine learning; feature selection; osteoporosis; postmenopausal women; pre-screening; risk assessment
title Osteoporosis Pre-Screening Using Ensemble Machine Learning in Postmenopausal Korean Women
topic machine learning
feature selection
osteoporosis
postmenopausal women
pre-screening
risk assessment
url https://www.mdpi.com/2227-9032/10/6/1107
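
The description field reports two metrics for each model: accuracy and the area under the receiver operating characteristic curve (AUROC). As a rough illustration of what those numbers measure, the sketch below computes both from a list of true labels and predicted scores. The labels, scores, and the 0.5 decision threshold are made-up values chosen for the example, not data or settings from the study, and the function names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predicted labels that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as half a win)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = osteoporosis) and model scores; NOT study data.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]

# Assumed threshold of 0.5 turns scores into hard labels for accuracy.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred))  # 0.75
print(auroc(y_true, scores))     # 0.9375
```

Accuracy depends on the chosen threshold, whereas AUROC summarizes ranking quality across all thresholds, which is why a record like this one can list both for the same model.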