Predicting early-onset COPD risk in adults aged 20–50 using electronic health records and machine learning

Bibliographic Details
Main Authors: Guanglei Liu, Jiani Hu, Jianzhe Yang, Jie Song
Format: Article
Language: English
Published: PeerJ Inc., 2024-02-01
Series: PeerJ
Online Access: https://peerj.com/articles/16950.pdf
collection DOAJ
description Chronic obstructive pulmonary disease (COPD) is a major public health concern, affecting an estimated 164 million people worldwide. Early detection and intervention strategies are essential to reduce the burden of COPD, but current screening approaches are limited in their ability to predict risk accurately. Machine learning (ML) models promise more accurate COPD risk prediction by combining genetic and electronic medical record data. In this study, we developed and evaluated eight ML models for primary screening of COPD using routine screening data, polygenic risk scores (PRS), additional clinical data, or a combination of all three. To assess the models, we conducted a retrospective analysis of 329,396 patients in the UK Biobank database. Incorporating personal information and blood biochemical test results significantly improved the accuracy of COPD risk prediction: the best model achieved an AUC of 0.8505, a specificity of 0.8539 and a sensitivity of 0.7584. These results indicate that ML models can be used effectively to predict COPD risk accurately in individuals aged 20 to 50 years, providing a valuable tool for early detection and intervention.
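The abstract reports AUC, specificity and sensitivity without defining them. The sketch below is not the authors' code; it only illustrates the standard definitions in plain Python, assuming binary labels (1 = COPD case, 0 = control), one risk score per person, and a hypothetical decision threshold of 0.5. The helper names (`threshold_scores`, `sensitivity_specificity`, `roc_auc`) and the toy data are invented for illustration and are not study data.

```python
from typing import List, Sequence, Tuple


def threshold_scores(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Turn risk scores into 0/1 predictions (the 0.5 cut-off is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity), treating label 1 as a COPD case."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # share of true cases that are flagged
    specificity = tn / (tn + fp)  # share of non-cases correctly left unflagged
    return sensitivity, specificity


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random case scores above a random control.

    Ties count as half a win. This pairwise loop is O(cases x controls): fine for
    a toy example, far too slow for a cohort of several hundred thousand people.
    """
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


if __name__ == "__main__":
    # Invented toy data for illustration only; not taken from the study.
    y_true = [1, 1, 1, 0, 0, 0, 0]
    scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
    sens, spec = sensitivity_specificity(y_true, threshold_scores(scores))
    print(f"sensitivity={sens:.4f} specificity={spec:.4f} auc={roc_auc(y_true, scores):.4f}")
```

With the toy data this prints `sensitivity=0.6667 specificity=0.7500 auc=0.9167`. Note that AUC is threshold-free, while sensitivity and specificity depend on the chosen cut-off; for real cohort sizes a library routine such as scikit-learn's `sklearn.metrics.roc_auc_score` would normally be used instead of the pairwise loop.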
id doaj.art-e5e95c21a5d14eb8b7d6922d40058220
institution Directory of Open Access Journals
issn 2167-8359
Citation: PeerJ 12:e16950
DOI: 10.7717/peerj.16950
Affiliations: Guanglei Liu (School of Information Science and Engineering, Yunnan University, Kunming, Yunnan, China); Jiani Hu, Jianzhe Yang and Jie Song (Ailurus Biotechnology Ltd., Shenzhen, Guangdong, China)
topic Chronic obstructive pulmonary disease
COPD
Machine learning
Risk prediction
Genetic data
Electronic health records