Using random forest machine learning on data from a large, representative cohort of the general population improves clinical spirometry references
Abstract. Introduction: Spirometry is associated with several diagnostic difficulties, and as a result, misdiagnosis of chronic obstructive pulmonary disease (COPD) occurs. This study aims to investigate how random forest (RF) can be used to improve the existing clinical FVC and FEV1 reference values...
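Main Authors: | Kris Kristensen, Pernille H. Olesen, Anna K. Roerbaek, Louise Nielsen, Helle K. Hansen, Simon L. Cichosz, Morten H. Jensen, Ole Hejlesen |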
---|---|
Format: | Article |
Language: | English |
Published: | Wiley 2023-08-01 |
Series: | The Clinical Respiratory Journal |
Subjects: | clinical references, COPD, misdiagnosis, multiple linear regression, random forest, spirometry |
Online Access: | https://doi.org/10.1111/crj.13662 |
_version_ | 1797741412106633216 |
---|---|
author | Kris Kristensen, Pernille H. Olesen, Anna K. Roerbaek, Louise Nielsen, Helle K. Hansen, Simon L. Cichosz, Morten H. Jensen, Ole Hejlesen |
author_facet | Kris Kristensen, Pernille H. Olesen, Anna K. Roerbaek, Louise Nielsen, Helle K. Hansen, Simon L. Cichosz, Morten H. Jensen, Ole Hejlesen |
author_sort | Kris Kristensen |
collection | DOAJ |
description | Abstract. Introduction: Spirometry is associated with several diagnostic difficulties, and as a result, misdiagnosis of chronic obstructive pulmonary disease (COPD) occurs. This study aims to investigate how random forest (RF) can be used to improve the existing clinical FVC and FEV1 reference values in a large and representative cohort of the general population of the US without known lung disease. Materials and methods: FVC, FEV1, body measures, and demographic data from 23 433 people were extracted from NHANES. RF was used to develop different prediction models. The accuracy of RF was compared with the existing Danish clinical references, an improved multiple linear regression (MLR) model, and a model from the literature. Results: The correlations between actual and predicted values, with 95% confidence intervals, were FVC = 0.85 (0.85; 0.86) (p < 0.001) and FEV1 = 0.92 (0.92; 0.93) (p < 0.001) for RF, and FVC = 0.66 (0.64; 0.68) (p < 0.001) and FEV1 = 0.69 (0.67; 0.70) (p < 0.001) for the existing clinical references. Slope and intercept for the RF models were FVC: 1.06 and −238.04 (mL), and FEV1: 0.86 and 455.36 (mL); for the MLR models, slope and intercept were FVC: 0.99 and 38.56 (mL), and FEV1: 1.01 and −56.57 (mL). Conclusions: The results suggest that machine learning models such as RF have the potential to improve the prediction of estimated lung function for individual patients. These predictions are used as reference values and are an important part of assessing spirometry measurements in clinical practice. Further work is needed to reduce the size of the intercepts obtained in these results. |
first_indexed | 2024-03-12T14:26:19Z |
format | Article |
id | doaj.art-73c3ef2508f64b418b925a41d357c9fb |
institution | Directory Open Access Journal |
issn | 1752-6981 1752-699X |
language | English |
last_indexed | 2024-03-12T14:26:19Z |
publishDate | 2023-08-01 |
publisher | Wiley |
record_format | Article |
series | The Clinical Respiratory Journal |
spelling | doaj.art-73c3ef2508f64b418b925a41d357c9fb; 2023-08-18T07:18:02Z; eng; Wiley; The Clinical Respiratory Journal; 1752-6981; 1752-699X; 2023-08-01; 17; 8; 819; 828; 10.1111/crj.13662; Using random forest machine learning on data from a large, representative cohort of the general population improves clinical spirometry references; Kris Kristensen (0), Pernille H. Olesen (1), Anna K. Roerbaek (2), Louise Nielsen (3), Helle K. Hansen (4), Simon L. Cichosz (5), Morten H. Jensen (6), Ole Hejlesen (7); affiliation for all authors: Department of Health Science and Technology, Aalborg University, Aalborg, Denmark; https://doi.org/10.1111/crj.13662; clinical references; COPD; misdiagnosis; multiple linear regression; random forest; spirometry |
title | Using random forest machine learning on data from a large, representative cohort of the general population improves clinical spirometry references |
topic | clinical references; COPD; misdiagnosis; multiple linear regression; random forest; spirometry |
url | https://doi.org/10.1111/crj.13662 |