Using random forest machine learning on data from a large, representative cohort of the general population improves clinical spirometry references

Abstract Introduction Spirometry is associated with several diagnostic difficulties, and as a result, misdiagnosis of chronic obstructive pulmonary disease (COPD) occurs. This study aims to investigate how random forest (RF) can be used to improve the existing clinical FVC and FEV1 reference values...


Bibliographic Details
Main Authors: Kris Kristensen, Pernille H. Olesen, Anna K. Roerbaek, Louise Nielsen, Helle K. Hansen, Simon L. Cichosz, Morten H. Jensen, Ole Hejlesen
Format: Article
Language: English
Published: Wiley 2023-08-01
Series: The Clinical Respiratory Journal
Subjects: clinical references, COPD, misdiagnosis, multiple linear regression, random forest, spirometry
Online Access: https://doi.org/10.1111/crj.13662
_version_ 1797741412106633216
author Kris Kristensen
Pernille H. Olesen
Anna K. Roerbaek
Louise Nielsen
Helle K. Hansen
Simon L. Cichosz
Morten H. Jensen
Ole Hejlesen
author_sort Kris Kristensen
collection DOAJ
description Abstract. Introduction: Spirometry is associated with several diagnostic difficulties, and as a result, misdiagnosis of chronic obstructive pulmonary disease (COPD) occurs. This study aims to investigate how random forest (RF) can be used to improve the existing clinical FVC and FEV1 reference values in a large and representative cohort of the general population of the US without known lung disease. Materials and methods: FVC, FEV1, body measures, and demographic data from 23 433 people were extracted from NHANES. RF was used to develop different prediction models. The accuracy of RF was compared with the existing Danish clinical references, an improved multiple linear regression (MLR) model, and a model from the literature. Results: For RF, the correlation between actual and predicted values (95% confidence interval) was 0.85 (0.85; 0.86) for FVC and 0.92 (0.92; 0.93) for FEV1 (both p < 0.001). For the existing clinical references, it was 0.66 (0.64; 0.68) for FVC and 0.69 (0.67; 0.70) for FEV1 (both p < 0.001). Slope and intercept for the RF models were FVC: 1.06 and −238.04 mL, and FEV1: 0.86 and 455.36 mL; for the MLR models, they were FVC: 0.99 and 38.56 mL, and FEV1: 1.01 and −56.57 mL. Conclusions: The results suggest that machine learning models such as RF have the potential to improve the prediction of estimated lung function for individual patients. These predictions are used as reference values and are an important part of assessing spirometry measurements in clinical practice. Further work is necessary to reduce the size of the intercepts obtained through these results.
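The abstract reports three agreement measures between actual and predicted lung volumes: a correlation coefficient, its 95% confidence interval, and the slope and intercept of a fitted line. The sketch below shows one way such measures can be computed with the Python standard library. It is not the authors' code. The record does not say how the interval was obtained or which variable was regressed on which, so the Fisher z-transform interval and the regression of actual on predicted values are assumptions, and the volumes in the example are made-up numbers in mL, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for r via the Fisher z-transform (assumed method).

    Requires n > 3 and |r| < 1.
    """
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def slope_intercept(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical FVC values in mL, for illustration only.
actual = [3200, 4100, 2900, 5000, 3700, 4500]
predicted = [3300, 4000, 3100, 4800, 3600, 4600]

r = pearson_r(predicted, actual)
low, high = fisher_ci(r, len(actual))
slope, intercept = slope_intercept(predicted, actual)
print(f"r = {r:.2f} ({low:.2f}; {high:.2f})")
print(f"slope = {slope:.2f}, intercept = {intercept:.2f} mL")
```

Under this reading, a slope near 1 and an intercept near 0 mL indicate predictions that agree with the measurements across the whole range, which is why the abstract singles out the size of the intercepts as the remaining weakness.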
first_indexed 2024-03-12T14:26:19Z
format Article
id doaj.art-73c3ef2508f64b418b925a41d357c9fb
institution Directory Open Access Journal
issn 1752-6981
1752-699X
language English
last_indexed 2024-03-12T14:26:19Z
publishDate 2023-08-01
publisher Wiley
record_format Article
series The Clinical Respiratory Journal
spelling doaj.art-73c3ef2508f64b418b925a41d357c9fb
2023-08-18T07:18:02Z
eng
Wiley
The Clinical Respiratory Journal
1752-6981
1752-699X
2023-08-01
17(8):819-828
10.1111/crj.13662
Using random forest machine learning on data from a large, representative cohort of the general population improves clinical spirometry references
Kris Kristensen (0), Pernille H. Olesen (1), Anna K. Roerbaek (2), Louise Nielsen (3), Helle K. Hansen (4), Simon L. Cichosz (5), Morten H. Jensen (6), Ole Hejlesen (7)
(0-7) Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
https://doi.org/10.1111/crj.13662
clinical references; COPD; misdiagnosis; multiple linear regression; random forest; spirometry
title Using random forest machine learning on data from a large, representative cohort of the general population improves clinical spirometry references
topic clinical references
COPD
misdiagnosis
multiple linear regression
random forest
spirometry
url https://doi.org/10.1111/crj.13662